How to write a CharMatcher equivalent of the regex "word character"?

Date: 2021-07-22 15:51:47

The regex \w matches exactly the characters [A-Za-z0-9_], which is precisely what I need right now. I wonder whether there's a simple way to create a corresponding Guava CharMatcher. I did the following (I don't like static imports):

private final static CharMatcher IDENTIFIER_CHAR = CharMatcher.is('_')
    .or(CharMatcher.inRange('A', 'Z'))
    .or(CharMatcher.inRange('a', 'z'))
    .or(CharMatcher.inRange('0', '9'))
    .precomputed();
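The premise that \w matches exactly [A-Za-z0-9_] holds for java.util.regex as long as the UNICODE_CHARACTER_CLASS flag is not set (the Pattern Javadoc defines \w as [a-zA-Z_0-9]). A small JDK-only sketch that compares the regex against the same four ranges as IDENTIFIER_CHAR for every char value might look like this; the class and method names are illustrative only:

```java
import java.util.regex.Pattern;

public class WordCharCheck {
    // Same set as IDENTIFIER_CHAR above, written as a plain boolean test
    static boolean isIdentifierChar(char c) {
        return c == '_'
                || (c >= 'A' && c <= 'Z')
                || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9');
    }

    // Returns the first char value where \w and the explicit ranges disagree, or -1 if none
    static int firstMismatch() {
        Pattern word = Pattern.compile("\\w"); // default flags: ASCII-only \w
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            boolean regex = word.matcher(String.valueOf(c)).matches();
            if (regex != isIdentifierChar(c)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstMismatch()); // prints -1: the two definitions agree
    }
}
```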

There are a few predefined matchers, but something like

private final static CharMatcher IDENTIFIER_CHAR = CharMatcher.ASCII
    .and(CharMatcher.JAVA_LETTER_OR_DIGIT)
    .or(CharMatcher.is('_'))
    .precomputed();

doesn't look any better. Neither does using forPredicate, at least until we get closures in Java 8. There's no real problem here; it's just too verbose for something so simple and (I guess) so common.
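The second formulation does describe the same set as the first: Guava documents ASCII as "code point less than 128" and JAVA_LETTER_OR_DIGIT as Java's own letter-or-digit definition (Character.isLetterOrDigit), and within ASCII that is true only for A-Z, a-z and 0-9. The sketch below checks this with Java 8 lambdas, using the JDK's java.util.function.IntPredicate rather than Guava's Predicate so it runs without Guava; the names are illustrative:

```java
import java.util.function.IntPredicate;

public class LambdaWordChar {
    // The explicit ranges from IDENTIFIER_CHAR, as a Java 8 lambda
    static final IntPredicate RANGES = c -> c == '_'
            || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');

    // The "ASCII and letter-or-digit, or underscore" formulation
    static final IntPredicate ASCII_LETTER_OR_DIGIT =
            c -> (c < 128 && Character.isLetterOrDigit(c)) || c == '_';

    // True if both predicates agree on every char value
    static boolean sameSet() {
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            if (RANGES.test(c) != ASCII_LETTER_OR_DIGIT.test(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameSet()); // prints true
    }
}
```

With Guava, a lambda like RANGES could in principle be handed to CharMatcher.forPredicate, since Guava's Predicate has a single abstract method; the sketch stays with JDK types only.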

Is there a nicer solution? Has anybody perhaps implemented something like newRegexLikeCharMatcher("[A-Za-z0-9_]")?

2 solutions

#1 (score: 3)

An implementation of your suggested method could be:

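import com.google.common.base.CharMatcher;
import java.util.regex.Pattern;

public CharMatcher newRegexLikeCharMatcher(String regex) {
    final Pattern pattern = Pattern.compile(regex);
    return new CharMatcher() {
        // Tests the char as a one-character string against the regex
        @Override
        public boolean matches(char c) {
            return pattern.matcher(Character.toString(c)).find();
        }
    }.precomputed(); // one-time cost in exchange for faster later queries
}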
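The idea behind the version above is to evaluate the regex per character and let precomputed() pay that cost up front. The same technique can be sketched with only the JDK, using java.util.BitSet as the lookup table; this illustrates the idea and is not how Guava implements precomputed(). RegexCharTable is a hypothetical name. The sketch uses matches() rather than find(): for a single character class such as [A-Za-z0-9_] the two agree, but with find() a pattern that can match the empty string would accept every character.

```java
import java.util.BitSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCharTable {
    private final BitSet table = new BitSet(Character.MAX_VALUE + 1);

    // Hypothetical helper: evaluates the regex once for every char value and caches the answers
    public RegexCharTable(String regex) {
        Matcher m = Pattern.compile(regex).matcher("");
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            // matches() requires the whole one-char string to match
            if (m.reset(String.valueOf((char) i)).matches()) {
                table.set(i);
            }
        }
    }

    // Constant-time lookup after construction
    public boolean matches(char c) {
        return table.get(c);
    }

    public static void main(String[] args) {
        RegexCharTable word = new RegexCharTable("[A-Za-z0-9_]");
        System.out.println(word.matches('q') + " " + word.matches('-')); // prints "true false"
    }
}
```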

Or, building it from Guava's Predicates and Functions:

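import com.google.common.base.CharMatcher;
import com.google.common.base.Functions;
import com.google.common.base.Predicates;

public CharMatcher newRegexLikeCharMatcher(String regex) {
    return CharMatcher.forPredicate(Predicates.compose(Predicates.containsPattern(regex), Functions.toStringFunction()))
            .precomputed();
}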
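Predicates.containsPattern asks whether the pattern is found anywhere in its input (find() semantics), so the composed predicate tests the one-character string produced by toStringFunction(). Since Java 8 the JDK offers a similar building block, Pattern.asPredicate(), which also uses find(). The sketch below composes it with a char-to-String conversion to mirror the Guava version and shows the empty-match pitfall; ComposeSketch and containsPatternPerChar are made-up names:

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class ComposeSketch {
    // JDK analogue of compose(containsPattern(regex), toStringFunction()):
    // turn the char into a String, then ask whether the pattern is found in it
    static Predicate<Character> containsPatternPerChar(String regex) {
        Predicate<String> contains = Pattern.compile(regex).asPredicate(); // find() semantics
        return c -> contains.test(String.valueOf(c));
    }

    public static void main(String[] args) {
        Predicate<Character> word = containsPatternPerChar("[A-Za-z0-9_]");
        System.out.println(word.test('x') + " " + word.test('-')); // prints "true false"

        // Pitfall: a pattern that can match the empty string is "found" in every input
        Predicate<Character> digits = containsPatternPerChar("[0-9]*");
        System.out.println(digits.test('x')); // prints true
    }
}
```

For a plain character class like [A-Za-z0-9_] this pitfall does not arise, which is why both versions above work for the question's use case.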

#2 (score: 0)

I wrote this trivial method, which I use in a couple of places and which makes it all a bit nicer:

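import com.google.common.base.CharMatcher;
import com.google.common.base.Preconditions;

// Matches startInclusive..endInclusive plus any further ranges passed as (start, end) pairs
private static CharMatcher inRanges(char startInclusive, char endInclusive, char... chars) {
    Preconditions.checkArgument((chars.length & 1) == 0, "The chars must come in pairs");
    CharMatcher result = CharMatcher.inRange(startInclusive, endInclusive);
    for (int i = 0; i < chars.length; i += 2) {
        result = result.or(CharMatcher.inRange(chars[i], chars[i + 1]));
    }
    return result;
}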

I'm afraid cases like mine aren't common enough, and every user can write a solution for their special case just like I did.

I found the solution above still impractical (too many apostrophes to type), so I created this trivial method instead:

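import com.google.common.base.CharMatcher;

// Builds a matcher from a spec like "A-Za-z0-9_": "x-y" is an inclusive range, anything else a literal char
public static CharMatcher newRegexLikeCharMatcher(String s) {
    CharMatcher result = CharMatcher.NONE; // Guava 19+ spells this CharMatcher.none()
    for (int i = 0; i < s.length(); ++i) {
        if (i + 2 < s.length() && s.charAt(i + 1) == '-') {
            // char-dash-char triplet: add the range and skip past it
            result = result.or(CharMatcher.inRange(s.charAt(i), s.charAt(i + 2)));
            i += 2;
        } else {
            // single literal char, including a leading or trailing dash
            result = result.or(CharMatcher.is(s.charAt(i)));
        }
    }
    return result;
}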

Whenever it encounters a "char-dash-char" triplet, it interprets it as a range; otherwise it adds a single matching character. So leading and trailing dashes are taken literally, and even odd inputs work: -a-b- matches a, b and the dash, while a-b-c matches a, b, the dash and c.
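To inspect exactly what a spec string expands to, the same char-dash-char rule can be sketched with only the JDK, collecting the characters into a java.util.BitSet instead of or-ing CharMatchers. RangeSpec, expand and members are hypothetical names, and unlike the method above this sketch explicitly rejects a reversed range such as z-a with an IllegalArgumentException:

```java
import java.util.BitSet;

public class RangeSpec {
    // Hypothetical JDK-only counterpart of newRegexLikeCharMatcher: same parsing rule,
    // but the result is the set of matching chars rather than a CharMatcher
    static BitSet expand(String s) {
        BitSet result = new BitSet();
        for (int i = 0; i < s.length(); ++i) {
            if (i + 2 < s.length() && s.charAt(i + 1) == '-') {
                char from = s.charAt(i), to = s.charAt(i + 2);
                if (from > to) {
                    throw new IllegalArgumentException("Reversed range: " + from + "-" + to);
                }
                result.set(from, to + 1); // BitSet ranges are end-exclusive
                i += 2;
            } else {
                result.set(s.charAt(i)); // literal char, including a stray '-'
            }
        }
        return result;
    }

    // Lists the members of the set in ascending char order, for inspection
    static String members(BitSet set) {
        StringBuilder sb = new StringBuilder();
        set.stream().forEach(c -> sb.append((char) c));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(members(expand("a-b-c"))); // prints -abc
        System.out.println(members(expand("-a-b-"))); // prints -ab
        System.out.println(members(expand("x-z_")));  // prints _xyz
    }
}
```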
