Is there a regular-expression way to replace one set of characters with another (like the shell tr command)?

Date: 2022-09-28 23:28:37

The shell tr command supports replacing one set of characters with another. For example, echo hello | tr [a-z] [A-Z] translates hello to HELLO.

In Java, however, I must replace each character individually, like this:

"10 Dogs Are Racing"
    .replaceAll ("0", "０")
    .replaceAll ("1", "１")
    .replaceAll ("2", "２")
    // ...
    .replaceAll ("9", "９")
    .replaceAll ("A", "Ａ")
    // ...
;

The apache-commons-lang library provides a convenient replaceChars method to do such replacement.

// half-width to full-width
System.out.println(
    org.apache.commons.lang.StringUtils.replaceChars
    (
        "10 Dogs Are Racing",
        "0123456789ABCDEFEGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
        "０１２３４５６７８９ＡＢＣＤＥＦＥＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚ"
    )
);
// Result:
// １０ Ｄｏｇｓ Ａｒｅ Ｒａｃｉｎｇ

But as you can see, the searchChars/replaceChars arguments are sometimes too long (and too tedious: try finding the duplicated character in them, if you like), and could be expressed by a simple regular expression, [0-9A-Za-z] / [０-９Ａ-Ｚａ-ｚ]. Is there a regular-expression way to achieve that?

2 answers

#1


5  

While there is no direct way to do this, constructing your own utility function to use in combination with replaceChars is relatively simple. The version below accepts simple character classes, without [ or ]; it does not do class negation ([^a-z]).

For your use case, you could do:

StringUtils.replaceChars(str, charRange("0-9A-Za-z"), charRange("０-９Ａ-Ｚａ-ｚ"))

Code:

import java.util.regex.PatternSyntaxException;

// Expands a simple character-class body such as "0-9A-Za-z" into the
// string of all characters it covers. No brackets, no negation.
public static String charRange(String str) {
    StringBuilder ret = new StringBuilder();
    char ch;
    for(int index = 0; index < str.length(); index++) {
        ch = str.charAt(index);
        if(ch == '\\') {
            if(index + 1 >= str.length()) {
                throw new PatternSyntaxException(
                    "Malformed escape sequence.", str, index
                );
            }
            // special case for escape character, consume next char:
            index++;
            ch = str.charAt(index);
        }
        if(index + 1 >= str.length() || str.charAt(index + 1) != '-') {
            // this was a single char, or the last char in the string
            ret.append(ch);
        } else {
            if(index + 2 >= str.length()) {
                throw new PatternSyntaxException(
                    "Malformed character range.", str, index + 1
                );
            }
            // this char was the beginning of a range
            for(char r = ch; r <= str.charAt(index + 2); r++) {
                ret.append(r);
            }
            index = index + 2;
        }
    }
    return ret.toString();
}

Produces:

0-9A-Za-z : 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
０-９Ａ-Ｚａ-ｚ : ０１２３４５６７８９ＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚ
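
If adding Commons Lang is not an option, the same idea can be sketched with the JDK alone. The Tr class below is a hypothetical helper written for this example, not part of any library. Its expand method is a trimmed-down range expander (no escape handling, unlike charRange above), and translate maps characters by position, roughly as tr does. It assumes both sets expand to the same length, and the full-width set is written with \u escapes only to avoid source-encoding problems.

```java
import java.util.HashMap;
import java.util.Map;

public class Tr {
    // Expand ranges such as "0-9A-Z" into "0123456789AB...Z".
    // Simplified: no escapes, so '-' is literal only at the start or end.
    static String expand(String spec) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < spec.length(); i++) {
            char start = spec.charAt(i);
            if (i + 2 < spec.length() && spec.charAt(i + 1) == '-') {
                char end = spec.charAt(i + 2);
                // int counter avoids char overflow if end is '\uFFFF'
                for (int c = start; c <= end; c++) {
                    out.append((char) c);
                }
                i += 2;
            } else {
                out.append(start);
            }
        }
        return out.toString();
    }

    // Replace each character found in 'from' with the character at the
    // same position in 'to'; all other characters pass through unchanged.
    static String translate(String input, String from, String to) {
        String src = expand(from);
        String dst = expand(to);
        if (src.length() != dst.length()) {
            throw new IllegalArgumentException("character sets differ in length");
        }
        Map<Character, Character> table = new HashMap<>();
        for (int i = 0; i < src.length(); i++) {
            // first mapping wins if a source character is repeated
            table.putIfAbsent(src.charAt(i), dst.charAt(i));
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            char mapped = table.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("hello", "a-z", "A-Z"));
        // \uFF10-\uFF19 = full-width 0-9, \uFF21-\uFF3A = A-Z, \uFF41-\uFF5A = a-z
        System.out.println(translate("10 Dogs Are Racing",
            "0-9A-Za-z", "\uFF10-\uFF19\uFF21-\uFF3A\uFF41-\uFF5A"));
    }
}
```

Building the lookup table once keeps each character lookup constant-time, which matters more than it looks when the sets are as long as the ones in the question.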

#2


5  

No.

(some extra characters so that SO will allow me to post my otherwise succinct answer)
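
Regex syntax itself has no counterpart to tr's positional mapping. For the half-width to full-width case in the question, though, the two sets differ by a fixed code-point offset, so a replacement callback can compute the target character. A minimal sketch, assuming Java 9 or later (for Matcher.replaceAll with a function argument); FullWidth and toFullWidth are names made up for this example.

```java
import java.util.regex.Pattern;

public class FullWidth {
    private static final Pattern ASCII_ALNUM = Pattern.compile("[0-9A-Za-z]");

    // Full-width forms U+FF01..U+FF5E mirror ASCII U+0021..U+007E
    // at a constant offset of 0xFEE0.
    private static final int OFFSET = 0xFEE0;

    static String toFullWidth(String input) {
        // Each match is a single ASCII letter or digit; shift it by OFFSET.
        return ASCII_ALNUM.matcher(input).replaceAll(
            m -> String.valueOf((char) (m.group().charAt(0) + OFFSET)));
    }

    public static void main(String[] args) {
        System.out.println(toFullWidth("10 Dogs Are Racing"));
    }
}
```

This only works because the mapping is arithmetic. For two arbitrary sets, a lookup table such as the one replaceChars builds internally is still needed.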
