How to escape text for a regular expression in Java

Date: 2022-01-08 05:26:00

Does Java have a built-in way to escape arbitrary text so that it can be included in a regular expression? For example, if my users enter "$5", I'd like to match that exactly rather than a "5" after the end of input.

7 answers

#1


402  

Since Java 1.5, yes:

Pattern.quote("$5");
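As a minimal sketch of what this does (the class name QuoteDemo and the helper containsLiteral are made up for illustration), the following compares a quoted and an unquoted pattern built from the same user text:

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Hypothetical helper: true if 'input' contains 'userText' as literal text.
    static boolean containsLiteral(String input, String userText) {
        return Pattern.compile(Pattern.quote(userText)).matcher(input).find();
    }

    public static void main(String[] args) {
        // Pattern.quote wraps the text in \Q ... \E
        System.out.println(Pattern.quote("$5"));                 // \Q$5\E

        // Quoted: "$5" is matched as the two characters '$' and '5'
        System.out.println(containsLiteral("Price: $5", "$5"));  // true

        // Unquoted: '$' means end of input, so a following '5' can never match
        System.out.println(Pattern.compile("$5").matcher("Price: $5").find()); // false
    }
}
```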

#2


94  

The difference between Pattern.quote and Matcher.quoteReplacement was not clear to me until I saw the following example:

s.replaceFirst(Pattern.quote("text to replace"), 
               Matcher.quoteReplacement("replacement text"));
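To make the two roles concrete, here is a small sketch (the class ReplaceLiteralDemo and the helper replaceFirstLiteral are hypothetical names). Pattern.quote protects the search side, where characters such as '+' are regex operators, and Matcher.quoteReplacement protects the replacement side, where '$' and '\' are special:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceLiteralDemo {
    // Hypothetical helper: replaces the first literal occurrence of 'target'
    // with the literal text 'replacement'.
    static String replaceFirstLiteral(String s, String target, String replacement) {
        return s.replaceFirst(Pattern.quote(target),
                              Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // "1+1" contains the regex operator '+'; "$x" contains the
        // replacement metacharacter '$'. Both are treated as plain text here.
        System.out.println(replaceFirstLiteral("1+1=2, 1+1=2", "1+1", "$x"));
        // $x=2, 1+1=2
    }
}
```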

#3


23  

It may be too late to respond, but you can also use Pattern.LITERAL, which makes the compiled pattern treat every character of the input, including metacharacters and escape sequences, as a literal:

Pattern.compile(textToFormat, Pattern.LITERAL);
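A short sketch of the flag in use (LiteralFlagDemo and its helpers are illustrative names, not part of the answer). According to the Pattern javadoc, CASE_INSENSITIVE keeps its effect when combined with LITERAL, which the second helper relies on:

```java
import java.util.regex.Pattern;

class LiteralFlagDemo {
    // Hypothetical helper: literal search using the LITERAL flag.
    static boolean containsLiteral(String input, String userText) {
        return Pattern.compile(userText, Pattern.LITERAL).matcher(input).find();
    }

    // Hypothetical helper: literal search that also ignores case.
    static boolean containsLiteralIgnoreCase(String input, String userText) {
        int flags = Pattern.LITERAL | Pattern.CASE_INSENSITIVE;
        return Pattern.compile(userText, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsLiteral("a.b", "."));  // true
        System.out.println(containsLiteral("ab", "."));   // false: '.' is not a wildcard here
        System.out.println(containsLiteralIgnoreCase("COST: $5 USD", "$5 usd")); // true
    }
}
```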

#4


13  

I think what you're after is \Q$5\E. Also see Pattern.quote(s), introduced in Java 5.

See the Pattern javadoc for details.
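As a rough illustration (QuoteBlockDemo and matchesExactly are made-up names), \Q...\E can be written directly inside a larger pattern, while Pattern.quote is the safer choice for arbitrary text because it also copes with input that itself contains "\E":

```java
import java.util.regex.Pattern;

class QuoteBlockDemo {
    // Hypothetical helper: true if 'input' is exactly the literal 'userText'.
    static boolean matchesExactly(String input, String userText) {
        return input.matches(Pattern.quote(userText));
    }

    public static void main(String[] args) {
        // \Q...\E written by hand inside a larger pattern
        System.out.println("cost: $5".matches("cost: \\Q$5\\E"));  // true

        // Text that contains \E: Pattern.quote closes the quoted block,
        // emits an escaped backslash plus 'E', then reopens the block.
        String tricky = "a\\Eb*";                        // the characters a \ E b *
        System.out.println(Pattern.quote(tricky));       // \Qa\E\\E\Qb*\E
        System.out.println(matchesExactly(tricky, tricky)); // true
    }
}
```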

#5


10  

First off, if

  • you use replaceAll()
  • you DON'T use Matcher.quoteReplacement()
  • the text to be substituted in includes a $1

it won't put a literal "$1" into the result. It will look at the search regex for the first capturing group and substitute THAT in. That's what $1, $2 or $3 means in the replacement text: capturing groups from the search pattern.

I frequently plug long strings of text into .properties files, then generate email subjects and bodies from those. Indeed, this appears to be the default way to do i18n in Spring Framework. I put XML tags, as placeholders, into the strings and I use replaceAll() to replace the XML tags with the values at runtime.

I ran into an issue where a user entered a dollars-and-cents figure with a dollar sign. replaceAll() choked on it, with the following showing up in a stack trace:

java.lang.IndexOutOfBoundsException: No group 3
at java.util.regex.Matcher.start(Matcher.java:374)
at java.util.regex.Matcher.appendReplacement(Matcher.java:748)
at java.util.regex.Matcher.replaceAll(Matcher.java:823)
at java.lang.String.replaceAll(String.java:2201)

In this case, the user had entered "$3" somewhere in their input and replaceAll() went looking in the search regex for the third matching group, didn't find one, and puked.

Given:

// "msg" is a string from a .properties file, containing "<userInput />" among other tags
// "userInput" is a String containing the user's input

replacing

msg = msg.replaceAll("<userInput \\/>", userInput);

with

msg = msg.replaceAll("<userInput \\/>", Matcher.quoteReplacement(userInput));

solved the problem. The user could put in any kind of characters, including dollar signs, without issue. It behaved exactly the way you would expect.
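The failure and the fix can be reproduced with a sketch like the following. The class QuoteReplacementDemo, the helper fill, and the sample message are assumptions for illustration; only the replaceAll() and Matcher.quoteReplacement() calls come from the description above.

```java
import java.util.regex.Matcher;

class QuoteReplacementDemo {
    // Hypothetical helper: substitutes the user's input for the placeholder tag,
    // treating the input as literal replacement text.
    static String fill(String msg, String userInput) {
        return msg.replaceAll("<userInput />", Matcher.quoteReplacement(userInput));
    }

    public static void main(String[] args) {
        String msg = "You entered: <userInput />";

        // Quoted replacement: the dollar sign survives as plain text.
        System.out.println(fill(msg, "$3.50"));   // You entered: $3.50

        // Unquoted replacement: "$3" is read as a reference to group 3,
        // which the search pattern does not have.
        try {
            msg.replaceAll("<userInput />", "$3.50");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unquoted replacement failed: " + e.getMessage());
        }
    }
}
```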

#6


4  

To get a protected pattern, you can put a backslash ("\\\\" in a Java replacement string) in front of every character except letters and digits. After that, you can insert your own special symbols into the protected pattern, so that it works as a real pattern of your own rather than as plain quoted text, with none of the user's characters acting as special symbols.

public class Test {
    public static void main(String[] args) {
        String str = "y z (111)";
        String p1 = "x x (111)";
        String p2 = ".* .* \\(111\\)";

        p1 = escapeRE(p1);

        p1 = p1.replace("x", ".*");

        System.out.println( p1 + "-->" + str.matches(p1) ); 
            //.*\ .*\ \(111\)-->true
        System.out.println( p2 + "-->" + str.matches(p2) ); 
            //.* .* \(111\)-->true
    }

    public static String escapeRE(String str) {
        //Pattern escaper = Pattern.compile("([^a-zA-Z0-9])");
        //return escaper.matcher(str).replaceAll("\\\\$1");
        return str.replaceAll("([^a-zA-Z0-9])", "\\\\$1");
    }
}

#7


1  

Pattern.quote("blabla") works nicely. It encloses the text in "\Q" and "\E", and it also takes care of any "\Q" or "\E" that the text itself contains. However, if you need real regular-expression escaping (or custom escaping), you can use this code:

String someText = "Some/s/wText*/,**";
// Character class: regex metacharacters, backslash, and whitespace (\s); each match gets a leading backslash
System.out.println(someText.replaceAll("[-\\[\\]{}()*+?.,\\\\^$|#\\s]", "\\\\$0"));

This method returns: Some/s/wText\*/\,\*\*

Example code and tests:

String someText = "Some\\E/s/wText*/,**";
System.out.println("Pattern.quote: "+ Pattern.quote(someText));
System.out.println("Full escape: "+someText.replaceAll("[-\\[\\]{}()*+?.,\\\\^$|#\\s]", "\\\\$0"));
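One way to check this kind of character-by-character escaping is to confirm that the escaped text, used as a regex, matches the original text. The sketch below wraps the escaping in a helper (FullEscapeDemo and escapeMeta are illustrative names); it is written so that \s is the whitespace class and the backslash itself is in the escaped set:

```java
class FullEscapeDemo {
    // Hypothetical helper: prefixes regex metacharacters, backslashes and
    // whitespace with a backslash.
    static String escapeMeta(String text) {
        return text.replaceAll("[-\\[\\]{}()*+?.,\\\\^$|#\\s]", "\\\\$0");
    }

    public static void main(String[] args) {
        String someText = "Some\\E/s/wText*/,**";   // contains a literal backslash
        String escaped = escapeMeta(someText);

        System.out.println(escaped);                   // Some\\E/s/wText\*/\,\*\*
        // Round trip: the escaped form matches the original text exactly.
        System.out.println(someText.matches(escaped)); // true
    }
}
```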
