java.util.regex - importance of Pattern.compile()?

Time: 2023-01-07 14:06:17

What is the importance of the Pattern.compile() method?
Why do I need to compile the regex string before getting the Matcher object?

For example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regex = "((\\S+)\\s*some\\s*)";

Pattern pattern = Pattern.compile(regex); // why do I need to compile?
Matcher matcher = pattern.matcher(text);

6 Answers

#1

The compile() method is always called at some point; it's the only way to create a Pattern object. So the question is really, why should you call it explicitly? One reason is that you need a reference to the Matcher object so you can use its methods, like group(int) to retrieve the contents of capturing groups. The only way to get ahold of the Matcher object is through the Pattern object's matcher() method, and the only way to get ahold of the Pattern object is through the compile() method. Then there's the find() method which, unlike matches(), is not duplicated in the String or Pattern classes.
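
To make the first reason concrete, here is a minimal sketch that needs both find() and group(int), so it has to go through Pattern.compile() and matcher(). The class name GroupDemo and the key=value input format are illustrative assumptions, not part of the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Compiled once: key=value pairs where both sides are runs of word characters
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\\w+)");

    // Collects "key -> value" for every match found anywhere in the input.
    // find() and group(int) exist only on Matcher, which you get from a Pattern.
    static List<String> pairs(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {                      // next match anywhere, not the whole string
            result.add(m.group(1) + " -> " + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs("a=1, b=22; junk c=x"));
    }
}
```

Calling pairs("a=1, b=22; junk c=x") should yield the three pairs a -> 1, b -> 22 and c -> x; String.matches() could only tell you whether the entire string fits the pattern.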

The other reason is to avoid creating the same Pattern object over and over. Every time you use one of the regex-powered methods in String (or the static matches() method in Pattern), it creates a new Pattern and a new Matcher. So this code snippet:

for (String s : myStringList) {
    if ( s.matches("\\d+") ) {
        doSomething();
    }
}

...is exactly equivalent to this:

for (String s : myStringList) {
    if ( Pattern.compile("\\d+").matcher(s).matches() ) {
        doSomething();
    }
}

Obviously, that's doing a lot of unnecessary work. In fact, compiling the regex and instantiating the Pattern object can easily take longer than performing the actual match. So it usually makes sense to pull that step out of the loop. You can create the Matcher ahead of time as well, though Matchers are not nearly as expensive to create:

Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("");
for (String s : myStringList) {
    if ( m.reset(s).matches() ) {
        doSomething();
    }
}
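
The last loop can be turned into a self-contained sketch. DigitCounter and countAllDigits are hypothetical names standing in for myStringList and doSomething(). The comments record one design constraint: per the Pattern Javadoc, Pattern instances are immutable and safe to share between threads, whereas a Matcher carries per-match state and is not.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitCounter {
    // Compiled once when the class loads, not once per loop iteration.
    // Pattern is immutable, so this constant can be shared by all threads.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Counts the strings that consist entirely of digits (at least one).
    static int countAllDigits(List<String> strings) {
        // One Matcher, re-pointed at each string with reset(CharSequence).
        // Matcher is stateful and not thread-safe, so it stays local to this call.
        Matcher m = DIGITS.matcher("");
        int count = 0;
        for (String s : strings) {
            if (m.reset(s).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countAllDigits(List.of("123", "12a", "", "007")));
    }
}
```

Here the Pattern lives in a static constant and the Matcher is local to each call, so concurrent callers never share a Matcher.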

If you're familiar with .NET regexes, you may be wondering if Java's compile() method is related to .NET's RegexOptions.Compiled modifier; the answer is no. Java's Pattern.compile() method is merely equivalent to .NET's Regex constructor. When you specify the Compiled option:

Regex r = new Regex(@"\d+", RegexOptions.Compiled); 

...it compiles the regex directly to CIL byte code, allowing it to perform much faster, but at a significant cost in up-front processing and memory use; think of it as steroids for regexes. Java has no equivalent: there's no difference between a Pattern that's created behind the scenes by String#matches(String) and one you create explicitly with Pattern#compile(String).

(EDIT: I originally said that all .NET Regex objects are cached, which is incorrect. Since .NET 2.0, automatic caching occurs only with static methods like Regex.Matches(), not when you call a Regex constructor directly.)

#2

Compiling parses the regular expression and builds an in-memory representation of it. The overhead of compiling is significant compared to a single match, so if you're using a pattern repeatedly, you gain performance by caching the compiled pattern.
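
When the regexes are only known at run time, one way to cache compiled patterns is a map keyed by the regex string. This is a sketch under assumptions: PatternCache is a hypothetical helper, not a JDK class, and the map is unbounded, so it only suits a small, fixed set of distinct regexes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

class PatternCache {
    // Regex source string -> compiled Pattern; sharing is safe because Pattern is immutable
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    // Compiles each distinct regex at most once; later calls return the cached Pattern.
    // An invalid regex throws PatternSyntaxException and is not stored.
    static Pattern get(String regex) {
        return CACHE.computeIfAbsent(regex, Pattern::compile);
    }

    // Whole-input match, like String.matches(), but without recompiling each time
    static boolean matches(String regex, CharSequence input) {
        return get(regex).matcher(input).matches();
    }
}
```

If the set of regexes can grow without limit (for example, built from user input), a bounded cache would be the safer design choice.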

#3

When you compile a Pattern, Java does some up-front computation (building an in-memory representation of the regex) so that finding matches in strings is faster.
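
One observable consequence of that up-front work is that syntax errors are reported by Pattern.compile() itself, before any input is matched. A minimal sketch, with CompileCheck as a hypothetical name and an unbalanced variant of the question's regex (one closing parenthesis short) as the failing input:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CompileCheck {
    // Returns null if the regex compiles, otherwise the parser's description of the error.
    // All parsing happens inside Pattern.compile(); no input string is involved.
    static String syntaxError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        System.out.println(syntaxError("((\\S+)\\s*some\\s*"));  // unbalanced: reports an error
        System.out.println(syntaxError("((\\S+)\\s*some\\s*)")); // balanced: compiles
    }
}
```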

If you are going to reuse the Pattern many times, you can see a large performance gain over creating a new Pattern every time.

If you only use the Pattern once, the compile step just looks like an extra line of code, but in the general case it can be very helpful.

#4

It is a matter of performance and memory usage: compile the pattern and keep the compiled Pattern if you need to use it a lot. A typical use of regex is to validate user input (format) and to format output data for users. In such classes, saving the compiled pattern seems quite logical, since they are usually called a lot.

Below is a sample validator, which is really called a lot :)

import java.util.regex.Pattern;

public class AmountValidator {
    // Accepts e.g. 123 - 123,456 - 123,345.34
    private static final String AMOUNT_REGEX = "\\d{1,3}(,\\d{3})*(\\.\\d{1,4})?|\\.\\d{1,4}";
    // Compile once and save the pattern; Pattern is immutable, so sharing it is thread-safe
    private static final Pattern AMOUNT_PATTERN = Pattern.compile(AMOUNT_REGEX);

    public boolean validate(String amount) {
        // Only a cheap Matcher is created per call; the regex is not recompiled
        return AMOUNT_PATTERN.matcher(amount).matches();
    }
}

As @Alan Moore mentioned, if a regex is reused in your code (in a loop, for example), you should compile it once, before the loop, and save the Pattern for reuse.

#5

Pre-compiling the regex increases speed, and reusing the Matcher gives you another slight speedup. If the method is called frequently, say within a loop, overall performance should go up.

#6

Similar to 'Pattern.compile' there is 'RECompiler.compile' [from com.sun.org.apache.regexp.internal] where:
1. compiled code for pattern [a-z] has 'az' in it
2. compiled code for pattern [0-9] has '09' in it
3. compiled code for pattern [abc] has 'aabbcc' in it.

So the compiled code is a good way to generalize multiple cases: instead of having separate code to handle cases 1, 2 and 3, the problem reduces to comparing a character's ASCII code with the current and next elements of the compiled code, hence the pairs. Thus:
a. any character whose ASCII code lies between 'a' and 'z' is in the range a-z;
b. any character whose ASCII code lies between 'a' and 'a' is definitely 'a'.
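
The pair idea can be illustrated in a few lines of Java. This is not RECompiler's actual code, only a sketch of the technique described above; RangePairs, inClass and encode are hypothetical names, and escapes, negated classes and other class syntax are deliberately ignored.

```java
class RangePairs {
    // A character class stored as consecutive (low, high) pairs:
    // [a-z] -> "az", [0-9] -> "09", [abc] -> "aabbcc".
    // Compares char (UTF-16) values, which equal ASCII codes for ASCII characters.
    static boolean inClass(String pairs, char c) {
        for (int i = 0; i + 1 < pairs.length(); i += 2) {
            char low = pairs.charAt(i);
            char high = pairs.charAt(i + 1);
            if (c >= low && c <= high) {   // one rule covers both ranges and single chars
                return true;
            }
        }
        return false;
    }

    // Hypothetical helper: encodes a class body without brackets, such as "a-z" or "abc"
    static String encode(String body) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (i + 2 < body.length() && body.charAt(i + 1) == '-') {
                sb.append(c).append(body.charAt(i + 2));  // range such as a-z becomes (a, z)
                i += 2;
            } else {
                sb.append(c).append(c);                   // single char c becomes (c, c)
            }
        }
        return sb.toString();
    }
}
```

For example, encode("abc") produces "aabbcc", and inClass(encode("a-z"), 'm') is true: single characters and ranges are handled by the same low <= c <= high test.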
