Consider the following strings:
1) Scheme ID: abc-456-hu5t10 (High priority) *****
2) Scheme ID: frt-78f-hj542w (Balanced)
3) Scheme ID: 23f-f974-nm54w (super formula run) *****
and so on in the same format; the parts in bold vary from string to string.
Imagine I have many strings in the format shown above. I want to pick 3 substrings (shown in bold below) from each of them.
- 1st substring containing the alphanumeric value (in the example above, "abc-456-hu5t10")
- 2nd substring containing the word (in the example above, "High priority")
- 3rd substring containing * (IF * is present at the end of the string, ELSE leave it)
How do I pick these 3 substrings from each string shown above? I know it can be done using regular expressions in Perl. Can you help with this?
7 answers
#1
29
You could do something like this:
my $data = <<END;
1) Scheme ID: abc-456-hu5t10 (High priority) *
2) Scheme ID: frt-78f-hj542w (Balanced)
3) Scheme ID: 23f-f974-nm54w (super formula run) *
END
foreach (split(/\n/,$data)) {
    # skip any line that does not match the expected format
    $_ =~ /Scheme ID: ([a-z0-9-]+)\s+\(([^)]+)\)\s*(\*)?/ || next;
    # $3 is undefined when there is no trailing star
    my ($id,$word,$star) = ($1,$2,defined $3 ? $3 : '');
    print "$id $word $star\n";
}
The key part is the regular expression:
Scheme ID: ([a-z0-9-]+)\s+\(([^)]+)\)\s*(\*)?
It breaks down as follows.
The fixed string "Scheme ID: ":
Scheme ID:
Followed by one or more of the characters a-z, 0-9 or -. We use parentheses to capture it as $1:
([a-z0-9-]+)
Followed by one or more whitespace characters:
\s+
Followed by an opening parenthesis (which we escape), then one or more characters that aren't a closing parenthesis, and then a closing parenthesis (also escaped). We use unescaped parentheses to capture the words as $2:
\(([^)]+)\)
Followed by optional whitespace and maybe a *, captured as $3:
\s*(\*)?
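As a sketch of the same breakdown in executable form, the pattern can be written with the x modifier so that each piece carries its own comment. The group names (id, label, star) and the helper parse_scheme are illustrative names, not part of the answer above, and named captures assume Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (id, label, star), or an empty list on no match.
sub parse_scheme {
    my ($line) = @_;
    return unless $line =~ m{
        Scheme\ ID:\ (?<id>[a-z0-9-]+)   # fixed prefix, then the ID
        \s+                              # one or more whitespace characters
        \( (?<label>[^)]+) \)            # the text between the parentheses
        \s* (?<star>\*)?                 # optional trailing star
    }x;
    # The star group is undefined when absent, so default it to ''.
    return ($+{id}, $+{label}, $+{star} // '');
}

for my $line ('1) Scheme ID: abc-456-hu5t10 (High priority) *',
              '2) Scheme ID: frt-78f-hj542w (Balanced)') {
    my ($id, $label, $star) = parse_scheme($line) or next;
    print "$id|$label|$star\n";
}
```

With the two sample lines this prints abc-456-hu5t10|High priority|* and frt-78f-hj542w|Balanced| on separate lines.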
#2
3
You could use a regular expression such as the following:
/([-a-z0-9]+)\s*\((.*?)\)\s*(\*)?/
So for example:
$s = "abc-456-hu5t10 (High priority) *";
$s =~ /([-a-z0-9]+)\s*\((.*?)\)\s*(\*)?/;
print "$1\n$2\n$3\n";
prints
abc-456-hu5t10
High priority
*
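One thing worth keeping in mind with this approach: $1, $2 and $3 are only set by a successful match, so it may be safer to test the match result before using them. A minimal sketch, in which the sample strings and variable names are illustrative:

```perl
use strict;
use warnings;

my @samples = (
    'abc-456-hu5t10 (High priority) *',
    'frt-78f-hj542w (Balanced)',
    'no parentheses here',
);

my @results;
for my $s (@samples) {
    # In list context a successful match returns the captured groups;
    # a failed match returns an empty list, so the condition is false.
    if (my ($id, $label, $star) = $s =~ /([-a-z0-9]+)\s*\((.*?)\)\s*(\*)?/) {
        # The star group is undefined when the line has no trailing *.
        push @results, join('|', $id, $label, defined $star ? $star : '');
    }
    else {
        push @results, "no match: $s";
    }
}
print "$_\n" for @results;
```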
#3
2
Well, here is a one-liner:
perl -lne 'm|Scheme ID:\s+(.*?)\s+\((.*?)\)\s?(\*)?|g&&print "$1:$2:$3"' file.txt
Expanded into a simple script to explain things a bit better:
#!/usr/bin/perl -ln
#-w : warnings (not in the shebang above; add it to enable them)
#-l : print newline after every print
#-n : apply script body to stdin or files listed at commandline, don't print $_
use strict; #always do this.
my $regex = qr{   # precompile regex
    Scheme\ ID:   # literal prefix; the space is escaped because of the x switch
    \s+           # 1 or more whitespace
    (.*?)         # non-greedy match of all characters up to
    \s+           # 1 or more whitespace
    \(            # literal opening parenthesis
    (.*?)         # non-greedy match up to the next
    \)            # literal closing parenthesis
    \s*           # 0 or more whitespace (trailing * is optional)
    (\*)?         # 0 or 1 literal *
}x; #x switch allows whitespace in regex to allow documentation.
#values trapped in $1 $2 $3, so do whatever you need to:
#Perl lets you use any characters as delimiters, I like pipes because
#they reduce the amount of escaping when using file paths
m|$regex| && print "$1 : $2 : $3";
#alternatively if(m|$regex|) {doOne($1); doTwo($2) ... }
Though if this were for anything more than formatting, I would implement a main loop to handle files and flesh out the body of the script rather than relying on the command-line switches for the looping.
#4
2
(\S*)\s*\((.*?)\)\s*(\*?)
(\S*) picks up anything which is NOT whitespace
\s* 0 or more whitespace characters
\( a literal open parenthesis
(.*?) anything, non-greedy so stops on first occurrence of...
\) a literal close parenthesis
\s* 0 or more whitespace characters
(\*?) 0 or 1 occurrences of literal *
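As a usage sketch (the sample lines come from the question; the variable names are illustrative), the pattern can be applied in list context. Because the last group is written (\*?) rather than (\*)?, it always takes part in the match, so the third value is an empty string rather than undef when there is no star.

```perl
use strict;
use warnings;

my @lines = (
    '1) Scheme ID: abc-456-hu5t10 (High priority) *',
    '2) Scheme ID: frt-78f-hj542w (Balanced)',
);

my @rows;
for my $line (@lines) {
    # The pattern is unanchored, so the engine moves past "1) Scheme ID:"
    # until it finds a non-whitespace run followed by optional
    # whitespace and an opening parenthesis.
    my ($id, $label, $star) = $line =~ /(\S*)\s*\((.*?)\)\s*(\*?)/
        or next;
    push @rows, [$id, $label, $star];
    print "id=$id label=$label star='$star'\n";
}
```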
#5
1
Long time no Perl
while(<STDIN>) {
    next unless /:\s*(\S+)\s+\(([^\)]+)\)\s*(\*?)/;
    print "|$1|$2|$3|\n";
}
#6
1
This just requires a small change to my last answer:
my ($guid, $scheme, $star) = $line =~ m{
    The [ ] Scheme [ ] GUID: [ ]
    ([a-zA-Z0-9-]+) #capture the guid
    [ ]
    \( (.+) \)      #capture the scheme
    (?:
        [ ]
        ([*])       #capture the star
    )?              #if it exists
}x;
#7
0
String 1:
$input =~ /^\S+/;
$s1 = $&;
String 2:
$input =~ /\(.*\)/;
$s2 = $&;
String 3:
$input =~ /\*?$/;
$s3 = $&;