How do I split a string on whitespace unless it is inside a single-quoted string?

Time: 2021-06-16 21:46:18

I'm seeking a solution to splitting a string which contains text in the following format:

"abcd efgh 'ijklm no pqrs' tuv"

which will produce the following results:

['abcd', 'efgh', 'ijklm no pqrs', 'tuv']

In other words, it splits on whitespace unless the whitespace is inside a single-quoted string. I think it could be done with .NET regexes using lookaround operators or balancing groups. I'm not so sure about Perl.

3 solutions

#1


15  

Use Text::ParseWords:

#!/usr/bin/perl

use strict; use warnings;
use Text::ParseWords;

my @words = parse_line('\s+', 0, "abcd efgh 'ijklm no pqrs' tuv");

use Data::Dumper;
print Dumper \@words;

Output:

C:\Temp> ff
$VAR1 = [
          'abcd',
          'efgh',
          'ijklm no pqrs',
          'tuv'
        ];

You can look at the source code for Text::ParseWords::parse_line to see the pattern used.
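
As a rough illustration of the second argument to parse_line (the "keep" flag), the sketch below runs the same input with keep set to 0 and to 1, and also through shellwords, which Text::ParseWords exports for whitespace-delimited input. The variable names are just for illustration.

```perl
use strict; use warnings;
use Text::ParseWords qw(parse_line shellwords);

my $text = "abcd efgh 'ijklm no pqrs' tuv";

# keep = 0: the quotes are stripped from quoted fields
my @stripped = parse_line('\s+', 0, $text);

# keep = 1: the quotes stay in the returned tokens
my @kept = parse_line('\s+', 1, $text);

# shellwords: whitespace-delimited, quotes removed, leading whitespace ignored
my @shell = shellwords($text);

print join('|', @stripped), "\n";
print join('|', @kept), "\n";
print join('|', @shell), "\n";
```

If you only ever split on whitespace, shellwords is probably the shorter call; parse_line is the one to reach for when you need a different delimiter or want the quotes kept.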

#2


3  

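use strict; use warnings;

my $text = "abcd efgh 'ijklm no pqrs' tuv 'xwyz 1234 9999' 'blah'";
my @out;

# Splitting on the quote leaves unquoted text at even indexes
# and quoted text at odd indexes.
my @parts = split /'/, $text;

for my $i ( 0 .. $#parts ) {
    if ( $i % 2 ) {
        push @out, $parts[$i];               # quoted: keep as a single field
    }
    else {
        push @out, split ' ', $parts[$i];    # unquoted: split on whitespace, no empty fields
    }
}

use Data::Dumper;
print Dumper \@out;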

#3


2  

So you've decided to use a regex? Now you have two problems.

Allow me to infer a little bit. You want an arbitrary number of fields separated by spaces, where a field is either text containing no spaces, or text that begins with a quote and ends with a quote (possibly with spaces in between).

In other words, you want to do what a command line shell does. You really should just reuse something. Failing that, you should capture a field at a time, with a regex something like:

^ *('[^']*'|[^ ]+)(.*)

Append group 1 to your list, then continue the loop with the contents of group 2.
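
A minimal sketch of that loop in Perl, assuming no escaped or nested quotes. The helper name split_fields is made up for illustration, and the quoted alternative is tried first so that a leading quote is not swallowed by the non-space branch.

```perl
use strict; use warnings;

# Hypothetical helper: peel one field off the front of the string per iteration.
sub split_fields {
    my ($rest) = @_;
    my @fields;

    # Group 1 is the next field, group 2 is whatever is left over.
    while ( $rest =~ /^ *('[^']*'|[^ ]+)(.*)/s ) {
        my $field = $1;
        $rest = $2;
        $field =~ s/^'(.*)'$/$1/s;    # drop the surrounding quotes, if any
        push @fields, $field;
    }
    return @fields;
}

print join( '|', split_fields("abcd efgh 'ijklm no pqrs' tuv") ), "\n";
```

The loop stops once the remainder is empty or only spaces, since group 1 has to match at least one character.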

A single match against a regex with a fixed set of capture groups wouldn't be able to capture an arbitrarily large number of fields. You might be able to split on a regex (Python will do this, not sure about Perl), but since you are matching the fields rather than the spaces between them, I'm not sure that is even an option.
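
For what it's worth, Perl can also match the fields themselves repeatedly: a /g match in list context returns the captures from every match. The sketch below relies on that, assuming Perl 5.10 or later for the branch reset group and no escaped quotes in the input. The sub name quoted_fields is hypothetical.

```perl
use strict; use warnings;

# Hypothetical helper: return every field via a global match in list context.
sub quoted_fields {
    my ($text) = @_;

    # (?|...) is a branch reset group: both alternatives capture into group 1,
    # so each match contributes exactly one element to the returned list.
    my @fields = $text =~ /(?|'([^']*)'|(\S+))/g;
    return @fields;
}

print join( '|', quoted_fields("abcd efgh 'ijklm no pqrs' tuv") ), "\n";
```

This is still one match per field under the hood, so it doesn't really contradict the point above; it just lets the regex engine run the loop for you.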
