Replace line breaks in quoted strings with \n

Time: 2022-07-20 09:36:52

I need to write a quick (by tomorrow) filter script that replaces line breaks (LF or CRLF) found within double-quoted strings with the escaped newline \n. The content is a (broken) JavaScript program, so I need to allow for escape sequences like "ab\"cd" and "ab\\"cd"ef" within a string.

I understand that sed is not well suited to the job, as it works line by line, so I turn to perl, of which I know nothing :)

I've written this regex: "(((\\.)|[^"\\\n])*\n?)*" and tested it with http://regex.powertoy.org. It does match quoted strings with line breaks; however, perl -p -e 's/"(((\\.)|[^"\\\n])*(\n)?)*"/TEST/g' does not.

So my questions are:

  1. How do I make perl match line breaks?
  2. How do I write the "replace-by" part so that it keeps the original string and only replaces the newlines?

There is a similar question with an awk solution, but it is not quite what I need.

NOTE: I usually don't ask "please do this for me" questions, but I really don't feel like learning perl/awk by tomorrow... :)

EDIT: sample data

"abc\"def" - matches as one string
"abc\\"def"xy" - matches "abc\\" and "xy"
"ab
cd
ef" - is replaced by "ab\ncd\nef"

4 solutions

#1


Here is a simple Perl solution:

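# Operates on $_; run with perl -0777 -p so that $_ holds the whole file.
s{
    \G                        # match from the beginning of the string or the end of the last match
    ([^"]*+)                  # everything up to the next quote
    "((?:[^"\\]++|\\.)*+)"    # the whole quoted string, escapes included
}{
    my ($before, $quoted) = ($1, $2);
    $quoted =~ s/\r?\n/\\n/g; # replace what you want inside the quote
    "$before\"$quoted\"";
}gex;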
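
The same substitution, wrapped in a function and run against the question's sample data. This is a sketch: the name escape_newlines_in_strings is made up for the example, and the input is held in a string rather than read from a file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper name; wraps the \G-based substitution from this answer.
sub escape_newlines_in_strings {
    my ($text) = @_;
    $text =~ s{
        \G                        # start of string, or end of the previous match
        ([^"]*+)                  # text outside quotes
        "((?:[^"\\]++|\\.)*+)"    # one complete double-quoted string
    }{
        my ($before, $body) = ($1, $2);
        $body =~ s/\r?\n/\\n/g;   # only touch line breaks inside the quotes
        "$before\"$body\"";
    }gex;
    return $text;
}

# Sample data from the question; the single-quoted heredoc keeps backslashes literal.
my $sample = <<'END';
"abc\"def"
"abc\\"def"xy"
"ab
cd
ef"
END

print escape_newlines_in_strings($sample);
```

The first two lines come out unchanged and the last string becomes "ab\ncd\nef". As a filter, the substitution needs the whole file in one string (for example via perl -0777 -p), because plain -p hands the code one line at a time.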

Here is another solution, in case you don't want to use /e and would rather do it with a single regex:

use strict;

$_=<<'_quote_';
hai xtest "aa xx aax" baix "xx"
x "axa\"x\\" xa "x\\\\\"x" ax
xbai!x
_quote_

print "Original:\n", $_, "\n";

s/
(
    (?:
        # at the beginning of the string, skip whole quotes that contain
        # nothing to replace, then step inside the next quote
        ^(?&outside_quote) (?&skip_quotes) "
        # or continue from last match which always stops inside quotes
        | (?!^)\G
    )
    (?&inside_quote)  # eat things up till we find what we want
)
x   # the thing we want to replace
(
    (?&inside_quote)  # eat more possibly till end of quote
    # if going out of quote, skip quotes with nothing to replace and
    # make sure the match stops inside the next one or at end of string
    (?: " (?&outside_quote) (?&skip_quotes) (?:"|\z) )?
)

(?(DEFINE)
    (?<outside_quote> [^"]*+ ) # just eat everything till quoting starts
    (?<inside_quote> (?:[^"\\x]++|\\.)*+ ) # handle escapes
    # complete quotes, plus the text after them, with nothing to replace
    (?<skip_quotes> (?: " (?&inside_quote) " (?&outside_quote) )*+ )
)
/$1Y$2/xg;

print "Replaced:\n", $_, "\n";

Output:

Original:
hai xtest "aa xx aax" baix "xx"
x "axa\"x\\" xa "x\\\\\"x" ax
xbai!x

Replaced:
hai xtest "aa YY aaY" baix "YY"
x "aYa\"Y\\" xa "Y\\\\\"Y" ax
xbai!x

To work with line breaks instead of x, replace x with \r?\n in the pattern, exclude \r and \n instead of x in inside_quote, and put \\n in the replacement:

s/
(
    (?:
        # at the beginning of the string, skip whole quotes that contain
        # nothing to replace, then step inside the next quote
        ^(?&outside_quote) (?&skip_quotes) "
        # or continue from last match which always stops inside quotes
        | (?!^)\G
    )
    (?&inside_quote)  # eat things up till we find what we want
)
\r?\n # the thing we want to replace
(
    (?&inside_quote)  # eat more possibly till end of quote
    # if going out of quote, skip quotes with nothing to replace and
    # make sure the match stops inside the next one or at end of string
    (?: " (?&outside_quote) (?&skip_quotes) (?:"|\z) )?
)

(?(DEFINE)
    (?<outside_quote> [^"]*+ ) # just eat everything till quoting starts
    (?<inside_quote> (?:[^"\\\r\n]++|\\.)*+ ) # handle escapes
    # complete quotes, plus the text after them, with nothing to replace
    (?<skip_quotes> (?: " (?&inside_quote) " (?&outside_quote) )*+ )
)
/$1\\n$2/xg;
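
A self-contained run of the line-break variant on the question's sample data, as a sketch. It includes a skip_quotes rule so that the \G chain can step over quoted strings that contain no line break, such as the first two lines of the sample; without it, the first match attempt fails and nothing gets replaced. The use 5.010 line is there because named recursion, (?(DEFINE)) and possessive quantifiers need Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;  # (?&name), (?(DEFINE)) and possessive quantifiers

# Sample data from the question; the single-quoted heredoc keeps backslashes literal.
my $text = <<'END';
"abc\"def"
"abc\\"def"xy"
"ab
cd
ef"
END

$text =~ s/
(
    (?: ^ (?&outside_quote) (?&skip_quotes) " | (?!^)\G )  # step into a quote
    (?&inside_quote)                                       # up to the line break
)
\r?\n
(
    (?&inside_quote)                                       # rest of this quote
    (?: " (?&outside_quote) (?&skip_quotes) (?:"|\z) )?    # into the next one
)
(?(DEFINE)
    (?<outside_quote> [^"]*+ )
    (?<inside_quote>  (?:[^"\\\r\n]++|\\.)*+ )
    (?<skip_quotes>   (?: " (?&inside_quote) " (?&outside_quote) )*+ )
)
/$1\\n$2/xg;

print $text;
```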

#2


Until the OP posts some example content to test with, try adding the "m" (and possibly the "s") flag to the end of your regex. From perldoc perlreref (reference):

m  Multiline mode - ^ and $ match internal lines
s  match as a Single line - . matches \n
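
The flags only matter once the pattern can actually see more than one line, though. By default perl -p runs the code once per input line, so no pattern can match a quote whose line break splits it across two records. The sketch below (the input and variable names are illustrative) applies the question's pattern unchanged, once line by line and once to the whole input:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A quoted string that spans two lines.
my $input = qq{x = "ab\ncd";\n};

# The question's pattern, unchanged.
my $re = qr/"(((\\.)|[^"\\\n])*(\n)?)*"/;

# Roughly what perl -p does by default: the code sees one line at a time.
my $per_line_hits = grep { /$re/ } split /(?<=\n)/, $input;

# The whole input as a single string, as with perl -0777 -p.
my $whole_hit = $input =~ $re ? 1 : 0;

print "per line: $per_line_hits, whole input: $whole_hit\n";
```

It prints "per line: 0, whole input: 1". Reading the input as a single record, for example with perl -0777 -p, is the usual way to give the regex the whole file.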

For testing, you might also find it useful to add the command-line argument "-i.bak", which edits the file in place while keeping a backup of the original (with the extension ".bak").

Note also that if you want to group something without capturing it, you can use (?:PATTERN) rather than (PATTERN). Once you have captured content, use $1, $2, and so on to refer to what the matching section stored.
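
A small illustration of that difference, using a made-up input string: the same quoted-string pattern, once with (?:...) and once with a plain (...) around the alternation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'say "ab\"cd" now';   # single quotes keep the backslash

# (?:...) only groups the alternation for the * quantifier, so the
# string body is the one and only capture ($1).
my @grouped = $line =~ /"((?:[^"\\]|\\.)*)"/;

# With a plain (...) the alternation is captured too; $2 then holds
# only what the last repetition matched.
my @captured = $line =~ /"(([^"\\]|\\.)*)"/;

print scalar(@grouped), " capture(s): $grouped[0]\n";
print scalar(@captured), " capture(s): $captured[0] / $captured[1]\n";
```

The first pattern yields a single capture, ab\"cd; the second yields two, ab\"cd and d.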

For more info, see the link above as well as perldoc perlretut (tutorial) and perldoc perlre (full-ish documentation).

#3


#!/usr/bin/perl
use warnings;
use strict;
use Regexp::Common;

$_ = '"abc\"def"' . '"abc\\\\"def"xy"' . qq("ab\ncd\nef");

print "before: {{$_}}\n";
s{($RE{quoted})}
 {  (my $x=$1) =~ s/\r?\n/\\n/g;   # rewrite LF or CRLF inside the quoted string only
    $x
 }ge;
print "after: {{$_}}\n";

#4


Using Perl 5.14.0 (installable with perlbrew), one can do this:

#!/usr/bin/env perl

use strict;
use warnings;

use 5.14.0;

use Regexp::Common qw/delimited/;

my $data = <<'END';
"abc\"def"
"abc\\"def"xy"
"ab
cd
ef"
END

my $output = $data =~ s/$RE{delimited}{-delim=>'"'}{-keep}/$1=~s!\r?\n!\\n!rg/egr;

print $output;

I need 5.14.0 for the /r flag on the inner substitution. If someone knows how to avoid this, please let me know.
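
One way to drop the 5.14 requirement is the copy-then-substitute idiom used in answer #3: inside the /e replacement, copy $1 into a lexical, run the inner substitution on the copy, and return it. The sketch below assumes that idiom is acceptable; to stay free of non-core modules it swaps $RE{delimited}{-delim=>'"'} for a hand-written pattern that only handles double quotes with backslash escapes.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $data = <<'END';
"abc\"def"
"abc\\"def"xy"
"ab
cd
ef"
END

# Stand-in for Regexp::Common's delimited pattern (an assumption, not the module):
# a double-quoted string whose body is non-quote, non-backslash characters
# or backslash escapes.
my $quoted = qr/"(?:[^"\\]|\\.)*"/;

# Copy-then-substitute instead of /r: $1 is read-only, so copy it into a
# lexical, change the copy, and return it as the replacement value.
(my $output = $data) =~ s{($quoted)}{
    (my $str = $1) =~ s/\r?\n/\\n/g;
    $str;
}eg;

print $output;
```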
