perl - regex - Get all text after a string

Time: 2023-02-05 20:30:57

Using a Perl regex, how do I get the IWantThisText text block that comes after 'base64' in the following:

Content-Type: text/html; charset="KOI8-R"  
Content-Disposition: inline  
Content-Transfer-Encoding: base64  

IWantThisTextIWantThisTextIWantThisTextIWantThisTextIWantThisTextIWantThisTex
tIWantThisTextIWantThisTextIWantThisTextIWantThisTextIWantThisTextIWantThisTe
xtIWantThisTextIWantThisTextIWantThisTextIWantThisTextIWantThisTextIWantThisT
extIWantThisTextIWantThisTextIWantThisTextIWantThisTextIWantThisTextIWantThis
TextIWantThisTextIWantThisTextIWantThisTextIWantThisTextIWantThisTextIWantThi
sTextIWantThi

EDIT:
What I have so far:

my ($textIWant) = $textblock =~ m/base64(.*?)/;

2 Solutions

#1


0  

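If you have the whole of the file in the variable $textblock, you can extract everything after the first occurrence of base64 by removing the ? non-greedy modifier from (.*?) and adding the /s modifier to your sample code. As written, the non-greedy (.*?) has nothing after it in the pattern, so it matches as little as possible, which is the empty string.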

The difference is that . normally matches any character except a newline, so (.*) will stop at the end of the line containing base64. Adding /s changes it to match any character at all, including newlines.

my $text_i_want;
$text_i_want = $1 if $textblock =~ /base64(.*)/s;

That will give you what you want, but note that the captured text also includes any space characters and the newline that follow base64.
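
To make the difference between the variants concrete, here is a minimal sketch. The sample string is a shortened, made-up version of the message in the question, and the `\s*` variant is an assumption about what the asker probably wants (the text without the leading spaces and blank line), not something stated in the question.

```perl
use strict;
use warnings;

# Hypothetical sample input modelled on the question's message.
my $textblock = join "\n",
    'Content-Type: text/html; charset="KOI8-R"  ',
    'Content-Transfer-Encoding: base64  ',
    '',
    'IWantThisTextIWantThis',
    'TextIWantThi',
    '';

# Non-greedy with nothing after it: matches as little as possible (nothing).
my ($lazy) = $textblock =~ /base64(.*?)/;

# Greedy without /s: . stops at the first newline, so only the rest of the line.
my ($line) = $textblock =~ /base64(.*)/;

# Greedy with /s: . also matches newlines, so everything to the end of the string.
my ($all) = $textblock =~ /base64(.*)/s;

# \s* before the capture group skips the trailing spaces and the blank line.
my ($trimmed) = $textblock =~ /base64\s*(.*)/s;

printf "lazy length:    %d\n", length $lazy;
printf "line length:    %d\n", length $line;
printf "all length:     %d\n", length $all;
printf "trimmed text:\n%s", $trimmed;
```

Whether to use `\s*` depends on the data: it would also swallow leading whitespace that belongs to the text itself.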

As an alternative, you can split the string in two at the first occurrence of base64 and select the second part, like this:

my $text_i_want = (split /base64/, $textblock, 2)[1];

This gives the same result.
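
One thing the split approach leaves open is what happens when base64 does not occur at all. The sketch below wraps the split in a small helper; the name `text_after` and the sample strings are made up for illustration. It relies on the limit of 2 to stop splitting after the first occurrence and returns undef when the marker is missing.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text after the first occurrence of $marker,
# or undef when the marker does not occur in $text.
sub text_after {
    my ($text, $marker) = @_;
    # \Q...\E quotes any regex metacharacters in the marker;
    # the limit of 2 means later occurrences stay inside the second part.
    my (undef, $rest) = split /\Q$marker\E/, $text, 2;
    return $rest;
}

my $with    = text_after("encoding: base64\n\nAAAA base64 BBBB\n", 'base64');
my $without = text_after("no marker here\n", 'base64');

print defined $with    ? "found: $with"       : "marker not found\n";
print defined $without ? "found: $without"    : "marker not found\n";
```

Checking `defined` before using the result avoids an "uninitialized value" warning when the input has no base64 marker.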

#2


2  

You want the body of a MIME message. The body is separated from the header by a blank line. So, just check for two line breaks in a row.

my ($body) = $mime_message =~ /\n\r?\n(.*)/s;

That handles the standard CRLF line break used by MIME, and it also handles a bare LF.
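
If the goal is the decoded content rather than the raw base64 text, the extracted body can be passed to the core MIME::Base64 module. This is only a sketch: the message and payload below are made up, and it assumes a simple single-part message like the one in the question, not a multipart one.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Build a hypothetical message with CRLF line endings; the payload is made up.
my $payload = "Hello from the message body";
my $encoded = encode_base64($payload, "\r\n");   # second argument is the line ending
my $mime_message = join "\r\n",
    'Content-Type: text/plain; charset="UTF-8"',
    'Content-Transfer-Encoding: base64',
    '',
    $encoded;

# The pattern from the answer: capture everything after the first blank line.
my ($body) = $mime_message =~ /\n\r?\n(.*)/s;

# decode_base64 ignores characters outside the base64 alphabet,
# so the CR and LF characters left in the body do no harm.
my $decoded = defined $body ? decode_base64($body) : undef;

print defined $decoded ? "$decoded\n" : "no body found\n";
```

For real mail with multiple parts or nested encodings, a dedicated MIME parser is likely a better fit than a single regex.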
