Extracting data from a string while omitting a pattern

Time: 2022-06-02 00:19:14

I'm totally new to regular expressions and I'm trying to get something like this:

["Group", "s1", "s2", "Group2"]

from this string:

string = "_GRP_Group||s1||s2||Group2||"

All I have now is:

word = re.findall(r'([^\|]+)', string)

which only strips out the pipes, so I get this:

['_GRP_Group', 's1', 's2', 'Group2']

Is there a way to get rid of the _GRP_ prefix?

2 solutions

#1


Based on your comments on other answers, it sounds like the _GRP_ prefix is a prefix to the string rather than to each individual split value?

Try this:

import re
string = "_GRP_Group||s1||s2||Group2||"
# (?:_GRP_)? optionally consumes the prefix; findall returns only the captured group
word = re.findall(r"(?:_GRP_)?([^|]+)", string)
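
As a quick check of how the pattern above behaves, here is a minimal sketch. The second input string is hypothetical (it is not from the question) and is only there to show that the optional group strips _GRP_ from any segment that starts with it, not just the first one:

```python
import re

# Optional non-capturing prefix, followed by a captured run of non-pipe characters
pattern = re.compile(r"(?:_GRP_)?([^|]+)")

string = "_GRP_Group||s1||s2||Group2||"
words = pattern.findall(string)
print(words)  # ['Group', 's1', 's2', 'Group2']

# Hypothetical input: the prefix appears on more than one segment
other = "_GRP_A||_GRP_B||"
print(pattern.findall(other))  # ['A', 'B']
```

If the prefix should only ever be removed at the very start of the string, this pattern would be too permissive and would need an anchor or a separate prefix check.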

#2


You don't need to use regular expressions to split the first string by || or remove the prefix _GRP_. You can just use split and slicing:

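# The "if w" filter drops the empty string that split() yields after the final "||"
words = [w for w in "_GRP_Group||s1||s2||Group2||"[5:].split('||') if w]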

The slice [5:] drops the first five characters of the string, which here is exactly the _GRP_ prefix.
If you don't know where _GRP_ will occur, you can use replace instead:

words = "_GRP_Group||s1||s2||Group2||".split('||')
words = [word.replace("_GRP_", "") for word in words if word]
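
The split-based approach can also be wrapped in a small helper. This is only a sketch: the function name split_groups and its prefix and sep parameters are made up for illustration. It assumes the prefix, when present, sits at the start of the string, and it skips the empty string that split() leaves after a trailing separator:

```python
def split_groups(text, prefix="_GRP_", sep="||"):
    """Hypothetical helper: strip a leading prefix, then split on sep."""
    # Remove the prefix only if the string actually starts with it
    if text.startswith(prefix):
        text = text[len(prefix):]
    # split() returns '' after a trailing separator, so filter empties out
    return [part for part in text.split(sep) if part]


print(split_groups("_GRP_Group||s1||s2||Group2||"))  # ['Group', 's1', 's2', 'Group2']
print(split_groups("s1||s2||"))  # ['s1', 's2']
```

Unlike a fixed [5:] slice, this leaves strings without the prefix untouched, and unlike replace it does not remove _GRP_ from the middle of a value.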
