Extracting keys and values from a string

Time: 2023-02-05 20:49:48

This is probably very easy, but I feel I am doing it wrong. Let's say I have the following string:

user: bob status: married age:45

Now I want to break it down to something like:

user = 'bob'
status = 'married'
age = 45

At the moment I am doing a lot of dirty splitting work, but there has got to be a better, more Pythonic way using regex. Here's what I do:

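full_text = 'user: bob status: married age:45'
key = 'user'  # renamed from `type`, which shadows the built-in
# Take the text after "user:", then keep its first whitespace-separated word.
cut_string = full_text.split(key + ":", 1)[1].split()[0]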

Thanks!

3 Solutions

#1


3  

Here's my solution. The regex: (\w+)\s*:\s*((?:\w+\b\s*)+)(?!\s*:)

import re 

s = 'user: bob status: married with children age:45'

pat = re.compile(r'(\w+)\s*:\s*((?:\w+\b\s*)+)(?!\s*:)')

print(pat.findall(s))

prints

[('user', 'bob '), ('status', 'married with children '), ('age', '45')]

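You can then use something like ast.literal_eval to get the types right. Note that the captured values can carry trailing whitespace ('bob '), so strip them first, and that ast.literal_eval raises ValueError or SyntaxError for bare words such as bob, so only values that are valid Python literals (like 45) will convert.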
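To make the ast.literal_eval suggestion concrete, here is a minimal sketch. The pattern is the one from this answer; coerce and parse_pairs are hypothetical helper names introduced only for illustration, and the fallback to the plain string is an assumption about what you want for non-literal values.

```python
import ast
import re

# Pattern taken from the answer above.
PAT = re.compile(r'(\w+)\s*:\s*((?:\w+\b\s*)+)(?!\s*:)')

def coerce(raw):
    """Hypothetical helper: read raw as a Python literal, else keep the stripped string."""
    text = raw.strip()
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Bare words such as 'bob' are not Python literals.
        return text

def parse_pairs(s):
    """Hypothetical helper: build a dict from every key/value pair the pattern finds."""
    return {key: coerce(value) for key, value in PAT.findall(s)}

result = parse_pairs('user: bob status: married with children age:45')
print(result)
# {'user': 'bob', 'status': 'married with children', 'age': 45}
```

One thing to keep in mind with this approach: a value that happens to spell a Python literal, such as True or None, will be converted too, which may or may not be what you want.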

#2


0  

re.findall(r'(?:([0-9a-zA-Z]+): ?([0-9a-zA-Z]+))+',s)

This will give back: [('user', 'bob'), ('status', 'married'), ('age', '45')]

The outer group is non-capturing, (?:...), so it does not appear in the results of findall; only the two inner capturing groups (key and value) are returned.

The [0-9a-zA-Z] part is close to \w, but not identical: \w also matches the underscore and, on Python 3 str patterns, non-ASCII letters and digits.
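One place where the character class and \w behave differently is a key containing an underscore. The sketch below compares the two; the sample string 'user_name: bob' is made up for illustration and is not from the question, and the outer non-capturing group is dropped here because findall does not need it.

```python
import re

# Hypothetical sample input with an underscore in the key.
sample = 'user_name: bob'

# The explicit class does not include '_', so the match starts after it.
with_class = re.findall(r'([0-9a-zA-Z]+): ?([0-9a-zA-Z]+)', sample)

# \w includes '_', so the whole key is captured.
with_w = re.findall(r'(\w+): ?(\w+)', sample)

print(with_class)  # [('name', 'bob')]
print(with_w)      # [('user_name', 'bob')]
```

Which one you want depends on whether underscores are legal in your keys.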

#3


0  

For those of us who avoid regex if we possibly can:

>>> full_text = 'user: bob status: married age:45'
>>> alt_text = full_text.replace(':', ' ').split()
>>> print(alt_text[0], "=", alt_text[1])
user = bob
>>> print(alt_text[2], "=", alt_text[3])
status = married
>>> print(alt_text[4], "=", alt_text[5])
age = 45

If you had a space between age: and 45, you wouldn't need replace; full_text.split() alone would separate the tokens, although each key would then keep its trailing colon (for example 'user:').
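If you'd rather not index the list by hand, the same tokens can be paired up with zip. This is only a sketch, and it assumes every value is a single word and that neither keys nor values contain a colon; a value like 'married with children' would break the pairing.

```python
full_text = 'user: bob status: married age:45'

# Same tokenisation as above: turn colons into spaces, then split on whitespace.
tokens = full_text.replace(':', ' ').split()

# Even positions are keys, odd positions are values.
pairs = dict(zip(tokens[0::2], tokens[1::2]))

print(pairs)
# {'user': 'bob', 'status': 'married', 'age': '45'}
```

Note that age stays the string '45' here; convert it with int() if you need a number.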
