I have text like the following:
-
This is a first question and can go to multiple paragraphs. Multiple lines. etc.
(1) First Option (2) Second Option (3) Third Option (4) Fourth Option (5) None of these
-
8 × ? = 4888 ÷ 4
(1) 150.75 (2) 125.75 (3) 125.05 (4) 152.75 (5) None of these
-
(62.5 × 14 × 5) ÷ 25 + 41 =
(1) 4 (2) 5 (3) 9 (4) 8 (5) 6
-
(23 × 23 × 23 × 23 × 23 × 23)×
(1) 32 (2) 30 (3) 9 (4) 7 (5) 11
I would like to parse this into separate parts so that I can iterate over the questions in a for loop and, for each question, iterate over its answers. The rule is that every question starts with an integer at the start of a line (^) followed by a dot. Each answer is prefixed by an integer from 1 to 5 in parentheses, i.e. (1) to (5).
I would like to use the parsed data like this, for example:
    for item in parsed_data:
        print(item.text)
        for answer in item.answers:
            print(answer.text)
How can I do this using Python regular expressions?
1 Solution
#1
Honestly, you can just use re.split() for this:
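    import re

    # text is the variable holding your text
    text = text.strip()
    # Split at each question number. Anchoring to the start of a line
    # keeps decimals such as 150.75 from being treated as question numbers.
    questions = re.split(r'(?m)^\d+\.', text)
    questions = [x.strip() for x in questions if x.strip()]
    # Split each question at its (1) to (5) answer markers.
    final = [re.split(r'\(\d+\)', x) for x in questions]
    for part in final:
        question = part[0].strip()
        print(question)
        for answer in part[1:]:
            print(answer.strip())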
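If you want the item.text / item.answers access pattern from the question, the same two splits can be wrapped in small containers. This is only a sketch under assumptions: the names Question, Answer and parse_questions are made up for illustration, the sample string is a shortened ASCII stand-in for the real text, and it relies on the stated rule that question numbers sit at the start of a line.

```python
import re
from collections import namedtuple

# Hypothetical container types, named after the attributes used in the question.
Answer = namedtuple("Answer", ["number", "text"])
Question = namedtuple("Question", ["text", "answers"])

# A question number: digits plus a dot at the start of a line.
QUESTION_RE = re.compile(r"^\d+\.", re.MULTILINE)
# An answer marker: a single digit 1-5 in parentheses; the digit is captured.
ANSWER_RE = re.compile(r"\(([1-5])\)")


def parse_questions(text):
    """Return a list of Question tuples parsed from text."""
    parsed = []
    for chunk in QUESTION_RE.split(text.strip()):
        chunk = chunk.strip()
        if not chunk:
            continue
        # With one capturing group, split() yields
        # [question, number, answer, number, answer, ...].
        pieces = ANSWER_RE.split(chunk)
        answers = [
            Answer(int(number), body.strip())
            for number, body in zip(pieces[1::2], pieces[2::2])
        ]
        parsed.append(Question(pieces[0].strip(), answers))
    return parsed


# Assumed sample input: ASCII stand-ins for the operators, numbering made up.
sample = """1. 8 x ? = 4888 / 4
(1) 150.75 (2) 125.75 (3) 125.05 (4) 152.75 (5) None of these
2. First line of a question.
Second line of the same question.
(1) 4 (2) 5 (3) 9 (4) 8 (5) 6"""

for item in parse_questions(sample):
    print(item.text)
    for answer in item.answers:
        print(answer.number, answer.text)
```

Capturing the digit in the answer pattern keeps the option number alongside its text, and restricting it to 1-5 means a parenthesised expression inside the question itself is left alone.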