Python: extract the 3 words before and the 3 words after specific words from a list, using a regular expression

Date: 2022-09-13 16:49:02

I need to use Python to extract the 3 words before and the 3 words after each word in a specific list of words.

Nokia Lumia 930 Smartphone, Display 5 pollici, Fotocamera 20 MP, 2GB RAM, Processore Quad-Core 2,2GHz, Memoria 32GB, Windows Phone 8.1, Bianco [Germania]

At the moment I'm using this regex without success

((?:[\S,]+\s+){0,3})ram\s+((?:[\S,]+\s*){0,3})

https://regex101.com/r/yN6iI0/1

My list of words that I need is:

  • Display
  • Fotocamera
  • RAM
  • Processore
  • Memoria

2 Answers

#1


Your regex did not work because \s+ requires at least one whitespace character, but there is none between RAM and the comma that follows it. Either use a * quantifier or just remove it and use ``

(?i)((?:\S+\s+){0,3})\bRAM\b\s*((?:\S+\s+){0,3})

See demo

I added \b (word boundary) to make sure we match RAM, not RAMBUS.
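
The effect of the word boundary can be checked with a short sketch. The string "RAMBUS memory" is a made-up input used only for illustration; it does not come from the question.

```python
import re

with_boundary = re.compile(r"\bRAM\b")
without_boundary = re.compile(r"RAM")

# "RAM" followed by a comma: there is a boundary between "M" and ","
hit_plain_word = bool(with_boundary.search("2GB RAM, Processore"))

# "RAMBUS": no boundary between "M" and "B", so \bRAM\b does not match
hit_rambus_bounded = bool(with_boundary.search("RAMBUS memory"))

# Without \b the same input matches, which is what the answer wants to avoid
hit_rambus_unbounded = bool(without_boundary.search("RAMBUS memory"))

print(hit_plain_word, hit_rambus_bounded, hit_rambus_unbounded)
```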

Mind the re.I modifier (or use an inline version (?i) at the beginning of the pattern).
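
As a minimal sketch of the two ways to switch on case-insensitive matching, the snippet below runs the pattern against the product title from the question. The keyword is written in lowercase here on purpose (an assumption for the demonstration) so that the flag visibly matters.

```python
import re

text = (
    "Nokia Lumia 930 Smartphone, Display 5 pollici, Fotocamera 20 MP, "
    "2GB RAM, Processore Quad-Core 2,2GHz, Memoria 32GB, "
    "Windows Phone 8.1, Bianco [Germania]"
)

# Same pattern as above, but with a lowercase keyword and no inline flag
core = r"((?:\S+\s+){0,3})\bram\b\s*((?:\S+\s+){0,3})"

with_flag = re.search(core, text, re.I)        # flag passed as an argument
with_inline = re.search("(?i)" + core, text)   # inline flag at the very start
case_sensitive = re.search(core, text)         # no flag: "ram" != "RAM"

print(with_flag.groups())      # ('20 MP, 2GB ', ', Processore Quad-Core ')
print(with_inline.groups() == with_flag.groups())  # True
print(case_sensitive)          # None
```

Note that the comma directly after RAM is picked up by \S+ and therefore counts as one of the three "words" in the second group.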

Other patterns can be formed in a similar way: just replace RAM with the words from your list.
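
One possible way to do this in Python is to build the pattern in a loop. This is only a sketch: the helper name `context` and the choice to return whitespace-split lists are assumptions, not part of the answer.

```python
import re

text = (
    "Nokia Lumia 930 Smartphone, Display 5 pollici, Fotocamera 20 MP, "
    "2GB RAM, Processore Quad-Core 2,2GHz, Memoria 32GB, "
    "Windows Phone 8.1, Bianco [Germania]"
)
words = ["Display", "Fotocamera", "RAM", "Processore", "Memoria"]


def context(word, text):
    """Return (tokens_before, tokens_after) for the first match of word, or None."""
    # re.escape keeps the keyword literal even if it contains regex metacharacters
    pattern = (
        r"((?:\S+\s+){0,3})\b" + re.escape(word) + r"\b\s*((?:\S+\s+){0,3})"
    )
    m = re.search(pattern, text, re.I)
    if m is None:
        return None
    # Split each captured group on whitespace to get the individual tokens
    return m.group(1).split(), m.group(2).split()


for w in words:
    print(w, context(w, text))
```

Because each keyword is searched separately, one keyword may appear in the context of another (for example, "Fotocamera" shows up among the words after "Display").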

#2


((?:[\S,]+\s+){0,3})ram,?\s+((?:[\S,]+\s*){0,3})

                       ^^

Just add an optional comma (,?) after ram. See demo.

https://regex101.com/r/yN6iI0/4

Finally, you can use this:

((?:[\S,]+\s+){0,3})(?:ram|Display|Fotocamera|RAM|Processore|Memoria),?\s+((?:[\S,]+\s*){0,3})
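
A rough sketch of how this combined pattern might be used from Python follows. It departs from the pattern above in three labeled ways: the alternation is built from the list, the keyword gets its own capture group so it can be reported, and re.I replaces listing both ram and RAM.

```python
import re

text = (
    "Nokia Lumia 930 Smartphone, Display 5 pollici, Fotocamera 20 MP, "
    "2GB RAM, Processore Quad-Core 2,2GHz, Memoria 32GB, "
    "Windows Phone 8.1, Bianco [Germania]"
)
words = ["Display", "Fotocamera", "RAM", "Processore", "Memoria"]

# Build "Display|Fotocamera|..." from the list, escaping each keyword
alternation = "|".join(re.escape(w) for w in words)
pattern = re.compile(
    r"((?:[\S,]+\s+){0,3})(" + alternation + r"),?\s+((?:[\S,]+\s*){0,3})",
    re.I,
)

# Each result is (tokens_before, keyword, tokens_after)
results = [
    (m.group(1).split(), m.group(2), m.group(3).split())
    for m in pattern.finditer(text)
]

for before, keyword, after in results:
    print(keyword, before, after)
```

One caveat: finditer returns non-overlapping matches, so a keyword that was already consumed as context of an earlier match does not get a match of its own. On this input only Display, RAM and Memoria are reported; Fotocamera and Processore are swallowed as context. Searching for each keyword separately avoids this.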
