Remove words containing digits from a given string

Date: 2022-09-13 11:41:42

I'm trying to write a simple program that removes all words containing digits from a received string.

Here is my current implementation:

import re

def checkio(text):
    text = text.replace(",", " ").replace(".", " ").replace("!", " ").replace("?", " ").lower()
    words = text.split()

    print(words)

    for each in words:
        if re.search(r'\d', each):
            words.remove(each)

    print(words)

checkio("1a4 4ad, d89dfsfaj.")

However, when I execute this program, I get the following output:

['1a4', '4ad', 'd89dfsfaj']
['4ad']

I can't figure out why '4ad' is printed in the second line as it contains digits and should have been removed from the list. Any ideas?

4 answers

#1


Assuming that your regular expression does what you want, you can build a new list with a list comprehension, which avoids removing items from the list while iterating over it.

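import re

def checkio(text):
    # Replace the punctuation with spaces, then lowercase.
    text = re.sub(r'[,.?!]', ' ', text).lower()
    # Build a new list instead of removing from the list being iterated.
    words = [w for w in text.split() if not re.search(r'\d', w)]
    print(words)  # prints [] for the input in the question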

Also, note that I simplified your text = text.replace(...) line.

Additionally, if you do not need to reuse your text variable, you can use regex to split it directly.

import re

def checkio(text):
    # Split on runs of whitespace and punctuation; `if w` drops empty pieces.
    words = [w for w in re.split(r'[\s,.?!]+', text.lower()) if w and not re.search(r'\d', w)]
    print(words)  # prints [] for the input in the question
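
When a string is split on a pattern, a separator at the very start or end of the text produces an empty string, which is why the comprehension needs the `if w` guard. A minimal sketch of this, assuming a separator pattern that covers whitespace as well as punctuation (the names `SEPARATORS` and `words_without_digits` are illustrative, not from the question):

```python
import re

# Illustrative pattern: one or more whitespace or punctuation characters.
SEPARATORS = re.compile(r'[\s,.?!]+')

def words_without_digits(text):
    """Return the lowercased words of `text` that contain no digit."""
    pieces = SEPARATORS.split(text.lower())
    # A separator at the start or end of the text yields an empty
    # string, so empty pieces are filtered out with `if w`.
    return [w for w in pieces if w and not re.search(r'\d', w)]

print(SEPARATORS.split("a, b."))                          # ['a', 'b', '']
print(words_without_digits("1a4 4ad, d89dfsfaj."))        # []
print(words_without_digits("Hello w0rld, how are you?"))  # ['hello', 'how', 'are', 'you']
```

Returning the list instead of printing it makes the function easier to reuse and to check against expected results.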

#2


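If the words consist only of letters and digits once the punctuation has been stripped, why not use a string method instead of a regex? Note that isalnum() cannot do the job here: it is also True for purely alphabetic words such as 'abc', so it does not distinguish them from words containing digits. isalpha() does, since it is True only when every character is a letter:

In [1695]: x = ['1a4', '4ad', 'd89dfsfaj']

In [1696]: [word for word in x if word.isalpha()]
Out[1696]: []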
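
A string-method check that targets digits directly is `str.isdigit()` applied to each character. The sketch below (the helper name `has_digit` is hypothetical) also prints how `isalnum()` and `isalpha()` classify a few sample words, which may help when choosing between them:

```python
def has_digit(word):
    """Return True if any character of `word` is a digit."""
    return any(ch.isdigit() for ch in word)

samples = ['1a4', 'abc', "don't"]

for w in samples:
    print(w, w.isalnum(), w.isalpha(), has_digit(w))
# 1a4 True False True
# abc True True False
# don't False False False

kept = [w for w in samples if not has_digit(w)]
print(kept)  # ['abc', "don't"]
```

Unlike an `isalpha()` filter, this keeps words such as "don't" that contain a non-letter character but no digit.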

#3


This is possible with re.sub, re.search and a list comprehension.

>>> import re
>>> def checkio(s):
...     print([i for i in re.sub(r'[.,!?]', '', s.lower()).split() if not re.search(r'\d', i)])
...
>>> checkio("1a4 4ad, d89dfsfaj.")
[]
>>> checkio("1a4 ?ad, d89dfsfaj.")
['ad']

#4


What happens here is that the list is modified while it is being iterated over: you are deleting elements while traversing the list. The for loop keeps an internal index into the list, and that index is not adjusted when the list shrinks.

At the first iteration we have words = ['1a4', '4ad', 'd89dfsfaj'] and the loop is at index 0. Since '1a4' contains a digit, we remove it, so words = ['4ad', 'd89dfsfaj']. At the second iteration the loop moves on to index 1, where the current word is now 'd89dfsfaj', and we remove it too. '4ad' is skipped because it has shifted to index 0, which the loop has already passed. The list now has one element and the next index would be 2, so the loop ends with words = ['4ad'].
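
The skipped element can be made visible by recording which words the loop actually visits. The following is a minimal sketch using the list from the question; the `visited` and `fixed` variables are added for illustration only. The second half shows one common workaround, iterating over a shallow copy (`fixed[:]`) so that removals do not shift the sequence being traversed:

```python
import re

# Reproduce the problem: remove from the list being iterated.
words = ['1a4', '4ad', 'd89dfsfaj']
visited = []
for each in words:
    visited.append(each)          # record what the loop really sees
    if re.search(r'\d', each):
        words.remove(each)

print(visited)  # ['1a4', 'd89dfsfaj']  ('4ad' was never visited)
print(words)    # ['4ad']

# Workaround: iterate over a copy and remove from the original.
fixed = ['1a4', '4ad', 'd89dfsfaj']
for each in fixed[:]:
    if re.search(r'\d', each):
        fixed.remove(each)

print(fixed)    # []
```

Building a new list with a comprehension is usually simpler than removing in place, but the copy-based loop stays closest to the original code.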
