Splitting a string into substrings of repeated characters

Time: 2022-07-29 21:41:52

I want to split a string like:

'aaabbccccabbb'

into

['aaa', 'bb', 'cccc', 'a', 'bbb']

What's an elegant way to do this in Python? If it makes it easier, it can be assumed that the string will only contain a's, b's and c's.

4 solutions

#1


26  

That is the use case for itertools.groupby :)

>>> from itertools import groupby
>>> s = 'aaabbccccabbb'
>>> [''.join(y) for _,y in groupby(s)]
['aaa', 'bb', 'cccc', 'a', 'bbb']
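
For readers who have not used it before, here is a minimal sketch of what groupby is doing in the one-liner above. With no key function it groups consecutive equal items, so each group is one run of the same character. The helper names split_runs and run_lengths are made up for illustration and are not part of the original answer.

```python
from itertools import groupby


def split_runs(s):
    """Split s into maximal runs of identical consecutive characters."""
    # groupby yields (key, group_iterator) pairs; joining each group
    # rebuilds the run as a string.
    return ["".join(group) for _, group in groupby(s)]


def run_lengths(s):
    """Return (character, run length) pairs, a simple run-length encoding."""
    return [(key, sum(1 for _ in group)) for key, group in groupby(s)]


print(split_runs("aaabbccccabbb"))
# ['aaa', 'bb', 'cccc', 'a', 'bbb']
print(run_lengths("aaabbccccabbb"))
# [('a', 3), ('b', 2), ('c', 4), ('a', 1), ('b', 3)]
```

Note that the second 'a' forms its own group: groupby only merges adjacent equal items, which is exactly what the question asks for.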

#2


3  

You can write a generator, without trying to be clever just to keep it short and unreadable:

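def yield_same(string):
    it_str = iter(string)
    # next() works on Python 2.6+ and 3; the default handles empty input
    result = next(it_str, "")
    if not result:
        return
    for next_chr in it_str:
        # A different character ends the current run
        if next_chr != result[0]:
            yield result
            result = ""
        result += next_chr
    yield result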


>>> list(yield_same("aaaaaabcbcdcdccccccdddddd"))
['aaaaaa', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'cccccc', 'dddddd']

Edit: OK, so there is itertools.groupby, which probably does something like this.

#3


2  

Here's the best way I could find using regex:

import re
print([a for a, b in re.findall(r"((\w)\2*)", s)])

#4


1  

>>> import re
>>> s = 'aaabbccccabbb'
>>> [m.group() for m in re.finditer(r'(\w)(\1*)',s)]
['aaa', 'bb', 'cccc', 'a', 'bbb']
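
One caveat with the regex answers: \w matches only word characters, so runs of spaces or punctuation are silently skipped. That is fine for the a/b/c strings in the question. If other characters can appear, a possible variant (a sketch, with the hypothetical helper name split_runs_regex) is to match any character with . and pass re.DOTALL so newlines are included:

```python
import re


def split_runs_regex(s):
    """Split s into runs of identical characters, for any character."""
    # (.) captures one character; \1* extends the match while that same
    # character repeats. re.DOTALL lets '.' match newlines as well.
    return [m.group() for m in re.finditer(r"(.)\1*", s, flags=re.DOTALL)]


print(split_runs_regex("aaabbccccabbb"))
# ['aaa', 'bb', 'cccc', 'a', 'bbb']
print(split_runs_regex("aa  b!!"))
# ['aa', '  ', 'b', '!!']
```

With the \w pattern, the second input would give only ['aa', 'b'].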
