How do I append one string to another in Python?

Time: 2022-02-14 20:11:58

I want an efficient way to append one string to another in Python.

var1 = "foo"
var2 = "bar"
var3 = var1 + var2

Is there any good built-in method to use?

9 answers

#1


468  

If you have only one reference to a string and you concatenate another string onto its end, CPython special-cases this and tries to extend the string in place.

The end result is that appending repeatedly this way is amortized O(n) overall.

For example, this loop:

s = ""
for i in range(n):
    s+=str(i)

used to be O(n^2), but now it is O(n).
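Here is a minimal Python 3 sketch of the reference-count condition described above. It assumes CPython, and the helper names `build_single_ref` and `build_with_alias` are made up for illustration. The second helper binds an extra name to the string before each `+=`, so the string no longer has a single reference. CPython then cannot resize it in place and has to copy it. Both helpers return identical strings; only the timing should differ, and on other implementations the two may perform the same.

```python
import timeit


def build_single_ref(n: int) -> str:
    """Append with +=; on CPython the local name s is normally the only
    reference, so the interpreter may extend the string in place."""
    s = ""
    for i in range(n):
        s += str(i)
    return s


def build_with_alias(n: int) -> str:
    """Same loop, but a second name refers to the string at each +=,
    which rules out the in-place path and forces a copy every time."""
    s = ""
    for i in range(n):
        alias = s  # extra reference: the string's refcount is now at least 2
        s += str(i)
    return s


if __name__ == "__main__":
    for n in (2_000, 4_000, 8_000):
        single = timeit.timeit(lambda: build_single_ref(n), number=3)
        aliased = timeit.timeit(lambda: build_with_alias(n), number=3)
        print(f"n={n:>5}: single reference {single:.4f} s, with alias {aliased:.4f} s")
```

On CPython, expect the aliased version's time to grow roughly quadratically with n. The exact figures depend on the interpreter version and the machine.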

From the source (bytesobject.c):

void
PyBytes_ConcatAndDel(register PyObject **pv, register PyObject *w)
{
    PyBytes_Concat(pv, w);
    Py_XDECREF(w);
}


/* The following function breaks the notion that strings are immutable:
   it changes the size of a string.  We get away with this only if there
   is only one module referencing the object.  You can also think of it
   as creating a new string object and destroying the old one, only
   more efficiently.  In any case, don't use this if the string may
   already be known to some other part of the code...
   Note that if there's not enough memory to resize the string, the original
   string object at *pv is deallocated, *pv is set to NULL, an "out of
   memory" exception is set, and -1 is returned.  Else (on success) 0 is
   returned, and the value in *pv may or may not be the same as on input.
   As always, an extra byte is allocated for a trailing \0 byte (newsize
   does *not* include that), and a trailing \0 byte is stored.
*/

int
_PyBytes_Resize(PyObject **pv, Py_ssize_t newsize)
{
    register PyObject *v;
    register PyBytesObject *sv;
    v = *pv;
    if (!PyBytes_Check(v) || Py_REFCNT(v) != 1 || newsize < 0) {
        *pv = 0;
        Py_DECREF(v);
        PyErr_BadInternalCall();
        return -1;
    }
    /* XXX UNREF/NEWREF interface should be more symmetrical */
    _Py_DEC_REFTOTAL;
    _Py_ForgetReference(v);
    *pv = (PyObject *)
        PyObject_REALLOC((char *)v, PyBytesObject_SIZE + newsize);
    if (*pv == NULL) {
        PyObject_Del(v);
        PyErr_NoMemory();
        return -1;
    }
    _Py_NewReference(*pv);
    sv = (PyBytesObject *) *pv;
    Py_SIZE(sv) = newsize;
    sv->ob_sval[newsize] = '\0';
    sv->ob_shash = -1;          /* invalidate cached hash value */
    return 0;
}

It's easy enough to verify empirically.

$ python -m timeit -s"s=''" "for i in xrange(10):s+='a'"
1000000 loops, best of 3: 1.85 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(100):s+='a'"
10000 loops, best of 3: 16.8 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(1000):s+='a'"
10000 loops, best of 3: 158 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(10000):s+='a'"
1000 loops, best of 3: 1.71 msec per loop
$ python -m timeit -s"s=''" "for i in xrange(100000):s+='a'"
10 loops, best of 3: 14.6 msec per loop
$ python -m timeit -s"s=''" "for i in xrange(1000000):s+='a'"
10 loops, best of 3: 173 msec per loop

It's important to note, however, that this optimisation isn't part of the Python language specification. As far as I know, it exists only in the CPython implementation. The same empirical test on PyPy or Jython, for example, might show the older O(n**2) performance.

$ pypy -m timeit -s"s=''" "for i in xrange(10):s+='a'"
10000 loops, best of 3: 90.8 usec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(100):s+='a'"
1000 loops, best of 3: 896 usec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(1000):s+='a'"
100 loops, best of 3: 9.03 msec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(10000):s+='a'"
10 loops, best of 3: 89.5 msec per loop

So far so good, but then,

$ pypy -m timeit -s"s=''" "for i in xrange(100000):s+='a'"
10 loops, best of 3: 12.8 sec per loop

Ouch, even worse than quadratic. So PyPy is doing something that works well with short strings but performs poorly for larger ones.

#2


231  

Don't prematurely optimize. If you have no reason to believe there's a speed bottleneck caused by string concatenations then just stick with + and +=:

s  = 'foo'
s += 'bar'
s += 'baz'

That said, if you're aiming for something like Java's StringBuilder, the canonical Python idiom is to add items to a list and then use str.join to concatenate them all at the end:

l = []
l.append('foo')
l.append('bar')
l.append('baz')

s = ''.join(l)
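If you want something that reads like Java's StringBuilder, a thin wrapper around the list-and-join idiom is enough. The class below is a hypothetical helper, not part of Python's standard library. It collects pieces in a list and calls `str.join` once at the end.

```python
from __future__ import annotations


class StringBuilder:
    """Hypothetical Java-style builder: collect pieces, join once."""

    def __init__(self) -> None:
        self._parts: list[str] = []

    def append(self, piece: str) -> StringBuilder:
        self._parts.append(piece)
        return self  # returning self allows chained calls

    def build(self, sep: str = "") -> str:
        # A single join runs in time linear in the total length of the pieces.
        return sep.join(self._parts)

    def __str__(self) -> str:
        return self.build()


sb = StringBuilder()
sb.append("foo").append("bar").append("baz")
print(str(sb))  # foobarbaz
```

Returning `self` from `append` is only a convenience for chaining. The work is still done by the list and the final `join`.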

#3


34  

Don't.

That is, in most cases you are better off generating the whole string in one go rather than appending to an existing string.

For example, don't do: obj1.name + ":" + str(obj1.count)

Instead, use "%s:%d" % (obj1.name, obj1.count)

That will be easier to read and more efficient.
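This sketch puts the concatenation this answer warns against next to printf-style formatting and two other built-in options: `str.format` and, on Python 3.6+, f-strings. `SimpleNamespace` stands in for the answer's `obj1`, and its `name` and `count` values are made up.

```python
from types import SimpleNamespace

# Hypothetical stand-in for obj1 from the answer.
obj1 = SimpleNamespace(name="widget", count=3)

concatenated = obj1.name + ":" + str(obj1.count)   # piecewise concatenation
percent = "%s:%d" % (obj1.name, obj1.count)        # printf-style, as suggested above
formatted = "{}:{}".format(obj1.name, obj1.count)  # str.format
fstring = f"{obj1.name}:{obj1.count}"              # f-string, Python 3.6+

print(percent)  # widget:3
```

All four produce the same string. The formatting versions also make the `str()` conversion of `count` implicit.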

#4


26  

str1 = "Hello"
str2 = "World"
newstr = " ".join((str1, str2))

That joins str1 and str2 with a space as the separator. You can also do "".join((str1, str2, ...)). str.join() takes a single iterable, so you have to put the strings in a list or a tuple.

That's about as efficient as it gets for a builtin method.
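Here are a few more `str.join` calls. They show that `join` takes one iterable (a list, tuple, or generator) and that the string you call it on becomes the separator. Passing the strings as separate arguments raises `TypeError`.

```python
str1 = "Hello"
str2 = "World"

spaced = " ".join((str1, str2))                        # 'Hello World'
glued = "".join([str1, str2, "!"])                     # 'HelloWorld!'
shouted = ", ".join(w.upper() for w in (str1, str2))   # 'HELLO, WORLD'

# join() accepts exactly one argument, so passing two strings fails.
try:
    "".join(str1, str2)
    separate_args_ok = True
except TypeError:
    separate_args_ok = False

print(spaced, glued, shouted, separate_args_ok)
```

Every item in the iterable must already be a `str`. `join` does not call `str()` on the items for you.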

#5


9  

It really depends on your application. If you're looping through hundreds of words and collecting them, appending them to a list and calling .join() once is better. But if you're putting together a long sentence, you're better off using +=.

#6


8  

If you need to do many append operations to build a large string, you can use StringIO or cStringIO. cStringIO exists only in Python 2; in Python 3, use io.StringIO. The interface is file-like: you call write() to append text to it.

If you're just appending two strings then just use +.
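In Python 3, cStringIO is gone and `io.StringIO` fills the same role. Below is a minimal sketch of the file-like usage described above. `build_report` is a made-up name used for illustration only.

```python
import io


def build_report(lines: list) -> str:
    """Accumulate text in an in-memory, file-like buffer."""
    buf = io.StringIO()
    for line in lines:
        buf.write(line)  # write() appends at the current position (the end, here)
        buf.write("\n")
    # print() can target the buffer too, like any text file object.
    print("total:", len(lines), file=buf)
    return buf.getvalue()  # everything written so far, as one str


print(build_report(["alpha", "beta"]), end="")
```

Because the buffer behaves like a text file, code that already writes to a file can often be redirected to a `StringIO` without changes.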

#7


4  

a = 'foo'
b = 'baaz'

a.__add__(b)

out: 'foobaaz'
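Calling `__add__` directly gives the same result as the `+` operator. In ordinary code, the idiomatic choices are the operator itself or `operator.add` when you need a function object. A short sketch:

```python
import operator

a = 'foo'
b = 'baaz'

via_dunder = a.__add__(b)          # what the answer shows
via_operator = a + b               # the usual way to write the same concatenation
via_function = operator.add(a, b)  # function form, e.g. for passing to map()

pairs = list(map(operator.add, ["x", "y"], ["1", "2"]))  # ['x1', 'y2']
print(via_dunder, via_operator, via_function, pairs)
```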

#8


3  

Basically, no difference. The only consistent trend is that Python seems to be getting slower with every version... :(
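The timings below come from IPython/Jupyter's %%timeit magic with 100,000,000 iterations. This sketch runs the same comparison as a plain script using the standard `timeit` module. It is scaled down to 1,000,000 iterations (an arbitrary choice) so it finishes quickly. Absolute numbers will vary by machine and interpreter version.

```python
import timeit

N = 1_000_000  # far fewer than the answer's 100,000,000 iterations


def with_list(n: int = N) -> str:
    parts = []
    for _ in range(n):
        parts.append('a')
    return ''.join(parts)


def with_plus_equals(n: int = N) -> str:
    s = ''
    for _ in range(n):
        s += 'a'
    return s


if __name__ == "__main__":
    for fn in (with_list, with_plus_equals):
        # repeat() returns one total per run; the minimum is the least noisy figure.
        best = min(timeit.repeat(fn, number=1, repeat=3))
        print(f"{fn.__name__}: best of 3: {best:.3f} s")
```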

List (append, then ''.join):

%%timeit
x = []
for i in range(100000000):  # xrange on Python 2.7
    x.append('a')
x = ''.join(x)

Python 2.7: 1 loop, best of 3: 7.34 s per loop
Python 3.4: 1 loop, best of 3: 7.99 s per loop
Python 3.5: 1 loop, best of 3: 8.48 s per loop
Python 3.6: 1 loop, best of 3: 9.93 s per loop


String (+=):

%%timeit
x = ''
for i in range(100000000):  # xrange on Python 2.7
    x += 'a'

Python 2.7: 1 loop, best of 3: 7.41 s per loop
Python 3.4: 1 loop, best of 3: 9.08 s per loop
Python 3.5: 1 loop, best of 3: 8.82 s per loop
Python 3.6: 1 loop, best of 3: 9.24 s per loop

#9


1  

Append strings with the __add__ method:

str = "Hello"
str2 = " World"
st = str.__add__(str2)
print(st)

Output

Hello World
