Creating a set from a list built with C in Cython is much slower than pure Python - why?

Time: 2021-11-17 21:20:44

In this example, I show two different methods for creating a list of strings using Cython. One uses an array of char pointers (and the C strcpy function); the other simply appends elements to a Python list.

I then pass each of these lists into the set function and see that performance is drastically different.

Question - What can I do to create the list using character pointers to have equal performance?

A simple function to create lists in Cython

from libc.string cimport strcpy

def make_lists():
    cdef:
        char c_list[100000][3]
        Py_ssize_t i
        list py_list = []

    for i in range(100000):
        strcpy(c_list[i], 'AB')
        c_list[i][2] = b'\0'
        py_list.append(b'AB')

    return c_list, py_list

Here, c_list is just an array of length-3 char arrays; Cython converts it to a Python list of bytes objects when returning it. py_list is a normal Python list. We fill both lists with the same byte sequence, b'AB'.

Create the lists

c_list, py_list = make_lists()

Print out some of the contents

>>> c_list[:10]
[b'AB', b'AB', b'AB', b'AB', b'AB', b'AB', b'AB', b'AB', b'AB', b'AB']

Show both lists are equal

>>> c_list == py_list
True

Time operations - this is insane to me! 3x difference

%timeit set(c_list)
2.85 ms ± 115 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit set(py_list)
1.02 ms ± 26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
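The gap can be reproduced in pure Python, which suggests it is not Cython-specific. A sketch (assuming CPython, where `bytes(bytearray(...))` forces a fresh allocation each time) builds one list of 100000 distinct-but-equal bytes objects and one list holding the same object 100000 times, then times set() on each:

```python
import timeit

n = 100_000
# Each element is a separate bytes object with identical contents;
# bytes(bytearray(...)) forces a fresh allocation every iteration.
distinct = [bytes(bytearray(b'AB')) for _ in range(n)]
# One bytes object referenced n times.
shared = [b'AB'] * n

print(distinct == shared)              # True: equal by value
print(len({id(x) for x in shared}))    # 1: a single object
print(len({id(x) for x in distinct}))  # 100000: all separate objects

# set() on the shared list can short-circuit on object identity; the
# distinct list forces a real hash + equality compare per element.
print(timeit.timeit(lambda: set(distinct), number=100))
print(timeit.timeit(lambda: set(shared), number=100))
```

On my understanding this mirrors the c_list/py_list split exactly: the shared-object version typically runs measurably faster.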

Unicode and pure Python

Interestingly, the performance difference vanishes if I decode each value to unicode, though both decoded lists are slower than the original set(py_list). If I create a unicode list in pure Python then I am back to the original performance.

c_list_unicode = [v.decode() for v in c_list]
py_list_unicode = [v.decode() for v in py_list]
py_list_py = ['AB' for _ in range(len(py_list))]

%timeit set(c_list_unicode)
1.63 ms ± 56.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit set(py_list_unicode)
1.7 ms ± 35.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit set(py_list_py)
987 µs ± 45.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Even simpler example

def make_lists2():
    cdef:
        char *c_list[100000]
        Py_ssize_t i
        list py_list_slow = []
        list py_list_fast = []

    for i in range(100000):
        c_list[i] = 'AB'
        py_list_slow.append(c_list[i])
        py_list_fast.append(b'AB')

    return c_list, py_list_slow, py_list_fast

Timings

c_list2, py_list_slow, py_list_fast = make_lists2()

%timeit set(c_list2)
3.01 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit set(py_list_slow)
3.05 ms ± 168 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit set(py_list_fast)
1.08 ms ± 38.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

edit

Possible solution

I found the function PyUnicode_InternFromString in the Python unicode C API, and with it I get performance on par with regular Python lists. It 'interns' the string - I'm not sure exactly what that means.
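Interning means keeping one canonical copy of each distinct string value, so equal strings become the same object and later identity checks succeed immediately. PyUnicode_InternFromString is the C-API entry point; from Python the same mechanism is exposed as sys.intern, sketched here:

```python
import sys

# Build 'AB' at runtime so the results are not the already-interned
# string literal; normally these are two separate (but equal) objects.
a = ''.join(['A', 'B'])
b = ''.join(['A', 'B'])
print(a == b)

# After interning, equal strings collapse to one canonical object.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # True
```

With interned values, set/dict operations hit the identity fast path instead of comparing character data.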

1 solution

#1
Your c_list is a list of 100000 distinct bytestrings with the same contents. Cython has to convert each char[3] to a bytestring separately, and it doesn't bother to do any object deduplication.

Your py_list is a list of the same bytestring object 100000 times. Every py_list.append(b'AB') appends the same object to py_list; without the trip through a C array, Cython never needs to copy the bytestring.

set(c_list) is slower than set(py_list) because set(c_list) has to actually perform string comparison, while set(py_list) gets to skip that with an object identity check.
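A pure-Python analogue of a possible fix on the Cython side: cache each converted value so equal inputs map to one shared object, restoring the identity fast path (set and dict lookups compare pointers before falling back to ==). Note that dedup_convert is a hypothetical helper for illustration, not part of the question's code:

```python
def dedup_convert(raw_values):
    """Return a list where equal inputs share one object (hypothetical helper)."""
    cache = {}
    out = []
    for raw in raw_values:
        # setdefault stores the first object seen for each value and
        # returns that same object for every later equal value.
        out.append(cache.setdefault(raw, raw))
    return out

# 100000 distinct-but-equal bytes objects, as in the question's c_list.
distinct = [bytes(bytearray(b'AB')) for _ in range(100_000)]
deduped = dedup_convert(distinct)
print(deduped == distinct)            # True: same values
print(len({id(x) for x in deduped})) # 1: one shared object
```

In the Cython function the equivalent would be converting each char array through such a cache (or interning it) instead of building a fresh bytes object on every iteration.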
