Cython: are typed memoryviews the modern way to type numpy arrays?

Date: 2021-12-10 01:48:48

Let's say I'd like to pass a numpy array to a cdef function:

cdef double mysum(double[:] arr):
    cdef int n = len(arr)
    cdef double result = 0

    for i in range(n):
        result = result + arr[i]

    return result

Is this the modern way to handle typing numpy arrays? Compare with this question: cython / numpy type of an array

What if I want to do the following:

cdef double[:] mydifference(int a, int b):
    cdef double[:] arr_a = np.arange(a)
    cdef double[:] arr_b = np.arange(b)

    return arr_a - arr_b

This will raise an error, because - is not defined for memoryviews. So should that case be handled as follows instead?

cdef double[:] mydifference(int a, int b):
    arr_a = np.arange(a)
    arr_b = np.arange(b)

    return arr_a - arr_b
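
For what it's worth, a version along these lines should compile and run (a sketch only: it assumes a == b so the subtraction broadcasts, and it forces float64 so the resulting ndarray can be coerced to the declared double[:] return type):

import numpy as np

cdef double[:] mydifference(int a, int b):
    # The dtype must match the declared memoryview type, otherwise coercing
    # the returned ndarray to double[:] fails at runtime.
    arr_a = np.arange(a, dtype=np.float64)
    arr_b = np.arange(b, dtype=np.float64)

    return arr_a - arr_b

Callers receive a memoryview; wrapping it with np.asarray(...) gives back a numpy array if that is what they want.
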

1 Answer

#1


I will quote from the docs:

Memoryviews are similar to the current NumPy array buffer support (np.ndarray[np.float64_t, ndim=2]), but they have more features and cleaner syntax.

This indicates that the developers of Cython consider memory views to be the modern way.

Memory views offer some big advantages over the np.ndarray notation, primarily in elegance and interoperability; however, they are not superior in performance.
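
For concreteness, the same trivial function in the two notations being compared might look like this (a sketch; the function names are made up, and the buffer version needs numpy's headers at build time):

cimport numpy as np

# Older buffer notation
cpdef double first_buffer(np.ndarray[np.float64_t, ndim=1] arr):
    return arr[0]

# Typed memoryview notation
cpdef double first_view(double[:] arr):
    return arr[0]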

Performance:

First, it should be noted that boundscheck sometimes fails to work with memory views, resulting in artificially fast figures for memoryviews with boundscheck=True (i.e. you get fast, unsafe indexing). If you're relying on boundscheck to catch bugs, this could be a nasty surprise.
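
If you want to verify whether bounds checking is actually applied to memoryview indexing in your build, one way (a sketch; probe_boundscheck is a made-up name) is to deliberately index out of range in a function compiled with the directive enabled:

cimport cython

@cython.boundscheck(True)
def probe_boundscheck(double[:] view):
    # Should raise IndexError if bounds checking is honoured for memoryviews;
    # if it silently returns a garbage value instead, the directive is not
    # being applied and your other "checked" figures are really unchecked.
    return view[view.shape[0]]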

For the most part, once compiler optimizations have been applied, memory views and the numpy array notation are equal in performance, often precisely so. When there is a difference, it is normally no more than 10-30%.

Performance benchmark

The number is the time in seconds to perform 100,000,000 operations. Smaller is faster.

ACCESS+ASSIGNMENT on small array (10000 elements, 10000 times)
Results for `uint8`
1) memory view: 0.0415 +/- 0.0017
2) np.ndarray : 0.0531 +/- 0.0012
3) pointer    : 0.0333 +/- 0.0017

Results for `uint16`
1) memory view: 0.0479 +/- 0.0032
2) np.ndarray : 0.0480 +/- 0.0034
3) pointer    : 0.0329 +/- 0.0008

Results for `uint32`
1) memory view: 0.0499 +/- 0.0021
2) np.ndarray : 0.0413 +/- 0.0005
3) pointer    : 0.0332 +/- 0.0010

Results for `uint64`
1) memory view: 0.0489 +/- 0.0019
2) np.ndarray : 0.0417 +/- 0.0010
3) pointer    : 0.0353 +/- 0.0017

Results for `float32`
1) memory view: 0.0398 +/- 0.0027
2) np.ndarray : 0.0418 +/- 0.0019
3) pointer    : 0.0330 +/- 0.0006

Results for `float64`
1) memory view: 0.0439 +/- 0.0037
2) np.ndarray : 0.0422 +/- 0.0013
3) pointer    : 0.0353 +/- 0.0013

ACCESS PERFORMANCE (100,000,000 element array):
Results for `uint8`
1) memory view: 0.0576 +/- 0.0006
2) np.ndarray : 0.0570 +/- 0.0009
3) pointer    : 0.0061 +/- 0.0004

Results for `uint16`
1) memory view: 0.0806 +/- 0.0002
2) np.ndarray : 0.0882 +/- 0.0005
3) pointer    : 0.0121 +/- 0.0003

Results for `uint32`
1) memory view: 0.0572 +/- 0.0016
2) np.ndarray : 0.0571 +/- 0.0021
3) pointer    : 0.0248 +/- 0.0008

Results for `uint64`
1) memory view: 0.0618 +/- 0.0007
2) np.ndarray : 0.0621 +/- 0.0014
3) pointer    : 0.0481 +/- 0.0006

Results for `float32`
1) memory view: 0.0945 +/- 0.0013
2) np.ndarray : 0.0947 +/- 0.0018
3) pointer    : 0.0942 +/- 0.0020

Results for `float64`
1) memory view: 0.0981 +/- 0.0026
2) np.ndarray : 0.0982 +/- 0.0026
3) pointer    : 0.0968 +/- 0.0016

ASSIGNMENT PERFORMANCE (100,000,000 element array):
Results for `uint8`
1) memory view: 0.0341 +/- 0.0010
2) np.ndarray : 0.0476 +/- 0.0007
3) pointer    : 0.0402 +/- 0.0001

Results for `uint16`
1) memory view: 0.0368 +/- 0.0020
2) np.ndarray : 0.0368 +/- 0.0019
3) pointer    : 0.0279 +/- 0.0009

Results for `uint32`
1) memory view: 0.0429 +/- 0.0022
2) np.ndarray : 0.0427 +/- 0.0005
3) pointer    : 0.0418 +/- 0.0007

Results for `uint64`
1) memory view: 0.0833 +/- 0.0004
2) np.ndarray : 0.0835 +/- 0.0011
3) pointer    : 0.0832 +/- 0.0003

Results for `float32`
1) memory view: 0.0648 +/- 0.0061
2) np.ndarray : 0.0644 +/- 0.0044
3) pointer    : 0.0639 +/- 0.0005

Results for `float64`
1) memory view: 0.0854 +/- 0.0056
2) np.ndarray : 0.0849 +/- 0.0043
3) pointer    : 0.0847 +/- 0.0056

Benchmark Code (Shown only for access+assignment)

# cython: boundscheck=False
# cython: wraparound=False
# cython: nonecheck=False
import numpy as np
cimport numpy as np
cimport cython

# Change these as desired.
data_type = np.uint64
ctypedef np.uint64_t data_type_t

cpdef test_memory_view(data_type_t [:] view):
    cdef Py_ssize_t i, j, n = view.shape[0]

    for j in range(0, n):
        for i in range(0, n):
            view[i] = view[j]

cpdef test_ndarray(np.ndarray[data_type_t, ndim=1] view):
    cdef Py_ssize_t i, j, n = view.shape[0]

    for j in range(0, n):
        for i in range(0, n):
            view[i] = view[j]

cpdef test_pointer(data_type_t [:] view):
    cdef Py_ssize_t i, j, n = view.shape[0]
    cdef data_type_t * data_ptr = &view[0]

    for j in range(0, n):
        for i in range(0, n):
            (data_ptr + i)[0] = (data_ptr + j)[0]

def run_test():
    import time
    from statistics import stdev, mean
    n = 10000
    repeats = 100
    a = np.arange(0, n,  dtype=data_type)
    funcs = [('1) memory view', test_memory_view),
        ('2) np.ndarray', test_ndarray),
        ('3) pointer', test_pointer)]

    results = {label: [] for label, func in funcs}
    for r in range(0, repeats):
        for label, func in funcs:
            start=time.time()
            func(a)
            results[label].append(time.time() - start)

    print('Results for `{}`'.format(data_type.__name__))
    for label, times in sorted(results.items()):
        print('{: <14}: {:.4f} +/- {:.4f}'.format(label, mean(times), stdev(times)))
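
For reference, one minimal way to build and run this benchmark might look like the following (a sketch, assuming the code above is saved as bench.pyx; your build setup may differ):

# setup.py
from setuptools import setup
from Cython.Build import cythonize
import numpy as np

setup(
    ext_modules=cythonize("bench.pyx"),
    include_dirs=[np.get_include()],  # needed because of the `cimport numpy`
)

Then build with python setup.py build_ext --inplace, and from Python call bench.run_test().
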

These benchmarks indicate that, on the whole, there is not much difference in performance. Sometimes the np.ndarray notation is a little faster, and sometimes vice versa.

One thing to watch out for with benchmarks is that when the code is made a little more complicated or 'realistic', the difference suddenly vanishes, as if the compiler loses confidence to apply some very clever optimization. This can be seen with the performance of floats, where there is no difference whatsoever, presumably because some of the fancy integer optimizations can't be used.

Ease of use

Memory views offer significant advantages: for example, you can use a memory view on a numpy array, a CPython array, a Cython array, a C array and more, both present and future. There is also a simple parallel syntax for casting anything to a memory view:

cdef double [:, :] data_view = <double[:256, :256]>data
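
To flesh that out a little, here is a sketch of viewing a raw C buffer through a memoryview (pointer_view_demo is a made-up name, and real code would check the malloc result and manage the buffer's lifetime more carefully):

from libc.stdlib cimport malloc, free

def pointer_view_demo():
    # Allocate a raw C buffer and view it as a 2-D memoryview.
    cdef double *data = <double *> malloc(256 * 256 * sizeof(double))
    cdef double[:, :] data_view = <double[:256, :256]> data
    data_view[0, 0] = 1.0            # typed, fast indexing into raw memory
    cdef double value = data_view[0, 0]
    free(data)                       # the view does not own the buffer
    return value
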

Memory views are great in this regard, because if you type a function as taking a memory view then it can take any of those things. This means you can write a module that doesn't have a dependency on numpy, but which can still take numpy arrays.
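
As a sketch of that point, a module like the following never imports numpy, yet its function accepts numpy arrays, CPython array.array objects, or anything else exporting a compatible buffer (total is a made-up name):

cpdef double total(double[:] values):
    cdef Py_ssize_t i
    cdef double result = 0.0
    for i in range(values.shape[0]):
        result += values[i]
    return result

From the Python side, assuming total has been compiled into an importable module, both of these calls then work against the same function:

from array import array
import numpy as np

total(array('d', [1.0, 2.0, 3.0]))       # CPython array
total(np.arange(3, dtype=np.float64))    # numpy array
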

On the other hand, the np.ndarray notation results in something that is still a numpy array, so you can call all the numpy array methods on it. It's not a big deal to have both a numpy array and a view on the array, though:

def dostuff(arr):
    cdef double [:] arr_view = arr
    # Now you can use 'arr' if you want array functions,
    # and arr_view if you want fast indexing

Having both the array and the array view works fine in practice, and I quite like the style, as it makes a clear distinction between Python-level methods and C-level methods.

Conclusion

Performance is very nearly equal and there is certainly not enough difference for that to be a deciding factor.

The numpy array notation comes closer to the ideal of accelerating Python code without changing it much, as you can continue to use the same variable while gaining full-speed array indexing.

On the other hand, the memory view notation probably is the future. If you like its elegance, and you use other kinds of data containers besides numpy arrays, there is very good reason to use memory views for consistency's sake.
