Fast conversion of C/C++ vector to Numpy array

Time: 2022-02-13 03:33:43

I'm using SWIG to glue together some C++ code to Python (2.6), and part of that glue includes a piece of code that converts large fields of data (millions of values) from the C++ side to a Numpy array. The best method I can come up with implements an iterator for the class and then provides a Python method:

def __array__(self, dtype=float):
    return np.fromiter(self, dtype, self.size())

The problem is that each call to the iterator's next() is very costly, since it has to go through about three or four SWIG wrappers, so the whole conversion takes far too long. I can guarantee that the C++ data are stored contiguously (since they live in a std::vector), and it just feels like Numpy should be able to take a pointer to the beginning of that data, together with the number of values it contains, and read it directly.

Is there a way to pass a pointer to internal_data_[0] and the value internal_data_.size() to numpy so that it can directly access or copy the data without all the Python overhead?

4 answers

#1


2  

You will want to define the __array_interface__ attribute instead (it is a dictionary-valued attribute, not a method). This will let you pass back the pointer and the shape information directly.
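
As a rough illustration of that idea, here is a minimal pure-Python sketch. The RawField class is hypothetical, and a ctypes array stands in for the std::vector storage; a real SWIG wrapper would have to obtain the address of internal_data_[0] from the C++ side instead.

```python
import ctypes
import numpy as np


class RawField(object):
    """Hypothetical stand-in for a SWIG-wrapped container of doubles."""

    def __init__(self, values):
        self._n = len(values)
        # Assumption: this ctypes array plays the role of the contiguous
        # std::vector<double> storage owned by the C++ object.
        self._storage = (ctypes.c_double * self._n)(*values)

    @property
    def __array_interface__(self):
        # Version 3 of the array interface: a plain dict describing the memory.
        return {
            "version": 3,
            "shape": (self._n,),
            "typestr": np.dtype(np.float64).str,  # e.g. '<f8' on little-endian
            "data": (ctypes.addressof(self._storage), False),  # (pointer, read-only flag)
        }


field = RawField([1.0, 2.0, 3.0])
arr = np.asarray(field)  # no per-element Python calls; shares the memory
arr[0] = 10.0            # writes through to the underlying storage
```

Because the array shares memory with the container, the underlying storage must not be freed or reallocated (for example by resizing the vector) while the array is still in use.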

#2


1  

Maybe it would be possible to use f2py instead of SWIG. Despite its name, it is capable of interfacing Python with C as well as Fortran. See http://www.scipy.org/Cookbook/f2py_and_NumPy

The advantage is that it handles the conversion to numpy arrays automatically.

Two caveats: if you don't already know Fortran, you may find f2py a bit strange; and I don't know how well it works with C++.

#3


0  

If you wrap your vector in an object that implements Python's buffer interface, you can pass that object to the numpy array constructor for initialization (see docs, third argument). I would bet that this initialization is much faster, since it can just use memcpy to copy the data.
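
A small sketch of what that looks like from the Python side, assuming the wrapped object exposes a buffer of doubles. Here the standard library's array.array stands in for such an object; the names values, view, also_view and copy are only illustrative.

```python
import array
import numpy as np

# array.array implements the buffer interface, so it stands in for a
# wrapped C++ vector here (an assumption made for illustration).
values = array.array("d", [0.5, 1.5, 2.5])

# Zero-copy view onto the buffer.
view = np.frombuffer(values, dtype=np.float64)

# The same thing through the ndarray constructor's buffer argument.
also_view = np.ndarray(shape=(len(values),), dtype=np.float64, buffer=values)

# An independent copy, made in one bulk operation rather than element by element.
copy = np.array(view)

values[0] = 9.0  # visible through both views, but not through the copy
```

Whether to keep a view or take a copy depends on whether the C++ side may resize or free the data while the array is still alive.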

#4


0  

So it looks like the only real solution is to base something off pybuffer.i that can copy from C++ into an existing buffer. If you add this to a SWIG include file:

%insert("python") %{
import numpy as np
%}

/*! Templated function to copy contents of a container to an allocated memory
 * buffer
 */
%inline %{
//==== ADDED BY numpy.i
#include <algorithm>

template < typename Container_T >
void copy_to_buffer(
        const Container_T& field,
        typename Container_T::value_type* buffer,
        typename Container_T::size_type length
        )
{
//    ValidateUserInput( length == field.size(),
//            "Destination buffer is the wrong size" );
    // put your own assertion here or BAD THINGS CAN HAPPEN

    if (length == field.size()) {
        std::copy( field.begin(), field.end(), buffer );
    }
}
//====

%}

%define TYPEMAP_COPY_TO_BUFFER(CLASS...)
%typemap(in) (CLASS::value_type* buffer, CLASS::size_type length)
(int res = 0, Py_ssize_t size_ = 0, void *buffer_ = 0) {

    res = PyObject_AsWriteBuffer($input, &buffer_, &size_);
    if ( res < 0 ) {
        PyErr_Clear();
        %argument_fail(res, "(CLASS::value_type*, CLASS::size_type length)",
                $symname, $argnum);
    }
    $1 = ($1_ltype) buffer_;
    $2 = ($2_ltype) (size_/sizeof($*1_type));
}
%enddef


%define ADD_NUMPY_ARRAY_INTERFACE(PYVALUE, PYCLASS, CLASS...)

TYPEMAP_COPY_TO_BUFFER(CLASS)

%template(_copy_to_buffer_ ## PYCLASS) copy_to_buffer< CLASS >;

%extend CLASS {
%insert("python") %{
def __array__(self):
    """Enable access to this data as a numpy array"""
    a = np.ndarray( shape=( len(self), ), dtype=PYVALUE )
    _copy_to_buffer_ ## PYCLASS(self, a)
    return a
%}
}

%enddef

Then you can make a container "Numpy"-able with:

%template(DumbVectorFloat) DumbVector<double>;
ADD_NUMPY_ARRAY_INTERFACE(float, DumbVectorFloat, DumbVector<double>);

Then in Python, just do:

# dvf is an instance of DumbVectorFloat
import numpy as np
my_numpy_array = np.asarray( dvf )

This has only the overhead of a single Python <--> C++ translation call, not the N calls that iterating over a length-N container would require.

A slightly more complete version of this code is part of my PyTRT project at github.
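
For readers who want to try the allocate-then-fill pattern without building a SWIG module, the following is a hedged pure-Python analogue. The copy_to_buffer function below is a hypothetical Python counterpart of the C++ template, and a ctypes array again stands in for the C++ container. Unlike the C++ version, it raises on a size mismatch instead of silently skipping the copy.

```python
import ctypes
import numpy as np


def copy_to_buffer(src, dest):
    """Copy a ctypes array into a preallocated numpy array in one call.

    Hypothetical Python counterpart of the C++ copy_to_buffer template.
    """
    nbytes = ctypes.sizeof(src)
    if dest.nbytes != nbytes:
        raise ValueError("Destination buffer is the wrong size")
    # One bulk memory copy instead of one Python call per element.
    ctypes.memmove(dest.ctypes.data, ctypes.addressof(src), nbytes)


src = (ctypes.c_double * 4)(1.0, 2.0, 3.0, 4.0)  # stand-in for the C++ data
dest = np.empty(len(src), dtype=np.float64)      # allocated on the Python side
copy_to_buffer(src, dest)
```

The resulting array owns its own memory, so it stays valid even if the source container is later modified or destroyed.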
