numpy concatenate does not append new arrays to an empty multidimensional array

I bet I am doing something very simple wrong. I want to start with an empty 2D numpy array and append arrays to it (with dimensions 1 row by 4 columns).

import numpy as np

open_cost_mat_train = np.matrix([])

for i in xrange(10):
    open_cost_mat = np.array([i,0,0,0])
    open_cost_mat_train = np.vstack([open_cost_mat_train,open_cost_mat])

My error trace is:

  File "/Users/me/anaconda/lib/python2.7/site-packages/numpy/core/shape_base.py", line 230, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: all the input array dimensions except for the concatenation axis must match exactly

What am I doing wrong? I have tried append, concatenate, defining the empty 2D array as [[]], as [], array([]) and many others.

2 Solutions

#1

If open_cost_mat_train is large, I would encourage you to replace the for loop with a vectorized algorithm. I will use the following functions to show how vectorizing loops improves efficiency:

def fvstack():
    import numpy as np
    np.random.seed(100)
    ocmt = np.matrix([]).reshape((0, 4))
    for i in xrange(10):
        x = np.random.random()
        ocm = np.array([x, x + 1, 10*x, x/10])
        ocmt = np.vstack([ocmt, ocm])
    return ocmt

def fshape():
    import numpy as np
    from numpy.matlib import empty
    np.random.seed(100)
    ocmt = empty((10, 4))
    for i in xrange(ocmt.shape[0]):
        ocmt[i, 0] = np.random.random()
    ocmt[:, 1] = ocmt[:, 0] + 1
    ocmt[:, 2] = 10*ocmt[:, 0]
    ocmt[:, 3] = ocmt[:, 0]/10
    return ocmt

I've assumed that the values that populate the first column of ocmt (shorthand for open_cost_mat_train) are obtained from a for loop, and that the remaining columns are a function of the first column, as stated in your comments to my original answer. As real cost data are not available, in the following example the values in the first column are random numbers, and the second, third and fourth columns are the functions x + 1, 10*x and x/10, respectively, where x is the corresponding value in the first column.

In [594]: fvstack()
Out[594]: 
matrix([[  5.43404942e-01,   1.54340494e+00,   5.43404942e+00,   5.43404942e-02],
        [  2.78369385e-01,   1.27836939e+00,   2.78369385e+00,   2.78369385e-02],
        [  4.24517591e-01,   1.42451759e+00,   4.24517591e+00,   4.24517591e-02],
        [  8.44776132e-01,   1.84477613e+00,   8.44776132e+00,   8.44776132e-02],
        [  4.71885619e-03,   1.00471886e+00,   4.71885619e-02,   4.71885619e-04],
        [  1.21569121e-01,   1.12156912e+00,   1.21569121e+00,   1.21569121e-02],
        [  6.70749085e-01,   1.67074908e+00,   6.70749085e+00,   6.70749085e-02],
        [  8.25852755e-01,   1.82585276e+00,   8.25852755e+00,   8.25852755e-02],
        [  1.36706590e-01,   1.13670659e+00,   1.36706590e+00,   1.36706590e-02],
        [  5.75093329e-01,   1.57509333e+00,   5.75093329e+00,   5.75093329e-02]])

In [595]: np.allclose(fvstack(), fshape())
Out[595]: True

For the calls to fvstack() and fshape() to produce the same results, the random number generator is seeded in both functions with np.random.seed(100). Notice that the equality test uses numpy.allclose instead of fvstack() == fshape(), to avoid spurious mismatches caused by the round-off errors of floating-point arithmetic.
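As a small illustration of why a tolerance-based comparison is preferred, here is a minimal sketch. The arrays a and b are made up for this example and are not part of the original functions:

```python
import numpy as np

# Hypothetical arrays: 0.1 + 0.2 is not exactly 0.3 in binary floating point.
a = np.array([0.1 + 0.2, 1.0])
b = np.array([0.3, 1.0])

exact = (a == b)            # element-wise comparison, gives a boolean array
close = np.allclose(a, b)   # single boolean, tolerant to round-off error

print(exact.tolist())  # [False, True]
print(close)           # True
```

Note that == yields one boolean per element, so it would also need .all() before it could serve as a single yes/no answer.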

As for efficiency, the following interactive session shows that initializing ocmt with its final shape is faster than repeatedly stacking rows (by a factor of roughly 1.7 in this run):

In [596]: import timeit

In [597]: timeit.timeit('fvstack()', setup="from __main__ import fvstack", number=10000)
Out[597]: 1.4884241055042366

In [598]: timeit.timeit('fshape()', setup="from __main__ import fshape", number=10000)
Out[598]: 0.8819408006311278
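Going one step further, the loop can be dropped entirely when the first column can be produced in a single call. The following is only a sketch under that assumption: the names fvectorized and frows, the n_rows and seed parameters, and the use of a local RandomState are choices made for this example, not part of the functions above. It returns a plain ndarray rather than a matrix, and it is written with range so that it also runs on Python 3.

```python
import numpy as np

def fvectorized(n_rows=10, seed=100):
    """Build the whole table without a Python-level loop."""
    rng = np.random.RandomState(seed)   # local generator (assumed seed)
    x = rng.random_sample(n_rows)       # first column, drawn in one call
    # Remaining columns are functions of the first one, as in fshape().
    return np.column_stack([x, x + 1, 10 * x, x / 10])

def frows(n_rows=10):
    """If rows really must come from a loop, collect them and convert once."""
    rows = []
    for i in range(n_rows):
        rows.append([i, 0, 0, 0])       # cheap list append, no array copy
    return np.array(rows, dtype=float)  # single allocation at the end

table = fvectorized()
loop_table = frows()
print(table.shape)       # (10, 4)
print(loop_table.shape)  # (10, 4)
```

Appending to a Python list and converting once avoids the repeated copying that np.vstack performs on every iteration, which is where most of the time in fvstack() goes.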

#2

You need to reshape your original matrix so that its number of columns matches that of the appended arrays:

open_cost_mat_train = np.matrix([]).reshape((0,4))

After that, running the loop from the question gives:

open_cost_mat_train

# matrix([[ 0.,  0.,  0.,  0.],
#         [ 1.,  0.,  0.,  0.],
#         [ 2.,  0.,  0.,  0.],
#         [ 3.,  0.,  0.,  0.],
#         [ 4.,  0.,  0.,  0.],
#         [ 5.,  0.,  0.,  0.],
#         [ 6.,  0.,  0.,  0.],
#         [ 7.,  0.,  0.,  0.],
#         [ 8.,  0.,  0.,  0.],
#         [ 9.,  0.,  0.,  0.]])
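To see why the reshape matters, it may help to inspect the shapes involved. This is a minimal sketch; the variable names are made up for illustration, and the np.empty((0, 4)) line is an assumed plain-ndarray alternative that the answer itself does not mention:

```python
import numpy as np

row = np.array([0, 0, 0, 0])

empty_default = np.matrix([])                    # one row, zero columns
empty_reshaped = np.matrix([]).reshape((0, 4))   # zero rows, four columns
empty_ndarray = np.empty((0, 4))                 # same idea without np.matrix

print(empty_default.shape)   # (1, 0)
print(empty_reshaped.shape)  # (0, 4)

# Stacking onto the (1, 0) matrix fails: 0 columns cannot line up with 4.
try:
    np.vstack([empty_default, row])
except ValueError as exc:
    print("vstack failed:", exc)

# Stacking onto a (0, 4) container works and yields a single 1x4 row.
stacked = np.vstack([empty_reshaped, row])
stacked_nd = np.vstack([empty_ndarray, row])
print(stacked.shape)     # (1, 4)
print(stacked_nd.shape)  # (1, 4)
```

np.vstack joins along the row axis, so only the row counts may differ; the column counts must agree, which the (0, 4) shape guarantees.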
