Convert a pandas DataFrame to a NumPy array, preserving the index

Date: 2021-10-17 21:16:00

I would like to know how to convert a pandas dataframe into a numpy array, including the index, and set the dtypes.

dataframe:

import numpy as np
import pandas as pd

index = [1, 2, 3, 4, 5, 6, 7]
a = [np.nan, np.nan, np.nan, 0.1, 0.1, 0.1, 0.1]
b = [0.2, np.nan, 0.2, 0.2, 0.2, np.nan, np.nan]
c = [np.nan, 0.5, 0.5, np.nan, 0.5, 0.5, np.nan]
df = pd.DataFrame({'A': a, 'B': b, 'C': c}, index=index)
df = df.rename_axis('ID')

gives

      A    B    C
ID               
1   NaN  0.2  NaN
2   NaN  NaN  0.5
3   NaN  0.2  0.5
4   0.1  0.2  NaN
5   0.1  0.2  0.5
6   0.1  NaN  0.5
7   0.1  NaN  NaN

Converting df to an array returns:

array([[ nan,  0.2,  nan],
       [ nan,  nan,  0.5],
       [ nan,  0.2,  0.5],
       [ 0.1,  0.2,  nan],
       [ 0.1,  0.2,  0.5],
       [ 0.1,  nan,  0.5],
       [ 0.1,  nan,  nan]])

However, I would like:

array([[ 1, nan,  0.2,  nan],
       [ 2, nan,  nan,  0.5],
       [ 3, nan,  0.2,  0.5],
       [ 4, 0.1,  0.2,  nan],
       [ 5, 0.1,  0.2,  0.5],
       [ 6, 0.1,  nan,  0.5],
       [ 7, 0.1,  nan,  nan]],
     dtype=[('ID', '<i4'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

(or similar)

Any suggestions on how to accomplish this? (I don't know yet whether I need a 1D or a 2D array.) I've seen a few posts that touch on this, but nothing dealing specifically with dataframe.index.

As a workaround I am writing the dataframe to disk using to_csv (and reading it back in to create the array), but I would prefer something more elegant than my new-to-pandas kludging.

10 solutions

#1


147  

To convert a pandas dataframe (df) to a numpy ndarray, use this code:

df = df.values

df is now the numpy ndarray:

array([[nan, 0.2, nan],
       [nan, nan, 0.5],
       [nan, 0.2, 0.5],
       [0.1, 0.2, nan],
       [0.1, 0.2, 0.5],
       [0.1, nan, 0.5],
       [0.1, nan, nan]])
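
For readers on newer pandas, here is a minimal sketch of the same conversion using DataFrame.to_numpy(), which is available from pandas 0.24 onward. The two-row frame and the name arr are made up for illustration. The result is bound to a new name so that df stays a DataFrame, and the index is still left out of the array.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (not the asker's full data).
df = pd.DataFrame(
    {"A": [np.nan, 0.1], "B": [0.2, np.nan], "C": [np.nan, 0.5]},
    index=pd.Index([1, 2], name="ID"),
)

# Bind the array to a new name so df remains a DataFrame.
arr = df.to_numpy()

# arr holds only the columns A, B, C; the index values 1 and 2 are not included.
print(type(arr).__name__, arr.shape, arr.dtype)
```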

#2


89  

Pandas has something built in...

numpy_matrix = df.as_matrix()

gives

array([[nan, 0.2, nan],
       [nan, nan, 0.5],
       [nan, 0.2, 0.5],
       [0.1, 0.2, nan],
       [0.1, 0.2, 0.5],
       [0.1, nan, 0.5],
       [0.1, nan, nan]])
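
Note that as_matrix() was deprecated in pandas 0.23 and removed in pandas 1.0, so on current versions this call raises an AttributeError. A rough sketch of what could be used instead, assuming pandas 0.24 or later; the small frame and the variable names are only for illustration:

```python
import numpy as np
import pandas as pd

# Small illustrative frame (not the asker's full data).
df = pd.DataFrame(
    {"A": [0.1, 0.1], "B": [0.2, 0.2], "C": [0.5, np.nan]},
    index=pd.Index([1, 2], name="ID"),
)

# In place of df.as_matrix():
all_columns = df.to_numpy()

# In place of df.as_matrix(columns=["A", "B"]): select the columns first.
some_columns = df[["A", "B"]].to_numpy()

print(all_columns.shape, some_columns.shape)
```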

#3


41  

I would just chain DataFrame.reset_index() and DataFrame.values to get the NumPy representation of the dataframe, including the index:

In [8]: df
Out[8]: 
          A         B         C
0 -0.982726  0.150726  0.691625
1  0.617297 -0.471879  0.505547
2  0.417123 -1.356803 -1.013499
3 -0.166363 -0.957758  1.178659
4 -0.164103  0.074516 -0.674325
5 -0.340169 -0.293698  1.231791
6 -1.062825  0.556273  1.508058
7  0.959610  0.247539  0.091333

[8 rows x 3 columns]

In [9]: df.reset_index().values
Out[9]:
array([[ 0.        , -0.98272574,  0.150726  ,  0.69162512],
       [ 1.        ,  0.61729734, -0.47187926,  0.50554728],
       [ 2.        ,  0.4171228 , -1.35680324, -1.01349922],
       [ 3.        , -0.16636303, -0.95775849,  1.17865945],
       [ 4.        , -0.16410334,  0.0745164 , -0.67432474],
       [ 5.        , -0.34016865, -0.29369841,  1.23179064],
       [ 6.        , -1.06282542,  0.55627285,  1.50805754],
       [ 7.        ,  0.95961001,  0.24753911,  0.09133339]])

To get the dtypes, we'd need to transform this ndarray into a structured array using view:

In [10]: df.reset_index().values.ravel().view(dtype=[('index', int), ('A', float), ('B', float), ('C', float)])
Out[10]:
array([( 0, -0.98272574,  0.150726  ,  0.69162512),
       ( 1,  0.61729734, -0.47187926,  0.50554728),
       ( 2,  0.4171228 , -1.35680324, -1.01349922),
       ( 3, -0.16636303, -0.95775849,  1.17865945),
       ( 4, -0.16410334,  0.0745164 , -0.67432474),
       ( 5, -0.34016865, -0.29369841,  1.23179064),
       ( 6, -1.06282542,  0.55627285,  1.50805754),
       ( 7,  0.95961001,  0.24753911,  0.09133339)],
       dtype=[('index', '<i8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
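
A caveat worth checking before relying on view: ndarray.view reinterprets the underlying bytes instead of converting values. Since reset_index().values on an integer index plus float columns is a float64 array, viewing its first field as an integer may not give back the index values. The sketch below uses a made-up two-row frame to compare view with DataFrame.to_records(index=False), which converts column by column; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0.5, 1.5], "B": [2.5, 3.5]})  # default index 0, 1

# Mixed int/float columns are upcast to one float64 block.
flat = df.reset_index().to_numpy()

# view() reinterprets the raw float64 bytes of the index column as int64.
reinterpreted = flat.ravel().view(
    [("index", np.int64), ("A", np.float64), ("B", np.float64)]
)

# to_records(index=False) keeps each column's own dtype and values.
converted = df.reset_index().to_records(index=False)

print(reinterpreted["index"])  # the second entry is the bit pattern of 1.0, not 1
print(converted["index"])
```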

#4


26  

You can use the to_records method, but you have to play around a bit with the dtypes if they are not what you want from the get-go. In my case, having copied your DF from a string, the index type is string (represented by an object dtype in pandas):

In [102]: df
Out[102]: 
label    A    B    C
ID                  
1      NaN  0.2  NaN
2      NaN  NaN  0.5
3      NaN  0.2  0.5
4      0.1  0.2  NaN
5      0.1  0.2  0.5
6      0.1  NaN  0.5
7      0.1  NaN  NaN

In [103]: df.index.dtype
Out[103]: dtype('object')
In [104]: df.to_records()
Out[104]: 
rec.array([(1, nan, 0.2, nan), (2, nan, nan, 0.5), (3, nan, 0.2, 0.5),
       (4, 0.1, 0.2, nan), (5, 0.1, 0.2, 0.5), (6, 0.1, nan, 0.5),
       (7, 0.1, nan, nan)], 
      dtype=[('index', '|O8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
In [106]: df.to_records().dtype
Out[106]: dtype([('index', '|O8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

Converting the recarray dtype does not work for me, but one can already do this in pandas:

In [109]: df.index = df.index.astype('i8')
In [111]: df.to_records().view([('ID', '<i8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
Out[111]:
rec.array([(1, nan, 0.2, nan), (2, nan, nan, 0.5), (3, nan, 0.2, 0.5),
       (4, 0.1, 0.2, nan), (5, 0.1, 0.2, 0.5), (6, 0.1, nan, 0.5),
       (7, 0.1, nan, nan)], 
      dtype=[('ID', '<i8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

Note that pandas does not set the name of the index properly (to ID) in the exported record array (a bug?), so we take advantage of the type conversion to correct that as well.

At the moment pandas only has 8-byte integers (i8) and 8-byte floats (f8) (see this issue).
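
On newer pandas (0.24 and later), to_records accepts index_dtypes and column_dtypes arguments, so the dtype conversion may not need a separate view step. The following is a sketch under that assumption, with a small made-up frame; in recent versions the index field also appears to take the index name (ID here) instead of 'index'.

```python
import numpy as np
import pandas as pd

# Small illustrative frame with a named integer index.
df = pd.DataFrame(
    {"A": [np.nan, 0.1], "B": [0.2, np.nan], "C": [np.nan, 0.5]},
    index=pd.Index([1, 2], name="ID"),
)

# Ask for a 4-byte integer index field; the float columns keep their f8 dtype.
rec = df.to_records(index_dtypes="<i4")

print(rec.dtype.names)
print(rec.dtype["ID"])
```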

#5


9  

Here is my approach to making a structured array from a pandas DataFrame.

Create the data frame:

import pandas as pd
import numpy as np
import six

NaN = float('nan')
ID = [1, 2, 3, 4, 5, 6, 7]
A = [NaN, NaN, NaN, 0.1, 0.1, 0.1, 0.1]
B = [0.2, NaN, 0.2, 0.2, 0.2, NaN, NaN]
C = [NaN, 0.5, 0.5, NaN, 0.5, 0.5, NaN]
columns = {'A':A, 'B':B, 'C':C}
df = pd.DataFrame(columns, index=ID)
df.index.name = 'ID'
print(df)

      A    B    C
ID               
1   NaN  0.2  NaN
2   NaN  NaN  0.5
3   NaN  0.2  0.5
4   0.1  0.2  NaN
5   0.1  0.2  0.5
6   0.1  NaN  0.5
7   0.1  NaN  NaN

Define a function to make a numpy structured array (not a record array) from a pandas DataFrame.

def df_to_sarray(df):
    """
    Convert a pandas DataFrame object to a numpy structured array.
    This is functionally equivalent to but more efficient than
    np.array(df.to_records(index=False))

    :param df: the data frame to convert
    :return: a numpy structured array representation of df
    """

    v = df.values
    cols = df.columns

    if six.PY2:  # python 2 needs .encode() but 3 does not
        types = [(cols[i].encode(), df[k].dtype.type) for (i, k) in enumerate(cols)]
    else:
        types = [(cols[i], df[k].dtype.type) for (i, k) in enumerate(cols)]
    dtype = np.dtype(types)
    z = np.zeros(v.shape[0], dtype)
    for (i, k) in enumerate(z.dtype.names):
        z[k] = v[:, i]
    return z

Use reset_index to make a new data frame that includes the index as part of its data. Convert that data frame to a structured array.

sa = df_to_sarray(df.reset_index())
sa

array([(1L, nan, 0.2, nan), (2L, nan, nan, 0.5), (3L, nan, 0.2, 0.5),
       (4L, 0.1, 0.2, nan), (5L, 0.1, 0.2, 0.5), (6L, 0.1, nan, 0.5),
       (7L, 0.1, nan, nan)], 
      dtype=[('ID', '<i8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

EDIT: Updated df_to_sarray to avoid an error when calling .encode() with Python 3. Thanks to Joseph Garvin and halcyon for their comment and solution.
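
If Python 2 support is not needed, the six dependency can probably be dropped. The sketch below is a hypothetical Python 3 only variant (the name df_to_sarray_py3 is made up). It assumes unique column names and columns backed by plain NumPy dtypes, and it copies each column separately so that an integer ID column is not routed through a float array.

```python
import numpy as np
import pandas as pd


def df_to_sarray_py3(df):
    """Hypothetical Python 3 variant: one structured-array field per column."""
    # Build the structured dtype from each column's own NumPy dtype.
    dtype = np.dtype([(str(name), df[name].dtype) for name in df.columns])
    out = np.empty(len(df), dtype=dtype)
    # Copy column by column so no common upcast (e.g. int -> float) happens.
    for name in df.columns:
        out[str(name)] = df[name].to_numpy()
    return out


df = pd.DataFrame(
    {"A": [np.nan, 0.1], "B": [0.2, np.nan]},
    index=pd.Index([1, 2], name="ID"),
)
sa = df_to_sarray_py3(df.reset_index())
print(sa.dtype)
```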

#6


6  

It seems like df.to_records() will work for you. The exact feature you're looking for was requested, and to_records was pointed to as an alternative.

I tried this out locally using your example, and that call yields something very similar to the output you were looking for:

rec.array([(1, nan, 0.2, nan), (2, nan, nan, 0.5), (3, nan, 0.2, 0.5),
       (4, 0.1, 0.2, nan), (5, 0.1, 0.2, 0.5), (6, 0.1, nan, 0.5),
       (7, 0.1, nan, nan)],
      dtype=[(u'ID', '<i8'), (u'A', '<f8'), (u'B', '<f8'), (u'C', '<f8')])

Note that this is a recarray rather than an array. You could convert the result into a regular numpy array by calling the array constructor: np.array(df.to_records()).
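
A small sketch of the practical difference, using a made-up two-row frame: the recarray allows attribute-style field access, while the plain array returned by np.array(...) is indexed by field name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [0.1, 0.2], "B": [0.3, 0.4]},
    index=pd.Index([1, 2], name="ID"),
)

rec = df.to_records()  # numpy.recarray: fields are also available as attributes
arr = np.array(rec)    # plain numpy.ndarray holding the same fields

print(type(rec).__name__, rec.A)
print(type(arr).__name__, arr["A"])
```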

#7


4  

Two ways to convert the data frame to its NumPy array representation:

  • mah_np_array = df.as_matrix(columns=None)

  • mah_np_array = df.values

Doc: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.as_matrix.html

#8


3  

Further to meteore's answer, I found that the code

df.index = df.index.astype('i8')

doesn't work for me. So I am posting my code here for the convenience of others stuck on this issue.

city_cluster_df = pd.read_csv(text_filepath, encoding='utf-8')
# the field 'city_en' is a string, when converted to Numpy array, it will be an object
city_cluster_arr = city_cluster_df[['city_en','lat','lon','cluster','cluster_filtered']].to_records()
descr=city_cluster_arr.dtype.descr
# change the field 'city_en' to string type (the index for 'city_en' here is 1 because before the field is the row index of dataframe)
descr[1]=(descr[1][0], "S20")
newArr=city_cluster_arr.astype(np.dtype(descr))

#9


2  

Thanks for Phil's answer, it's great.

In reply to:

doesn't work for me, error: TypeError: data type not understood – Joseph Garvin Feb 13 at 17:55

I use Python 3 and got the same error. After I deleted .encode(), the expression became:

types = [(cols[i], df[k].dtype.type) for (i, k) in enumerate(cols)]

and then it works.

#10


0  

I just had a similar problem when exporting from a dataframe to an ArcGIS table and stumbled on a solution from USGS (https://my.usgs.gov/confluence/display/cdi/pandas.DataFrame+to+ArcGIS+Table). In short, your problem has a similar solution:

df
Out[109]: 
      A    B    C
ID               
1   NaN  0.2  NaN
2   NaN  NaN  0.5
3   NaN  0.2  0.5
4   0.1  0.2  NaN
5   0.1  0.2  0.5
6   0.1  NaN  0.5
7   0.1  NaN  NaN

np_data = np.array(np.rec.fromrecords(df.values))
np_names = df.dtypes.index.tolist()
np_data.dtype.names = tuple([name.encode('UTF8') for name in np_names])

np_data
Out[113]: 
array([( nan,  0.2,  nan), ( nan,  nan,  0.5), ( nan,  0.2,  0.5),
       ( 0.1,  0.2,  nan), ( 0.1,  0.2,  0.5), ( 0.1,  nan,  0.5),
       ( 0.1,  nan,  nan)], 
      dtype=(numpy.record, [('A', '<f8'), ('B', '<f8'), ('C', '<f8')]))
