How do I get the row count of a pandas DataFrame?

Time: 2022-01-23 04:48:17

I'm trying to get the number of rows of dataframe df with Pandas, and here is my code.

Method 1:

total_rows = df.count
print total_rows +1

Method 2:

total_rows = df['First_columnn_label'].count
print total_rows +1

Both the code snippets give me this error:

TypeError: unsupported operand type(s) for +: 'instancemethod' and 'int'

What am I doing wrong?

According to the answer given by @root the best (the fastest) way to check df length is to call:

df.shape[0]

12 solutions

#1


531  

You can use the .shape property or just len(DataFrame.index). However, there are notable performance differences (len(DataFrame.index) is the fastest in the timings below):

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: df = pd.DataFrame(np.arange(12).reshape(4,3))

In [4]: df
Out[4]: 
   0   1   2
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11

In [5]: df.shape
Out[5]: (4, 3)

In [6]: timeit df.shape
1000000 loops, best of 3: 1.17 us per loop

In [7]: timeit df[0].count()
10000 loops, best of 3: 56 us per loop

In [8]: len(df.index)
Out[8]: 4

In [9]: timeit len(df.index)
1000000 loops, best of 3: 381 ns per loop

EDIT: As @Dan Allen noted in the comments, len(df.index) and df[0].count() are not interchangeable, because count excludes NaNs.
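
To illustrate the difference, here is a minimal sketch on a made-up frame (the column names "a" and "b" are assumptions for the example, not from the question). Note also that count is a method and has to be called with parentheses; df.count without them is the bound method itself, which is consistent with the TypeError shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "a" has one missing value.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4, 5, 6]})

n_rows = len(df.index)        # counts every row, NaN or not
n_non_null = df["a"].count()  # counts only the non-NaN entries in "a"

print(n_rows, n_non_null)
```

If the goal is simply "how many rows does the frame have", len(df.index) is the safer choice, since count() answers a different question.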

#2


93  

Suppose df is your DataFrame; then:

Count_Row = df.shape[0]  # number of rows
Count_Col = df.shape[1]  # number of columns

#3


89  

Use len(df). This works as of pandas 0.11 or maybe even earlier.

__len__() is currently (0.12) documented with "Returns length of index". Timing info, set up the same way as in root's answer:

In [7]: timeit len(df.index)
1000000 loops, best of 3: 248 ns per loop

In [8]: timeit len(df)
1000000 loops, best of 3: 573 ns per loop

Due to one additional function call, it is a bit slower than calling len(df.index) directly, but this should not matter in most use cases.
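
Outside IPython, a comparable measurement can be sketched with the standard timeit module. This is only an illustration: the repeat counts are arbitrary choices, the lambda wrappers add a small constant overhead to every candidate, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3))

# Candidate expressions taken from the answers in this thread.
candidates = {
    "len(df.index)": lambda: len(df.index),
    "len(df)": lambda: len(df),
    "df.shape[0]": lambda: df.shape[0],
}

calls = 100_000
timings = {}
for label, func in candidates.items():
    # Best of 5 repeats, converted to nanoseconds per call.
    best = min(timeit.repeat(func, number=calls, repeat=5))
    timings[label] = best / calls * 1e9
    print(f"{label:>15}: {timings[label]:.0f} ns per call")
```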

#4


15  

Apart from the above answers, you can use df.axes to get a list with the row and column indexes and then use the len() function:

total_rows=len(df.axes[0])
total_cols=len(df.axes[1])
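
As a small self-contained sketch of the same idea (the frame and its column names "x" and "y" are made up for the example):

```python
import pandas as pd

# Hypothetical frame with 3 rows and 2 columns.
df = pd.DataFrame({"x": [10, 20, 30], "y": ["a", "b", "c"]})

row_index, col_index = df.axes  # [row Index, column Index]
total_rows = len(row_index)
total_cols = len(col_index)
print(total_rows, total_cols)
```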

#5


11  

For getting rows use

df.index

and for columns use

df.columns

You can always use len() to get the number of items in a sequence, so len(df.index) gives you the number of rows.

But keep in mind that, in the timings posted by @root, len(df.index) is faster than df.shape, so shape[0] and shape[1] are mainly a convenient way to get the number of rows and columns, respectively.

#6


6  

I came to pandas from an R background, and I find that pandas is more complicated when it comes to selecting rows or columns. I had to wrestle with it for a while, and then I found some ways to deal with it:

Getting the number of columns:

len(df.columns)
# Here:
# df is your DataFrame (the counterpart of an R data.frame).
# df.columns returns an Index holding the column labels of df.
# len() then gives its length, i.e. the number of columns.

Getting the number of rows:

len(df.index)  # Same idea, applied to the row index.

#7


3  

df.shape returns the shape of the data frame in the form of a tuple (no. of rows, no. of cols).

You can simply access the number of rows or the number of columns with df.shape[0] or df.shape[1], respectively, which is the same as indexing into any other tuple.
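
Because df.shape is an ordinary tuple, it can also be unpacked in one step. A minimal sketch, with a made-up frame:

```python
import pandas as pd

# Hypothetical frame with 4 rows and 3 columns.
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [5, 6, 7, 8], "z": [0, 0, 0, 0]})

n_rows, n_cols = df.shape  # (rows, columns)
print(n_rows, n_cols)
```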

#8


3  

...building on Jan-Philip Gehrcke's answer.

The reason len(df) and len(df.index) are faster than df.shape[0] is visible in the code: df.shape is a @property that runs a DataFrame method calling len twice.

df.shape??
Type:        property
String form: <property object at 0x1127b33c0>
Source:     
# df.shape.fget
@property
def shape(self):
    """
    Return a tuple representing the dimensionality of the DataFrame.
    """
    return len(self.index), len(self.columns)

And beneath the hood of len(df):

df.__len__??
Signature: df.__len__()
Source:   
    def __len__(self):
        """Returns length of info axis, but here we use the index """
        return len(self.index)
File:      ~/miniconda2/lib/python2.7/site-packages/pandas/core/frame.py
Type:      instancemethod

len(df.index) will be slightly faster than len(df) since it has one fewer function call, but both are faster than df.shape[0].

#9


2  

In case you want to get the row count in the middle of a chained operation, you can use:

df.pipe(len)

Example:

import numpy as np
import pandas as pd

row_count = (
    pd.DataFrame(np.random.rand(3, 4))
    .reset_index()
    .pipe(len)
)

This can be useful if you don't want to put a long statement inside a len() function.

You could call __len__() instead, but __len__() looks a bit weird.
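
A slightly longer chain might look like the sketch below. The data and the column names "city" and "sales" are invented for illustration; the point is only that .pipe(len) ends the chain with the row count of whatever the previous steps produced.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"city": ["A", "B", "A", "C"], "sales": [10, 0, 25, 5]})

n_positive_in_a = (
    df
    .query("sales > 0")               # drop rows without sales
    .loc[lambda d: d["city"] == "A"]  # keep only city "A"
    .pipe(len)                        # row count of the filtered result
)
print(n_positive_in_a)
```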

#10


1  

Row count (use any of):

df.shape[0]
len(df)

#11


0  

For a DataFrame df, a helper that prints a comma-formatted row count, handy while exploring data:

def nrow(df):
    print("{:,}".format(df.shape[0]))

Example:

nrow(my_df)
12,456,789
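
If you would rather get the formatted string back than print it, one possible variant is sketched here. The name format_row_count is hypothetical, and the f-string assumes Python 3.6 or later.

```python
import pandas as pd


def format_row_count(df):
    """Return the row count of df with thousands separators."""
    return f"{len(df):,}"


# Hypothetical frame with 1,234,567 rows.
df = pd.DataFrame({"v": range(1234567)})
print(format_row_count(df))
```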

#12


0  

Easily done in one line:

your_data_frame.shape

will simply give you the number of rows and columns.
