How to replace all NaN values with 0 in a column of a pandas DataFrame

Time: 2022-07-18 08:03:09

I have a DataFrame like the one below:

      itm Date                  Amount 
67    420 2012-09-30 00:00:00   65211
68    421 2012-09-09 00:00:00   29424
69    421 2012-09-16 00:00:00   29877
70    421 2012-09-23 00:00:00   30990
71    421 2012-09-30 00:00:00   61303
72    485 2012-09-09 00:00:00   71781
73    485 2012-09-16 00:00:00     NaN
74    485 2012-09-23 00:00:00   11072
75    485 2012-09-30 00:00:00  113702
76    489 2012-09-09 00:00:00   64731
77    489 2012-09-16 00:00:00     NaN

When I try to .apply a function to the Amount column, I get the following error:

ValueError: cannot convert float NaN to integer

I have tried applying a function that uses .isnan from the math module. I have tried the pandas .replace method. I have tried the .sparse data attribute from pandas 0.9. I have also tried an if NaN == NaN statement inside a function. While looking at other articles I also read "How do I replace NA values with zeros in an R dataframe?". None of the methods I have tried worked; they either fail outright or do not recognise NaN. Any hints or solutions would be appreciated.
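Two things in the question can be shown directly: the error comes from converting a float NaN to an int, and an equality test can never detect NaN because NaN compares unequal to everything, itself included. A minimal sketch, assuming a small stand-in Series for the Amount column:

```python
import math

import numpy as np
import pandas as pd

# Small stand-in for the question's Amount column (values taken from the table above)
amount = pd.Series([65211, np.nan, 11072], name="Amount")

# NaN is never equal to anything, not even itself, so `if x == NaN` is always False
print(np.nan == np.nan)  # False

# Casting NaN to int is what raises the error from the question
try:
    int(float("nan"))
except ValueError as exc:
    print(exc)  # cannot convert float NaN to integer

# NaN-aware checks work where equality does not
print(math.isnan(amount[1]))   # True
print(amount.isna().tolist())  # [False, True, False]
```

This is why the fix is to fill (or drop) the NaN values before applying anything that needs integers.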

8 solutions

#1


380  

I believe DataFrame.fillna() will do this for you.

See the docs for DataFrame.fillna and for Series.fillna.

Example:

In [7]: df
Out[7]: 
          0         1
0       NaN       NaN
1 -0.494375  0.570994
2       NaN       NaN
3  1.876360 -0.229738
4       NaN       NaN

In [8]: df.fillna(0)
Out[8]: 
          0         1
0  0.000000  0.000000
1 -0.494375  0.570994
2  0.000000  0.000000
3  1.876360 -0.229738
4  0.000000  0.000000

To fill the NaNs in only one column, select just that column. In this case I'm using inplace=True to actually change the contents of df.

In [12]: df[1].fillna(0, inplace=True)
Out[12]: 
0    0.000000
1    0.570994
2    0.000000
3   -0.229738
4    0.000000
Name: 1

In [13]: df
Out[13]: 
          0         1
0       NaN  0.000000
1 -0.494375  0.570994
2       NaN  0.000000
3  1.876360 -0.229738
4       NaN  0.000000
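The session above comes from an old pandas release. In newer releases, calling fillna(inplace=True) on a selected column such as df[1] may warn about chained assignment or may not modify df at all, so assigning the result back is the safer habit. A minimal sketch, assuming a three-row version of the example frame:

```python
import numpy as np
import pandas as pd

# Three-row version of the example frame, integer column labels 0 and 1
df = pd.DataFrame({0: [np.nan, -0.494375, np.nan],
                   1: [np.nan, 0.570994, np.nan]})

# Fill only column 1 and write it back explicitly instead of using inplace=True
df[1] = df[1].fillna(0)
print(df)  # column 0 still holds NaN, column 1 has 0.0 in place of NaN
```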

#2


49  

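Slicing is not guaranteed to return a view rather than a copy, so an in-place fillna on a selected column may not change the original DataFrame. Instead, you can assign the result back: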

df['column']=df['column'].fillna(value)
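Applied to the question's Amount column, this assign-back pattern removes the NaN values, after which the column can be cast to int and the failing .apply runs. A minimal sketch, assuming a few rows from the question; the lambda passed to .apply is a hypothetical stand-in for the asker's function:

```python
import numpy as np
import pandas as pd

# A few rows from the question's table
df = pd.DataFrame({
    "itm": [485, 485, 485],
    "Amount": [71781, np.nan, 11072],
})

# Fill the missing amounts, then the column can safely become int
df["Amount"] = df["Amount"].fillna(0).astype(int)

# Hypothetical integer-based function standing in for the asker's .apply
df["Amount_k"] = df["Amount"].apply(lambda x: int(x) // 1000)
print(df)
```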

#3


17  

I just wanted to provide a bit of an update/special case since it looks like people still come here. If you're using a multi-index or otherwise using an index-slicer the inplace=True option may not be enough to update the slice you've chosen. For example in a 2x2 level multi-index this will not change any values (as of pandas 0.15):

idx = pd.IndexSlice
df.loc[idx[:,mask_1],idx[mask_2,:]].fillna(value=0,inplace=True)

The "problem" is that the chaining breaks fillna's ability to update the original DataFrame. I put "problem" in quotes because there are good reasons for the design decisions that lead pandas not to propagate changes back through these chains in certain situations. Also, this is a complex example (though I really ran into it), but the same may apply to fewer index levels, depending on how you slice.

The solution is DataFrame.update:

df.update(df.loc[idx[:,mask_1],idx[mask_2,:]].fillna(value=0))

It's one line, reads reasonably well (sort of) and eliminates any unnecessary messing with intermediate variables or loops while allowing you to apply fillna to any multi-level slice you like!

If anybody finds cases where this doesn't work, please post in the comments. I've been experimenting with it and reading the source, and it seems to solve at least my multi-index slice problems.
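A self-contained sketch of the DataFrame.update approach, assuming a 2x2 MultiIndex on both axes built with pd.MultiIndex.from_product. The level labels ("x" and "p") are hypothetical stand-ins for the answer's mask_1 and mask_2. Whether the chained inplace call leaves df unchanged depends on the pandas version, so only the update route is shown:

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2 MultiIndex on rows and columns; everything NaN except one cell
rows = pd.MultiIndex.from_product([["a", "b"], ["x", "y"]], names=["r1", "r2"])
cols = pd.MultiIndex.from_product([["p", "q"], ["u", "v"]], names=["c1", "c2"])
df = pd.DataFrame(np.nan, index=rows, columns=cols)
df.iloc[0, 0] = 1.0

idx = pd.IndexSlice

# Fill NaN only where the second row level is "x" and the first column level is "p"
filled = df.loc[idx[:, "x"], idx["p", :]].fillna(value=0)

# update() aligns on index/column labels and writes the non-NaN values back into df
df.update(filled)
print(df)
```

Because update only copies non-NaN values from the argument, cells outside the slice keep their original contents.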

#4


13  

The below code worked for me.

import pandas

df = pandas.read_csv('somefile.txt')

df = df.fillna(0)
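To run this without a file on disk, the sketch below swaps 'somefile.txt' for an in-memory CSV (an assumption for illustration). It also shows where the NaN values come from: read_csv parses an empty field as NaN.

```python
import io

import pandas as pd

# In-memory stand-in for 'somefile.txt'; the empty Amount field becomes NaN
csv_text = """itm,Date,Amount
485,2012-09-09,71781
485,2012-09-16,
485,2012-09-23,11072
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df["Amount"].isna().sum())  # 1

df = df.fillna(0)
print(df["Amount"].tolist())  # [71781.0, 0.0, 11072.0]
```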

#5


10  

You could use replace to change NaN to 0:

import pandas as pd
import numpy as np

# for column
df['column'] = df['column'].replace(np.nan, 0)

# for whole dataframe
df = df.replace(np.nan, 0)

# inplace
df.replace(np.nan, 0, inplace=True)
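For a plain numeric column, replace(np.nan, 0) and fillna(0) should give the same result; fillna is simply the method dedicated to missing values. A quick check, assuming a float stand-in for the Amount column:

```python
import numpy as np
import pandas as pd

# Float stand-in for the Amount column
df = pd.DataFrame({"Amount": [65211, np.nan, 11072, np.nan]})

via_replace = df["Amount"].replace(np.nan, 0)
via_fillna = df["Amount"].fillna(0)

# Both yield the same float64 column with NaN turned into 0.0
print(via_replace.equals(via_fillna))  # True
```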

#6


7  

fillna() is the best way to do it:

# Fill all NaN values with zero
df = df.fillna(0)

You can also use inplace=True if you don't want to write df = df.fillna(value):

df.fillna(0, inplace=True)
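A common pitfall is mixing the two styles. With inplace=True the DataFrame is modified and the method returns None, so df = df.fillna(0, inplace=True) would leave df set to None. A minimal sketch with a hypothetical two-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame
df = pd.DataFrame({"Amount": [1.0, np.nan]})

# inplace=True modifies df itself and returns None
result = df.fillna(0, inplace=True)
print(result)                 # None
print(df["Amount"].tolist())  # [1.0, 0.0]
```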

#7


2  

You should use fillna(). It works for me.

df = df.fillna(value_to_replace_null)
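The value argument can also be a dict mapping column labels to fill values, which is handy when different columns need different replacements. A minimal sketch; the "Note" column and the "missing" placeholder are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: a numeric column plus a text column with a missing entry
df = pd.DataFrame({
    "Amount": [65211, np.nan, 11072],
    "Note": ["ok", None, "ok"],
})

# Each listed column gets its own fill value; unlisted columns are left untouched
df = df.fillna({"Amount": 0, "Note": "missing"})
print(df)
```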

#8


1  

The only problem is that df.fillna() does not work if the DataFrame you are applying it to has been resampled or sliced with the loc function.
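In both cases the usual cause is that resample and a .loc selection produce a new object, so fillna has to be applied to that result and the result kept (or written back through .loc). A sketch under that assumption, using dates and values from the question; the "W" (weekly, Sunday-ending) frequency and the mask on itm == 485 are illustrative choices:

```python
import numpy as np
import pandas as pd

# Two weekly observations with a gap on 2012-09-16
s = pd.Series(
    [29424.0, 30990.0],
    index=pd.to_datetime(["2012-09-09", "2012-09-23"]),
)

# The empty week comes out of resample as NaN; fill the resampled result and keep it
weekly = s.resample("W").mean().fillna(0)
print(weekly.tolist())  # [29424.0, 0.0, 30990.0]

df = pd.DataFrame({"itm": [485, 485, 489], "Amount": [71781.0, np.nan, np.nan]})

# For a .loc selection, assign the filled values back through .loc
mask = df["itm"] == 485
df.loc[mask, "Amount"] = df.loc[mask, "Amount"].fillna(0)
print(df["Amount"].tolist())  # [71781.0, 0.0, nan]
```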
