Replacing blank and NaN fields at the same time in pandas

Time: 2022-06-01 18:57:05

I have a DataFrame column that contains both blank strings and NaN (null) values. I want to replace both the blanks and the NaNs with the string "No Data". Any guidance would be appreciated. I am using Python pandas.

My dataframe column -

Col1
----

NaN
New York
NaN

This is what I have tried -

df['Col1'] = df['Col1'].replace(r'\s+', "No Data", regex=True)
df['Col1'] = df['Col1'].replace(np.NaN, "No Data", regex=True)

My resulting column looks like -

Col1
----
No Data
No Data
NewNo DataYork
No Data

Thanks.

4 solutions

#1


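Filter the df to set the empty entries to NaN, then fill all the NaNs in one step. Note that the str.len() == 0 test only catches zero-length strings, not strings made up of spaces: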

In [27]:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['', np.nan, 'New York', np.nan]})
df
Out[27]:
       Col1
0          
1       NaN
2  New York
3       NaN

In [28]:

# Turn empty strings into NaN, then fill every NaN in one pass
df.loc[df['Col1'].str.len() == 0, 'Col1'] = np.nan
df['Col1'] = df['Col1'].fillna('No Data')
df
Out[28]:
       Col1
0   No Data
1   No Data
2  New York
3   No Data
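
If the column may also hold whitespace-only strings, the same set-to-NaN-then-fill idea can be extended by stripping before measuring the length. The following is a minimal sketch under that assumption; the sample values and the is_blank name are made up for illustration and are not from the original answer.

```python
import numpy as np
import pandas as pd

# Illustrative data: an empty string, a whitespace-only string, NaN, and a real value
df = pd.DataFrame({'Col1': ['', '   ', np.nan, 'New York']})

# Strip first so whitespace-only strings also count as blank.
# For NaN rows the length is NaN, and NaN == 0 evaluates to False.
is_blank = df['Col1'].str.strip().str.len() == 0

df.loc[is_blank, 'Col1'] = np.nan          # blanks become NaN
df['Col1'] = df['Col1'].fillna('No Data')  # then every NaN is filled at once

print(df['Col1'].tolist())
```

Only the mask changes compared with the answer above; the fillna step is the same.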

#2


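You have to anchor the regex to the start (^) and end ($) of the string. Without the anchors, \s+ also matches the space inside 'New York', which is why you got 'NewNo DataYork':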

In [11]: df.replace(r'^\s*$', np.nan, regex=True)
Out[11]:
       Col1
0       NaN
1       NaN
2  New York
3       NaN

In [12]: df.replace(r'^\s*$', np.nan, regex=True).fillna("No Data")
Out[12]:
       Col1
0   No Data
1   No Data
2  New York
3   No Data
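
To see the effect of the anchors side by side, here is a small sketch that runs both patterns on one Series. The sample values and the variable names (s, unanchored, anchored) are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(['', ' ', np.nan, 'New York'], name='Col1')

# Unanchored: \s+ matches any run of whitespace, including the one inside 'New York'
unanchored = s.replace(r'\s+', 'No Data', regex=True)

# Anchored: ^\s*$ matches only values that are empty or entirely whitespace
anchored = s.replace(r'^\s*$', np.nan, regex=True).fillna('No Data')

print(unanchored.tolist())
print(anchored.tolist())
```

The unanchored pattern also leaves the empty string alone, because \s+ needs at least one whitespace character, while \s* in the anchored pattern accepts zero.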

#3


You could pass the values you want to replace to the replace function as a dictionary that maps each old value to its new value:

In [944]: x.head()
Out[944]: 
  ind1      ind2  value  identifier
0   EA  01/01/07  0.231          55
1   EA  01/01/07  0.511          56
2   EA  01/01/07  0.357          57
3   EA  01/02/07  0.091          55
4   EA  01/02/07  0.161          57

In [945]: x.head().replace({55:'N/A', 56:'FiftySix'})
Out[945]: 
  ind1      ind2  value identifier
0   EA  01/01/07  0.231        N/A
1   EA  01/01/07  0.511   FiftySix
2   EA  01/01/07  0.357         57
3   EA  01/02/07  0.091        N/A
4   EA  01/02/07  0.161         57
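
Applied to the column from the question, the dictionary idea might look like the sketch below. The blank_map name and the choice of which blank variants to list are assumptions for illustration; NaN is handled here with fillna rather than through the dictionary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['', np.nan, 'New York', np.nan]})

# Without regex=True, each key is compared against the whole cell value,
# so the space inside 'New York' is never touched.
blank_map = {'': 'No Data', ' ': 'No Data'}

df['Col1'] = df['Col1'].replace(blank_map).fillna('No Data')

print(df['Col1'].tolist())
```

This only covers the blank variants listed in the dictionary; for arbitrary runs of whitespace a regex or a strip-based test is more general.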

#4


Okay, here's a where-based approach:

>>> df["Col1"] = df.Col1.where(df.Col1.str.strip().str.len() > 0, "No Data")
>>> df
       Col1
0   No Data
1   No Data
2  New York
3   No Data

This replaces anything that doesn't have a positive length after stripping with "No Data". NaNs stay NaN through the .str operations, and NaN > 0 is False, so they don't count as having a positive length and get replaced too.

(I'm really bad at remembering regex syntax so I tend to use named methods instead.)
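
The same idea can also be written with Series.mask, which replaces values where the condition is True (where keeps them). This is a minimal sketch with made-up sample data; the is_blank name is hypothetical and the explicit isna() check is a stylistic choice, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['', '  ', np.nan, 'New York']})
col = df['Col1']

# True where the value is missing or contains nothing but whitespace.
# Comparing NaN with '' gives False, so isna() covers the missing rows.
is_blank = col.isna() | (col.str.strip() == '')

df['Col1'] = col.mask(is_blank, 'No Data')

print(df['Col1'].tolist())
```

Spelling out the missing-value check makes the intent visible instead of relying on how NaN behaves in a comparison.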

