What is the best way to access values in a DataFrame column?

Time: 2022-11-24 15:51:07

For example, I have:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})
df[df['a'] == 3].a = 4

This does not assign 4 to the row where a is 3.

df[df['a']==3] = 4

But this works.

I am confused about how the assignment works. I would appreciate it if anyone could give me some references or an explanation.

5 Answers

#1


3  

You do not want to use the second method. It selects whole rows of the dataframe and assigns the same value to every column of each matching row.

For example:

df

   a  b
0  1  4
1  2  3
2  3  6

df[df['a'] == 3]

   a  b
2  3  6

df[df['a']==3] = 3

df

   a  b
0  1  4
1  2  3
2  3  3
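To make the difference concrete, here is a minimal sketch that contrasts whole-row assignment with a column-targeted .loc assignment. It assumes the same two-column frame as above; the names whole_row and one_col are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 3, 6]})

# Boolean mask on the frame itself: every column of the matching rows is set.
whole_row = df.copy()
whole_row[whole_row['a'] == 3] = 4

# .loc with a row mask and a column label: only column 'a' is set.
one_col = df.copy()
one_col.loc[one_col['a'] == 3, 'a'] = 4

print(whole_row)
print(one_col)
```

In the first frame column b of the matching row is overwritten as well, which is usually not what is intended.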

The first method does not work because boolean indexing returns a copy of the selected rows (a new dataframe). The assignment is applied to that temporary copy, so the original dataframe is left unchanged and pandas emits a warning:

df[df['a'] == 3].a = 4
/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/core/generic.py:3110: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[name] = value
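The copy behaviour can be checked directly. The following is a small sketch, assuming the single-column frame from the question; the variable name subset is hypothetical, and the warning is suppressed only to keep the output quiet.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

# Boolean indexing builds a new DataFrame holding the selected rows.
subset = df[df['a'] == 3]

with warnings.catch_warnings():
    # Older pandas versions may emit SettingWithCopyWarning here.
    warnings.simplefilter('ignore')
    subset['a'] = 4

print(subset['a'].tolist())  # the copy holds the new value
print(df['a'].tolist())      # the original is untouched
```

Writing df[df['a'] == 3].a = 4 does the same thing, except that the copy is never bound to a name, so the change is lost immediately.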

So, your options are .loc (label-based indexing) or .iloc (position-based indexing):

df.loc[df.a == 3, 'a'] = 4

df
   a
0  1
1  2
2  4

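If you are passing a boolean mask, you cannot hand the boolean Series to iloc directly; it raises an error. iloc accepts only a plain boolean array, so .loc is the natural choice for mask-based selection.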
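If position-based indexing is still required together with a mask, one possible workaround is to convert the mask to a NumPy array first. This is a minimal sketch, assuming a two-column frame like the one above; the names mask and col_pos are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 3, 6]})

# iloc rejects a boolean Series, so pass the underlying boolean array instead.
mask = (df['a'] == 3).to_numpy()

# iloc needs the integer position of the column, not its label.
col_pos = df.columns.get_loc('a')

df.iloc[mask, col_pos] = 4
print(df)
```

In most cases df.loc[df['a'] == 3, 'a'] = 4 is shorter and easier to read.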

#2


2  

Use .loc with a boolean index and column label selection:

df.loc[df.a == 3,'a'] = 4
print(df)

Output:

   a
0  1
1  2
2  4

In your method, slicing the dataframe makes pandas create a copy, so the assignment is applied to that copy and not to the original dataframe.

#3


1  

Use loc:

In [1289]: df.loc[df['a']==3, 'a'] = 4

In [1290]: df
Out[1290]:
   a
0  1
1  2
2  4

#4


1  

Or you can do it like this:

df['a'] = df['a'].replace(3, 4)

(modified, thanks @COLDSPEED)

#5


0  

You would want to do:

df['a'].apply(lambda x: 4 if x == 3 else x)

which would give:

0    1
1    2
2    4
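Note that apply returns a new Series and does not change df, so the result has to be assigned back to the column. As a hedged alternative that is not part of the answer above, Series.mask does the same replacement without a Python-level lambda; the variable names below are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

# apply builds a new Series; df itself is unchanged until it is assigned back.
applied = df['a'].apply(lambda x: 4 if x == 3 else x)

# Vectorized equivalent: replace values where the condition is True.
masked = df['a'].mask(df['a'] == 3, 4)

df['a'] = masked
print(df)
```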
