How to filter a pandas DataFrame by multiple columns

Date: 2022-11-11 22:57:28

To filter a dataframe (df) by a single column, for example in data that contains males and females, we might do:

males = df[df['Gender']=='Male']

Question 1 - But what if the data spanned multiple years and I only wanted to see males for 2014?

In other languages I might do something like:

if A = "Male" and if B = "2014" then 

(except that I want to do this and get a subset of the original dataframe as a new dataframe object)

Question 2. How do I do this in a loop, and create a dataframe object for each unique combination of year and gender (i.e. a df for each of: 2013-Male, 2013-Female, 2014-Male, and 2014-Female)?

for y in year:
    for g in gender:
        df = .....

2 Answers

#1


66  

Use the & operator, and don't forget to wrap each sub-statement in ():

males = df[(df['Gender']=='Male') & (df['Year']==2014)]
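
As a minimal sketch of this filter, assuming a small made-up DataFrame (only the column names Gender and Year come from the question; the rows and the Score column are invented for illustration):

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
    "Year": [2013, 2013, 2014, 2014, 2014],
    "Score": [1, 2, 3, 4, 5],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are needed because & binds more tightly than ==.
mask = (df["Gender"] == "Male") & (df["Year"] == 2014)
males_2014 = df[mask]

# DataFrame.query expresses the same condition as a string.
males_2014_q = df.query("Gender == 'Male' and Year == 2014")

print(males_2014)
```

Both forms return a new DataFrame and leave df unchanged. Using Python's `and` instead of `&` between two Series raises a ValueError, because the truth value of a whole Series is ambiguous.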

To store your dataframes in a dict using a for loop:

dic = {}
for g in ['male', 'female']:  # values must match the spelling used in the Gender column
  dic[g] = {}
  for y in [2013, 2014]:
    dic[g][y] = df[(df['Gender']==g) & (df['Year']==y)]  # store the DataFrames in a dict of dicts

EDIT:

A demo for your getDF:

def getDF(dic, gender, year):
  return dic[gender][year]

print(getDF(dic, 'male', 2014))
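
As an alternative to the nested loop, groupby can build one sub-DataFrame per (year, gender) combination that actually occurs in the data. This is only a sketch, not part of the original answer; the sample rows and the Score column are invented, and the flat dict keyed by tuples is one possible layout:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
    "Year": [2013, 2013, 2014, 2014, 2014],
    "Score": [1, 2, 3, 4, 5],
})

# Iterating over a groupby yields (key, sub-DataFrame) pairs.
# With two grouping columns, each key is a (year, gender) tuple.
frames = {key: group for key, group in df.groupby(["Year", "Gender"])}

males_2014 = frames[(2014, "Male")]
print(males_2014)
```

Unlike the explicit loop, this does not need the list of years and genders up front, but combinations that never appear in the data get no entry at all.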

#2


16  

For more general boolean functions that you would like to use as a filter and that depend on more than one column, you can use:

df = df[df[['col_1','col_2']].apply(lambda x: f(*x), axis=1)]

where f is a function that is applied to every pair of elements (x1, x2) from col_1 and col_2 and returns True or False depending on any condition you want on (x1, x2).
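
A minimal sketch of this pattern, assuming a made-up condition for f (keep rows where col_1 is greater than col_2) and invented sample values:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "col_1": [1, 5, 3, 8],
    "col_2": [4, 2, 3, 1],
})

def f(x1, x2):
    # Hypothetical condition; any function returning True or False works here.
    return x1 > x2

# axis=1 passes each row as a Series; *x unpacks it into (x1, x2).
filtered = df[df[["col_1", "col_2"]].apply(lambda x: f(*x), axis=1)]

# For a simple comparison like this one, the vectorized form gives
# the same rows without calling a Python function per row.
vectorized = df[df["col_1"] > df["col_2"]]

print(filtered)
```

The row-wise apply is the more flexible option, but it runs f once per row in Python, so it is usually worth reserving for conditions that cannot be written with column-level operators.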
