Python data analysis with the pandas library

Date: 2023-03-09 20:41:33

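pandas has two core data structures: Series (one-dimensional, labeled) and DataFrame (two-dimensional, with labeled rows and columns).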

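>>> import numpy as np
>>> import pandas as pd
>>> a=pd.Series([1,2],index=['a','b'])
>>> a
a 1
b 2
dtype: int64
>>> b=pd.Series(['b','a'])  # assumed definition (not shown in the original), consistent with the output below
>>> b.index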
RangeIndex(start=0, stop=2, step=1)
>>> b.values
array(['b', 'a'], dtype=object)
>>> a/2
a 0.5
b 1.0
dtype: float64
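The session above can be condensed into a short script. This is a minimal sketch; the definition of `b` is an assumption reconstructed from the printed index and values, not something the original session shows.

```python
import pandas as pd

# A Series pairs a one-dimensional array of values with an index of labels.
a = pd.Series([1, 2], index=["a", "b"])

# Assumed definition of b: with no explicit index, pandas assigns a
# default RangeIndex (0, 1, ...).
b = pd.Series(["b", "a"])

labels = list(a.index)      # the explicit labels of a
default_index = b.index     # the automatically generated index of b
halved = a / 2              # arithmetic is element-wise and keeps the index

print(labels)
print(default_index)
print(halved)
```

Note that dividing an integer Series produces a float64 result, which matches the dtype change seen in the session.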

Slicing and selecting elements

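>>> s=pd.Series([2,5,6,3],index=['a','b','c','d'])  # assumed definition, consistent with the outputs below
>>> s[0:3:2]  # positional slice: start 0, stop 3, step 2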
a 2
c 6
dtype: int64  
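>>> arr=np.array([1,2,3,4])  # assumed definition, consistent with the output below
>>> s3=pd.Series(arr)  # another way to create a Series: from a NumPy array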
>>> s3
0 1
1 2
2 3
3 4
dtype: int32
>>> s3=pd.Series(s)
>>> s3
a 2
b 5
c 6
d 3
dtype: int64
>>> s[s>8]
Series([], dtype: int64)
>>> s
a 2
b 5
c 6
d 3
dtype: int64
>>> s[s>3]  # select the elements greater than 3
b 5
c 6
dtype: int64
>>> np.log(s)  # NumPy functions can be applied to a Series directly
a 0.693147
b 1.609438
c 1.791759
d 1.098612
dtype: float64
>>> s.isin([5,6])  # test whether each element is one of the given values; returns booleans
a False
b True
c True
d False
dtype: bool
>>> s[s.isin([5,6])]
b 5
c 6
dtype: int64
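The filtering steps above all rely on the same idea: a comparison or `isin` call yields a boolean Series, and indexing with it keeps the rows marked True. A minimal sketch, assuming `s` holds the values printed in the session:

```python
import numpy as np
import pandas as pd

# Assumed to match the Series s used in the session above.
s = pd.Series([2, 5, 6, 3], index=["a", "b", "c", "d"])

mask = s > 3                    # boolean Series aligned with s
above_three = s[mask]           # keeps only the rows where mask is True
in_set = s[s.isin([5, 6])]      # membership test, then the same kind of filter
none_match = s[s > 8]           # an empty Series when no element qualifies
logs = np.log(s)                # NumPy ufuncs work element-wise and keep the index

print(above_three)
print(in_set)
print(logs)
```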
>>> s2=pd.Series([5,2,np.nan,7,np.nan])  # np.nan; the np.NaN alias was removed in NumPy 2.0
>>> s2
0 5.0
1 2.0
2 NaN
3 7.0
4 NaN
dtype: float64
>>> s2.isnull()
0 False
1 False
2 True
3 False
4 True
dtype: bool
>>> s2.notnull()
0 True
1 True
2 False
3 True
4 False
dtype: bool
>>> s2[s2.isnull()]
2 NaN
4 NaN
dtype: float64
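The null checks above are usually a first step toward counting or handling missing values. The sketch below goes slightly beyond the session: `dropna` and `fillna` are standard pandas methods that the original does not use, and the fill value 0 is only an illustrative choice.

```python
import numpy as np
import pandas as pd

s2 = pd.Series([5, 2, np.nan, 7, np.nan])

n_missing = int(s2.isnull().sum())   # True counts as 1 when summed
present = s2[s2.notnull()]           # keep only the non-missing rows
dropped = s2.dropna()                # same result as the mask above
filled = s2.fillna(0)                # replace NaN with 0 (illustrative choice)

print(n_missing)
print(present)
print(filled)
```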

Using DataFrame

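>>> fram=pd.DataFrame(np.arange(16).reshape((4,4)),index=['red','yellow','blue','black'],columns=['id','name','age','home'])  # assumed definition, consistent with the fram output shown later
>>> frame2=pd.DataFrame(fram,columns=['name','age'])  # keep only the listed columns
>>> frame2
        name  age
red        1    2
yellow     5    6
blue       9   10
black     13   14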
>>> frame2.values
array([[ 1, 2],
[ 5, 6],
[ 9, 10],
[13, 14]])
>>> frame2.index
Index([u'red', u'yellow', u'blue', u'black'], dtype='object')
>>> frame2.columns
Index([u'name', u'age'], dtype='object')
>>> frame2['name']
red 1
yellow 5
blue 9
black 13
Name: name, dtype: int32
>>> frame2.name
red 1
yellow 5
blue 9
black 13
Name: name, dtype: int32
>>> frame2.age
red 2
yellow 6
blue 10
black 14
Name: age, dtype: int32
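>>> frame2.loc[['red']]  # select rows by label; frame2[index=['red']] is a SyntaxError
     name  age
red     1    2
>>> frame2[0:2]  # select rows by position with a slice
        name  age
red        1    2
yellow     5    6
>>> frame2['name'].iloc[2]  # positional access; frame2['name'][2] is deprecated in recent pandas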
9
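The selection styles used above fall into two groups: by label (`[]` on columns, `.loc`) and by position (`.iloc`, row slices). A minimal sketch, assuming `fram` is a 4x4 frame built from `np.arange(16)` as its printed output suggests; `fram[["name", "age"]]` is used here as an equivalent way to take the column subset.

```python
import numpy as np
import pandas as pd

# Assumed construction of fram, reconstructed from its printed output.
fram = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["red", "yellow", "blue", "black"],
    columns=["id", "name", "age", "home"],
)
frame2 = fram[["name", "age"]]       # column subset, like pd.DataFrame(fram, columns=[...])

col = frame2["name"]                 # one column, returned as a Series
row = frame2.loc["red"]              # one row by label, returned as a Series
first_two = frame2.iloc[0:2]         # rows by position
cell = frame2.loc["blue", "name"]    # a single value by row label and column label

print(col)
print(row)
print(first_two)
print(cell)
```

Using `.loc` and `.iloc` explicitly avoids the ambiguity of plain `[]`, which selects columns for a label but rows for a slice.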
>>> s.idxmin()  # index label of the smallest value
'a'
>>> s.idxmax()  # index label of the largest value
'c'
>>> s.index.is_unique
True
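`idxmin` and `idxmax` return labels rather than positions, and `is_unique` reports whether any label repeats. A small sketch; the Series `dup` with a repeated label is a hypothetical example added for contrast and is not part of the original session.

```python
import pandas as pd

# Assumed to match the Series s used in the session above.
s = pd.Series([2, 5, 6, 3], index=["a", "b", "c", "d"])

smallest_label = s.idxmin()     # label of the minimum value
largest_label = s.idxmax()      # label of the maximum value
unique = s.index.is_unique      # no label repeats in s

# Hypothetical Series with a repeated label, for contrast.
dup = pd.Series([1, 2, 3], index=["a", "a", "b"])
dup_unique = dup.index.is_unique
both_a = dup["a"]               # a repeated label returns every matching row

print(smallest_label, largest_label, unique)
print(dup_unique)
print(both_a)
```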
>>> fram
        id  name  age  home
red      0     1    2     3
yellow   4     5    6     7
blue     8     9   10    11
black   12    13   14    15
>>> frame4=fram.drop(['name','age'],axis=1)  # drop columns; returns a new frame and leaves fram unchanged
>>> frame4
        id  home
red      0     3
yellow   4     7
blue     8    11
black   12    15
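`drop` works on either axis. The sketch below shows the keyword form `columns=` / `index=`, which is equivalent to passing `axis=1` / `axis=0`; the construction of `fram` is an assumption reconstructed from its printed output.

```python
import numpy as np
import pandas as pd

# Assumed construction of fram, reconstructed from its printed output.
fram = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["red", "yellow", "blue", "black"],
    columns=["id", "name", "age", "home"],
)

without_cols = fram.drop(columns=["name", "age"])   # same as drop([...], axis=1)
without_rows = fram.drop(index=["red"])             # drop a row by its label

print(without_cols)
print(without_rows)
print(fram.shape)    # the original frame is not modified
```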
>>> f=lambda x:x.max()-x.min()  # apply a custom function to the frame
>>> fram.apply(f)
id 12
name 12
age 12
home 12
dtype: int64
>>> fram.apply(f,axis=1)
red 3
yellow 3
blue 3
black 3
dtype: int64
>>> fram.apply(f,axis=0)
id 12
name 12
age 12
home 12
dtype: int64
>>> def f(x):
...     return pd.Series([x.min(),x.max()],index=['min','max'])
...
>>> fram.apply(f)
     id  name  age  home
min   0     1    2     3
max  12    13   14    15
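The `axis` argument decides what the function receives: with the default `axis=0` it is called once per column, with `axis=1` once per row. A minimal sketch of the same calls as a script; the function names `value_range` and `min_max` are illustrative, and `fram` is reconstructed from its printed output.

```python
import numpy as np
import pandas as pd

# Assumed construction of fram, reconstructed from its printed output.
fram = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["red", "yellow", "blue", "black"],
    columns=["id", "name", "age", "home"],
)

def value_range(x):
    """Spread between the largest and smallest value of a Series."""
    return x.max() - x.min()

def min_max(x):
    """Return two values per input Series, labeled 'min' and 'max'."""
    return pd.Series([x.min(), x.max()], index=["min", "max"])

per_column = fram.apply(value_range)          # axis=0: x is each column
per_row = fram.apply(value_range, axis=1)     # axis=1: x is each row
summary = fram.apply(min_max)                 # Series results are assembled into a DataFrame

print(per_column)
print(per_row)
print(summary)
```

When the applied function returns a Series, as `min_max` does, the result is a DataFrame whose rows come from the returned index.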

Summary statistics for a DataFrame

>>> fram.describe()
              id       name        age       home
count   4.000000   4.000000   4.000000   4.000000
mean    6.000000   7.000000   8.000000   9.000000
std     5.163978   5.163978   5.163978   5.163978
min     0.000000   1.000000   2.000000   3.000000
25%     3.000000   4.000000   5.000000   6.000000
50%     6.000000   7.000000   8.000000   9.000000
75%     9.000000  10.000000  11.000000  12.000000
max    12.000000  13.000000  14.000000  15.000000
>>> fram.sum()
id 24
name 28
age 32
home 36
dtype: int64
>>> fram.mean()
id 6.0
name 7.0
age 8.0
home 9.0
dtype: float64
>>> fram.min()
id 0
name 1
age 2
home 3
dtype: int32
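These reductions work column by column by default. As a hedged sketch, the same calls can be collected in a script, with `axis=1` added to show the row-wise variant (not used in the session above); `fram` is again reconstructed from its printed output.

```python
import numpy as np
import pandas as pd

# Assumed construction of fram, reconstructed from its printed output.
fram = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["red", "yellow", "blue", "black"],
    columns=["id", "name", "age", "home"],
)

stats = fram.describe()          # count, mean, std, min, quartiles, max per column
col_sums = fram.sum()            # one sum per column
col_means = fram.mean()          # one mean per column
row_means = fram.mean(axis=1)    # one mean per row (addition to the session)

print(stats)
print(col_sums)
print(row_means)
```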