Python pandas: How to move a row to the first row of a DataFrame?

Given an existing DataFrame that is indexed:

>>> df = pd.DataFrame(np.random.randn(10, 5),columns=['a', 'b', 'c', 'd', 'e'])
>>> df
          a         b         c         d         e
0 -0.131666 -0.315019  0.306728 -0.642224 -0.294562
1  0.769310 -1.277065  0.735549 -0.900214 -1.826320
2 -1.561325 -0.155571  0.544697  0.275880 -0.451564
3  0.612561 -0.540457  2.390871 -2.699741  0.534807
4 -1.504476 -2.113726  0.785208 -1.037256 -0.292959
5  0.467429  1.327839 -1.666649  1.144189  0.322896
6 -0.306556  1.668364  0.036508  0.596452  0.066755
7 -1.689779  1.469891 -0.068087 -1.113231  0.382235
8  0.028250 -2.145618  0.555973 -0.473131 -0.638056
9  0.633408 -0.791857  0.933033  1.485575 -0.021429
>>> df.set_index("a")
                  b         c         d         e
a                                                
-0.131666 -0.315019  0.306728 -0.642224 -0.294562
 0.769310 -1.277065  0.735549 -0.900214 -1.826320
-1.561325 -0.155571  0.544697  0.275880 -0.451564
 0.612561 -0.540457  2.390871 -2.699741  0.534807
-1.504476 -2.113726  0.785208 -1.037256 -0.292959
 0.467429  1.327839 -1.666649  1.144189  0.322896
-0.306556  1.668364  0.036508  0.596452  0.066755
-1.689779  1.469891 -0.068087 -1.113231  0.382235
 0.028250 -2.145618  0.555973 -0.473131 -0.638056
 0.633408 -0.791857  0.933033  1.485575 -0.021429

How can I move the 3rd row to the first row?

That is, the expected result is:

                  b         c         d         e
a                                                
-1.561325 -0.155571  0.544697  0.275880 -0.451564
-0.131666 -0.315019  0.306728 -0.642224 -0.294562
 0.769310 -1.277065  0.735549 -0.900214 -1.826320
 0.612561 -0.540457  2.390871 -2.699741  0.534807
-1.504476 -2.113726  0.785208 -1.037256 -0.292959
 0.467429  1.327839 -1.666649  1.144189  0.322896
-0.306556  1.668364  0.036508  0.596452  0.066755
-1.689779  1.469891 -0.068087 -1.113231  0.382235
 0.028250 -2.145618  0.555973 -0.473131 -0.638056
 0.633408 -0.791857  0.933033  1.485575 -0.021429

Now the original first row should become the second row.

4 solutions

#1

Reindexing is probably the simplest way to put the rows in any new order in a single step. Its drawback is that it produces a new DataFrame, which could be prohibitively large.

For example:

import pandas as pd

t = pd.read_csv('table.txt', sep=r'\s+')  # raw string avoids an invalid-escape warning
t
Out[81]: 
  DG/VD   TYPE State Access Consist Cache sCC   Size Units   Name
0   0/0  RAID1  Optl     RW      No  RWTD   -  1.818    TB    one
1   1/1  RAID1  Optl     RW      No  RWTD   -  1.818    TB    two
2   2/2  RAID1  Optl     RW      No  RWTD   -  1.818    TB  three
3   3/3  RAID1  Optl     RW      No  RWTD   -  1.818    TB   four

t.index
Out[82]: Int64Index([0, 1, 2, 3], dtype='int64')

t2 = t.reindex([2,0,1,3]) # cannot do this in place
t2
Out[93]: 
  DG/VD   TYPE State Access Consist Cache sCC   Size Units   Name
2   2/2  RAID1  Optl     RW      No  RWTD   -  1.818    TB  three
0   0/0  RAID1  Optl     RW      No  RWTD   -  1.818    TB    one
1   1/1  RAID1  Optl     RW      No  RWTD   -  1.818    TB    two
3   3/3  RAID1  Optl     RW      No  RWTD   -  1.818    TB   four

Now the index can be set back to range(4) without reindexing:

t2.index = range(4)
t2
Out[102]: 
  DG/VD   TYPE State Access Consist Cache sCC   Size Units   Name
0   2/2  RAID1  Optl     RW      No  RWTD   -  1.818    TB  three
1   0/0  RAID1  Optl     RW      No  RWTD   -  1.818    TB    one
2   1/1  RAID1  Optl     RW      No  RWTD   -  1.818    TB    two
3   3/3  RAID1  Optl     RW      No  RWTD   -  1.818    TB   four
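
As a rough sketch of the same steps without the external file: the frame below is made up for illustration (its columns and values are assumptions, not the contents of table.txt). It shows that reindex returns a new object and leaves the original untouched, and that reset_index(drop=True) is another way to get back to labels 0..n-1.

```python
import pandas as pd

# Illustrative data only; not the contents of table.txt.
t = pd.DataFrame({"Name": ["one", "two", "three", "four"],
                  "Size": [1.0, 2.0, 3.0, 4.0]})

# reindex returns a reordered copy; t itself keeps its original order.
t2 = t.reindex([2, 0, 1, 3])

# Renumber the rows 0..3 while discarding the old labels.
t3 = t2.reset_index(drop=True)

print(t3)
```

Note that reindex matches on labels, so this only reorders rows when every label in the list already exists in the index.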

It can also be done by swapping rows with tuple assignment and row selection, without creating a new DataFrame. For example:

import pandas as pd

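t = pd.read_csv('table.txt', sep=r'\s+')

# .ix was removed in pandas 1.0; .loc selects by label instead.
# .copy() guards against the right-hand rows being views of t.
t.loc[1], t.loc[2] = t.loc[2].copy(), t.loc[1].copy()
t.loc[0], t.loc[1] = t.loc[1].copy(), t.loc[0].copy()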
t
Out[96]: 
  DG/VD   TYPE State Access Consist Cache sCC   Size Units   Name
0   2/2  RAID1  Optl     RW      No  RWTD   -  1.818    TB  three
1   0/0  RAID1  Optl     RW      No  RWTD   -  1.818    TB    one
2   1/1  RAID1  Optl     RW      No  RWTD   -  1.818    TB    two
3   3/3  RAID1  Optl     RW      No  RWTD   -  1.818    TB   four
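
A possible alternative to chaining two swaps, sketched here on an assumed all-numeric frame, is to permute the first three rows in a single positional assignment. The right-hand side is converted to a NumPy array first, so it acts as a snapshot of the old values rather than something that changes while rows are being overwritten.

```python
import pandas as pd

# Illustrative numeric frame; the column names are assumptions.
df = pd.DataFrame({"x": [0, 10, 20, 30], "y": [1, 11, 21, 31]})

# Take a snapshot of rows 2, 0, 1 (in that order) and write it into
# positions 0, 1, 2. The labels stay 0..3; only the values move.
df.iloc[[0, 1, 2]] = df.iloc[[2, 0, 1]].to_numpy()

print(df)
```

With mixed column types the intermediate array would have object dtype, so this sketch is best suited to frames whose columns share one dtype.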

Another in-place method sets the DataFrame index to the desired ordering, so that, for example, the 3rd row gets index 0, and then sorts the DataFrame by index in place. It is encapsulated in the following function, which assumes that the rows are indexed with range(m) for some positive integer m and that the DataFrame has a simple index (no MultiIndex), as in the example provided in the question.

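def putfirst(n, df):
    """Move the n-th row (1-based) of df to the top, in place."""
    if not isinstance(n, int):
        print('error: 1st arg must be an int')
        return
    if n < 1:
        print('error: 1st arg must be an int > 0')
        return
    if n == 1:
        print('nothing to do when first arg == 1')
        return
    if n > len(df):
        print('error: n exceeds the number of rows in the DataFrame')
        return
    # The first n-1 rows get labels 1..n-1, the n-th row gets label 0,
    # and the remaining rows keep their labels.
    # range() must be wrapped in list() before concatenating in Python 3.
    df.index = list(range(1, n)) + [0] + list(range(n, df.index[-1] + 1))
    # DataFrame.sort() was removed from pandas; sort_index() sorts by label.
    df.sort_index(inplace=True)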

The arguments of putfirst are n and df. n is the ordinal (1-based) position of the row to relocate to the first row position, so n = 3 relocates the 3rd row. df is the DataFrame containing the row to be relocated.

Here is a demo:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 5),columns=['a', 'b', 'c', 'd', 'e'])

df.set_index("a") # ineffective without assignment or inplace=True
Out[182]: 
                  b         c         d         e
a                                                
 1.394072 -1.076742 -0.192466 -0.871188  0.420852
-1.211411 -0.258867 -0.581647 -1.260421  0.464575
-1.070241  0.804223 -0.156736  2.010390 -0.887104
-0.977936 -0.267217  0.483338 -0.400333  0.449880
 0.399594 -0.151575 -2.557934  0.160807  0.076525
-0.297204 -1.294274 -0.885180 -0.187497 -0.493560
-0.115413 -0.350745  0.044697 -0.897756  0.890874
-1.151185 -2.612303  1.141250 -0.867136  0.383583
-0.437030  0.347489 -1.230179  0.571078  0.060061
-0.225524  1.349726  1.350300 -0.386653  0.865990

df
Out[183]: 
          a         b         c         d         e
0  1.394072 -1.076742 -0.192466 -0.871188  0.420852
1 -1.211411 -0.258867 -0.581647 -1.260421  0.464575
2 -1.070241  0.804223 -0.156736  2.010390 -0.887104
3 -0.977936 -0.267217  0.483338 -0.400333  0.449880
4  0.399594 -0.151575 -2.557934  0.160807  0.076525
5 -0.297204 -1.294274 -0.885180 -0.187497 -0.493560
6 -0.115413 -0.350745  0.044697 -0.897756  0.890874
7 -1.151185 -2.612303  1.141250 -0.867136  0.383583
8 -0.437030  0.347489 -1.230179  0.571078  0.060061
9 -0.225524  1.349726  1.350300 -0.386653  0.865990

df.index
Out[184]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')

putfirst(3,df)
df
Out[186]: 
          a         b         c         d         e
0 -1.070241  0.804223 -0.156736  2.010390 -0.887104
1  1.394072 -1.076742 -0.192466 -0.871188  0.420852
2 -1.211411 -0.258867 -0.581647 -1.260421  0.464575
3 -0.977936 -0.267217  0.483338 -0.400333  0.449880
4  0.399594 -0.151575 -2.557934  0.160807  0.076525
5 -0.297204 -1.294274 -0.885180 -0.187497 -0.493560
6 -0.115413 -0.350745  0.044697 -0.897756  0.890874
7 -1.151185 -2.612303  1.141250 -0.867136  0.383583
8 -0.437030  0.347489 -1.230179  0.571078  0.060061
9 -0.225524  1.349726  1.350300 -0.386653  0.865990
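
The relabel-then-sort idea can be traced by hand on a small frame. This is only a sketch with made-up data (the column x and its values are assumptions); it prints the intermediate label list so the mapping is visible.

```python
import pandas as pd

df = pd.DataFrame({"x": [10, 20, 30, 40, 50]})
n = 3  # ordinal (1-based) position of the row to move to the top

# The first n-1 rows get labels 1..n-1, the n-th row gets label 0,
# and the remaining rows keep their labels.
new_labels = list(range(1, n)) + [0] + list(range(n, len(df)))
print(new_labels)  # [1, 2, 0, 3, 4]

df.index = new_labels
df.sort_index(inplace=True)  # reorder the rows by their new labels
print(df["x"].tolist())  # [30, 10, 20, 40, 50]
```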

#2

To move the third row to the first position, you can build a list of row positions with the target row as its first element. I use a conditional list comprehension to collect the remaining positions and join the two lists.

Then, just use iloc to select the desired index rows.

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3),columns=['a', 'b', 'c'])
>>> df
          a         b         c
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
2  0.950088 -0.151357 -0.103219
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

target_row = 2
# Move target row to first element of list.
idx = [target_row] + [i for i in range(len(df)) if i != target_row]

>>> df.iloc[idx]
          a         b         c
2  0.950088 -0.151357 -0.103219
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

If desired, you can also reset the index.

>>> df.iloc[idx].reset_index(drop=True)
          a         b         c
0  0.950088 -0.151357 -0.103219
1  1.764052  0.400157  0.978738
2  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

Alternatively, you can reindex the DataFrame using idx:

>>> df.reindex(idx)
          a         b         c
2  0.950088 -0.151357 -0.103219
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863
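
Because iloc works on positions rather than labels, the same list also works when the frame has a non-default index, such as the set_index("a") frame in the question. A minimal sketch, where move_to_top is a hypothetical helper name and not a pandas function:

```python
import numpy as np
import pandas as pd

def move_to_top(df, pos):
    """Return a new DataFrame with the row at 0-based position pos moved first.

    move_to_top is a hypothetical helper name, not part of pandas.
    """
    order = [pos] + [i for i in range(len(df)) if i != pos]
    return df.iloc[order]

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=["a", "b", "c"]).set_index("a")

# Move the third row (position 2) to the top; the index values travel with it.
result = move_to_top(df, 2)
print(result)
```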

#3

This is not elegant, but works so far:

>>> df = pd.DataFrame(np.random.randn(10, 5),columns=['a', 'b', 'c', 'd', 'e'])
>>> df
          a         b         c         d         e
0  1.124763 -0.416770  1.347839 -0.944334  0.738686
1 -0.348112  0.786822 -1.161970 -1.645065 -0.075205
2  0.549966  0.357076 -0.880669 -0.187731 -0.221997
3  0.311057 -0.126432 -1.187644  2.151804  0.791835
4 -0.310849  0.753750 -1.087447  0.095884  1.449832
5 -0.272344  0.278788 -0.724369 -0.568442  0.164909
6  0.942927 -0.273203  0.203322  1.099572 -0.505160
7  0.526321  1.665012  0.915676 -1.174497 -2.270662
8 -0.959773  0.921732  1.396364 -1.383112  0.603030
9 -2.802902 -0.572469 -1.599550 -1.305605  0.578198
>>> row = df.loc[0].copy()  # .ix was removed in pandas 1.0; use .loc
>>> row
a    1.124763
b   -0.416770
c    1.347839
d   -0.944334
e    0.738686
Name: 0, dtype: float64
>>> df.loc[0] = df.loc[2]
>>> df.loc[2] = row
>>> df
          a         b         c         d         e
0  0.549966  0.357076 -0.880669 -0.187731 -0.221997
1 -0.348112  0.786822 -1.161970 -1.645065 -0.075205
2  1.124763 -0.416770  1.347839 -0.944334  0.738686
3  0.311057 -0.126432 -1.187644  2.151804  0.791835
4 -0.310849  0.753750 -1.087447  0.095884  1.449832
5 -0.272344  0.278788 -0.724369 -0.568442  0.164909
6  0.942927 -0.273203  0.203322  1.099572 -0.505160
7  0.526321  1.665012  0.915676 -1.174497 -2.270662
8 -0.959773  0.921732  1.396364 -1.383112  0.603030
9 -2.802902 -0.572469 -1.599550 -1.305605  0.578198
>>> df.set_index('a')
                  b         c         d         e
a                                                
 0.549966  0.357076 -0.880669 -0.187731 -0.221997
-0.348112  0.786822 -1.161970 -1.645065 -0.075205
 1.124763 -0.416770  1.347839 -0.944334  0.738686
 0.311057 -0.126432 -1.187644  2.151804  0.791835
-0.310849  0.753750 -1.087447  0.095884  1.449832
-0.272344  0.278788 -0.724369 -0.568442  0.164909
 0.942927 -0.273203  0.203322  1.099572 -0.505160
 0.526321  1.665012  0.915676 -1.174497 -2.270662
-0.959773  0.921732  1.396364 -1.383112  0.603030
-2.802902 -0.572469 -1.599550 -1.305605  0.578198

If that's what you want...
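
The difference between swapping two rows and moving one row to the top can be seen by comparing the two position lists. This is a sketch on a made-up frame (the column x and its values are assumptions):

```python
import pandas as pd

df = pd.DataFrame({"x": ["r0", "r1", "r2", "r3"]})

# Swap: rows 0 and 2 trade places; row 1 stays where it was.
swapped = df.iloc[[2, 1, 0, 3]].reset_index(drop=True)

# Move: row 2 goes first and rows 0 and 1 each shift down by one.
moved = df.iloc[[2, 0, 1, 3]].reset_index(drop=True)

print(swapped["x"].tolist())  # ['r2', 'r1', 'r0', 'r3']
print(moved["x"].tolist())    # ['r2', 'r0', 'r1', 'r3']
```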

#4

df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])

You can simply do the following:

df.reindex([2, 0, 1] + list(range(3, len(df))))  # list() is needed in Python 3

Or you can concatenate the reordered first three rows with the remaining rows:

pd.concat([df.reindex([2, 0, 1]), df.iloc[3:]])

# rearrange the first three rows
df.reindex([2, 0, 1])

# slice the remaining rows, from the fourth row onward
df.iloc[3:]

# concatenate both results
pd.concat([df.reindex([2, 0, 1]), df.iloc[3:]])
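
The concat idea might be generalized to any row position and any index, assuming the index labels are unique. The frame and the variable n below are illustrative assumptions:

```python
import pandas as pd

df = pd.DataFrame({"x": [10, 20, 30, 40, 50]}, index=list("abcde"))
n = 2  # 0-based position of the row to move

# df.iloc[[n]] keeps the row as a one-row DataFrame;
# df.drop(df.index[n]) is everything else, in the original order.
result = pd.concat([df.iloc[[n]], df.drop(df.index[n])])
print(result.index.tolist())  # ['c', 'a', 'b', 'd', 'e']
```

If the index contained duplicate labels, drop would remove every row carrying that label, so a positional list with iloc would be the safer choice there.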
