Reading a CSV file with a custom delimiter using pandas

Time: 2023-01-05 08:42:24
import pandas as pd

def main():
    l = []
    for i in range(1981, 2018):
        df = pd.read_csv("ftp://ftp.cpc.ncep.noaa.gov/htdocs/degree_days/weighted/daily_data/" + str(i) + "/Population.Heating.txt")
        print(df[12:])

I am trying to download and read the "CONUS" row in Population.Heating.txt for each year from 1981 to 2017. My code seems to fetch the part containing CONUS, but how can I actually parse the file as |-delimited CSV data?


Thank you!

2 Solutions

#1


2  

Try this:

import pandas as pd

def main():
    url = "ftp://ftp.cpc.ncep.noaa.gov/htdocs/degree_days/weighted/daily_data/{}/Population.Heating.txt"
    for i in range(1981, 2018):
        # The data are pipe-delimited; the first three lines are header metadata.
        df = pd.read_csv(url.format(i), sep=r'\|', skiprows=3, engine='python')
        print(df)

Demo:

In [14]: url = "ftp://ftp.cpc.ncep.noaa.gov/htdocs/degree_days/weighted/daily_data/{}/Population.Heating.txt"

In [15]: i = 2017

In [16]: df = pd.read_csv(url.format(i), sep=r'\|', skiprows=3, engine='python')

In [17]: df
Out[17]:
  Region  20170101  20170102  20170103  20170104  20170105  20170106  20170107  20170108  20170109    ...     20171222  20171223  \
0      1        30        36        31        25        37        39        47        51        55    ...           40        32
1      2        28        32        28        23        39        41        46        49        51    ...           31        25
2      3        34        30        26        43        52        58        57        54        44    ...           29        32
3      4        37        34        37        57        60        62        59        54        43    ...           39        45
4      5        15        11         9        10        20        21        27        36        33    ...           12         7
5      6        16         9         7        22        31        38        45        44        35    ...            9         9
6      7         8         5         9        23        23        34        37        32        17    ...            9        19
7      8        30        32        34        33        36        42        42        31        23    ...           36        33
8      9        25        25        24        23        22        25        23        15        17    ...           23        20
9  CONUS        24        23        21        26        33        38        40        39        34    ...           23        22

   20171224  20171225  20171226  20171227  20171228  20171229  20171230  20171231
0        32        34        43        53        59        59        57        59
1        30        33        43        49        54        53        50        55
2        41        47        58        62        60        54        54        60
3        47        55        61        64        57        54        62        68
4        12        20        21        22        27        26        24        29
5        22        33        31        35        37        33        32        39
6        19        24        23        28        28        23        19        27
7        34        30        32        29        26        24        27        30
8        18        17        17        15        13        11        12        15
9        26        30        34        37        38        35        34        40

[10 rows x 366 columns]
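
To pull out just the CONUS row the question asks for, you can filter on the Region column of each frame. A minimal sketch, assuming the labels may carry stray whitespace from the fixed-width layout (hence the defensive strip() calls):

import pandas as pd

url = "ftp://ftp.cpc.ncep.noaa.gov/htdocs/degree_days/weighted/daily_data/{}/Population.Heating.txt"

conus_rows = []
for year in range(1981, 2018):
    df = pd.read_csv(url.format(year), sep=r'\|', skiprows=3, engine='python')
    # Labels may be padded with spaces, so strip before comparing.
    df.columns = df.columns.str.strip()
    conus_rows.append(df[df['Region'].astype(str).str.strip() == 'CONUS'])

Each element of conus_rows is then a one-row DataFrame whose columns are the YYYYMMDD dates of that year.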

#2


1  

import pandas as pd

def main():
    l = []
    for i in range(1981, 2018):
        # Pipe-delimited file; skip the three header lines before the data.
        l.append(pd.read_csv(
            "ftp://ftp.cpc.ncep.noaa.gov/htdocs/degree_days/weighted/daily_data/" + str(i) + "/Population.Heating.txt",
            sep='|', skiprows=3))

Files look like:


Product: Daily Heating Degree Days  
Regions: Regions::CensusDivisions  
Weights: Population  
[... data ...] 

so you need to skip three rows. Afterwards you have one DataFrame per year in your list 'l' for further processing (see the sketch below).

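A minimal sketch of that further processing, assuming each yearly frame has a 'Region' column as in the demo above: stack the CONUS row of every year into one daily series.

import pandas as pd

# Assuming `l` holds one DataFrame per year, as produced by main() above.
conus_parts = []
for df in l:
    df.columns = df.columns.str.strip()
    row = df[df['Region'].astype(str).str.strip() == 'CONUS']
    # Drop the Region label and turn the YYYYMMDD column names into a date index.
    s = row.drop(columns='Region').iloc[0]
    s.index = pd.to_datetime(s.index, format='%Y%m%d')
    conus_parts.append(pd.to_numeric(s, errors='coerce'))

conus = pd.concat(conus_parts).sort_index()  # one heating-degree-day value per day, 1981-2017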
