Reading the first N lines of a file in Python

Time: 2022-11-22 21:54:57

We have a large raw data file that we would like to trim to a specified size. I am experienced in .NET C#, but would like to do this in Python to simplify things and out of interest.

How would I go about getting the first N lines of a text file in Python? Will the OS being used have any effect on the implementation?

Thanks :)

13 answers

#1


139  

with open("datafile") as myfile:
    head = [next(myfile) for x in xrange(N)]
print head

Here's another way:

from itertools import islice
with open("datafile") as myfile:
    head = list(islice(myfile, N))
print head
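
For readers on Python 3, where xrange and the print statement no longer exist, the islice approach might look like the sketch below. The helper name head_lines is made up for illustration. One practical difference from the next()-based list comprehension: islice simply stops when the file has fewer than N lines, whereas calling next() past the end raises StopIteration.

```python
from itertools import islice

def head_lines(path, n):
    """Return up to the first n lines of a text file as a list of strings."""
    with open(path) as f:
        # islice pulls at most n items from the file iterator and
        # stops quietly if the file runs out first.
        return list(islice(f, n))
```

Each returned string keeps its trailing newline, so call .rstrip("\n") on the lines if you don't want it.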

#2


15  

N=10
f=open("file")
for i in range(N):
    line=f.next().strip()
    print line
f.close()

#3


7  

If you want to get the first lines with very little code and you don't care about performance, you can use .readlines(), which returns a list, and then slice that list.

E.g. for the first 5 lines:

with open("pathofmyfileandfileandname") as myfile:
    firstNlines=myfile.readlines()[0:5] #put here the interval you want
print firstNlines

Note: the whole file is read, so this is not the best approach from a performance point of view. But it is easy to use, fast to write and easy to remember, so it is very convenient if you just want to perform some one-time calculation.

#4


5  

The file object does not expose a method for reading a specific number of lines.

I guess the easiest way would be the following:

lines =[]
with open(file_name) as f:
    lines.extend(f.readline() for i in xrange(N))
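
One detail worth knowing about this approach: readline() returns an empty string at end of file rather than raising an exception, so for a file shorter than N lines the list ends up padded with empty strings. A small sketch in Python 3 that stops at the first empty string instead (read_n_lines is a hypothetical name, not part of the answer above):

```python
def read_n_lines(file_name, n):
    """Read up to n lines with readline(), stopping early at end of file."""
    lines = []
    with open(file_name) as f:
        for _ in range(n):
            line = f.readline()
            if not line:  # '' is only returned once the file is exhausted
                break
            lines.append(line)
    return lines
```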

#5


3  

Based on gnibbler's top-voted answer (Nov 20 '09 at 0:27): this class adds head() and tail() methods to the file object.

class File(file):
    def head(self, lines_2find=1):
        self.seek(0)                            #Rewind file
        return [self.next() for x in xrange(lines_2find)]

    def tail(self, lines_2find=1):  
        self.seek(0, 2)                         #go to end of file
        bytes_in_file = self.tell()             
        lines_found, total_bytes_scanned = 0, 0
        while (lines_2find+1 > lines_found and
               bytes_in_file > total_bytes_scanned): 
            byte_block = min(1024, bytes_in_file-total_bytes_scanned)
            self.seek(-(byte_block+total_bytes_scanned), 2)
            total_bytes_scanned += byte_block
            lines_found += self.read(1024).count('\n')
        self.seek(-total_bytes_scanned, 2)
        line_list = list(self.readlines())
        return line_list[-lines_2find:]

Usage:

f = File('path/to/file', 'r')
f.head(3)
f.tail(3)

#6


3  

Starting with Python 2.6, you can take advantage of more sophisticated functions in the IO base class. So the top-rated answer above can be rewritten as:

    with open("datafile") as myfile:
       head = myfile.readlines(N)
    print head

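(You don't have to worry about your file having fewer than N lines, since no StopIteration exception is thrown. Be aware, though, that the argument to readlines() is a size hint in bytes or characters, not a line count, so this generally does not return exactly N lines.)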
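
A quick way to see what readlines(N) actually does is to try it on a small file. In the sketch below (Python 3; the helper name readlines_with_hint and the scratch file are invented for this demonstration), the argument behaves as a size budget: reading stops once the total length of the lines read so far passes the hint, which is not the same as reading N lines.

```python
import os
import tempfile

def readlines_with_hint(hint):
    """Write ten short lines to a scratch file, then call readlines(hint) on it."""
    fd, demo_path = tempfile.mkstemp(suffix=".txt")  # hypothetical scratch file
    try:
        with os.fdopen(fd, "w") as out:
            out.writelines("line %d\n" % i for i in range(10))
        with open(demo_path) as f:
            # hint limits the total size of the lines read, not their count
            return f.readlines(hint)
    finally:
        os.remove(demo_path)

print(len(readlines_with_hint(3)))   # fewer than 3 lines come back
print(len(readlines_with_hint(-1)))  # a non-positive hint means no limit
```

If an exact line count matters, itertools.islice or a counted loop is the safer choice.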

#7


3  

What I do is read the N lines using pandas. I don't think the performance is the best, but for example, if N=1000:

import pandas as pd
yourfile = pd.read_csv('path/to/your/file.csv', nrows=1000)

#8


2  

If you want something that obviously (without looking up esoteric stuff in manuals) works without imports and try/except and works on a fair range of Python 2.x versions (2.2 to 2.6):

def headn(file_name, n):
    """Like *x head -N command"""
    result = []
    nlines = 0
    assert n >= 1
    for line in open(file_name):
        result.append(line)
        nlines += 1
        if nlines >= n:
            break
    return result

if __name__ == "__main__":
    import sys
    rval = headn(sys.argv[1], int(sys.argv[2]))
    print rval
    print len(rval)

#9


2  

The most convenient way, in my view:

LINE_COUNT = 3
print [s for (i, s) in enumerate(open('test.txt')) if i < LINE_COUNT]

This is a solution based on a list comprehension. The file object returned by open() supports the iteration interface. enumerate() wraps it and returns (index, item) tuples; we then check that we're inside the accepted range (if i < LINE_COUNT) and simply print the result.

Enjoy the Python. ;)
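
One thing to keep in mind about the comprehension above: the `if i < LINE_COUNT` filter discards the later lines but does not stop the loop, so every line of the file is still read. A possible variant that stops early is sketched below (Python 3; first_lines is a hypothetical helper name). It uses itertools.takewhile, which ends the iteration as soon as the index reaches the limit.

```python
from itertools import takewhile

def first_lines(path, line_count):
    """Return the first line_count lines, stopping iteration once they are read."""
    with open(path) as f:
        # takewhile stops at the first (index, line) pair whose index is too large,
        # so at most one line beyond the limit is pulled from the file.
        pairs = takewhile(lambda pair: pair[0] < line_count, enumerate(f))
        return [line for _, line in pairs]
```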

#10


2  

For the first 5 lines, simply do:

N=5
with open("data_file", "r") as file:
    for i in range(N):
       print file.next()

#11


0  

If you have a really big file, and assuming you want the output to be a numpy array, using np.genfromtxt can freeze your computer. This has worked much better in my experience:

import numpy as np

def load_big_file(fname, maxrows):
    '''only works for well-formed text file of space-separated doubles'''
    rows = []  # unknown number of lines, so use list
    with open(fname) as f:
        j = 0
        for line in f:
            if j == maxrows:
                break
            line = [float(s) for s in line.split()]
            rows.append(np.array(line, dtype=np.double))
            j += 1
    return np.vstack(rows)  # convert list of vectors to array

#12


0  

#!/usr/bin/python
import subprocess

# Note: tail returns the last 3 lines; use "head" instead for the first lines
p = subprocess.Popen(["tail", "-n", "3", "passlist"], stdout=subprocess.PIPE)
output, err = p.communicate()
print  output

This method worked for me.

#13


0  

The two most intuitive ways of doing this would be:

  1. Iterate on the file line-by-line, and break after N lines.

  2. Iterate on the file line-by-line using the next() method N times. (This is essentially just a different syntax for what the top answer does.)

Here is the code:

# Method 1:
with open("fileName", "r") as f:
    counter = 0
    for line in f:
        print line
        counter += 1
        if counter == N: break

# Method 2:
with open("fileName", "r") as f:
    for i in xrange(N):
        line = f.next()
        print line

The bottom line is, as long as you don't use readlines() or enumerate the whole file into memory, you have plenty of options.
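
Since the original goal was to trim a large raw data file, the line-by-line idea can be extended to write the first N lines straight to a new file without holding them all in memory. This is only a sketch in Python 3; trim_file and both path arguments are hypothetical names. On the OS question: in text mode Python recognises \n, \r\n and \r line endings on any platform, so the same code should behave the same on Windows and Unix-like systems.

```python
from itertools import islice

def trim_file(src_path, dst_path, n):
    """Copy at most the first n lines of src_path into dst_path."""
    # newline="" turns off newline translation in both directions,
    # so the original line endings are copied through unchanged.
    with open(src_path, newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        # islice yields lines lazily, so only one line is held at a time
        dst.writelines(islice(src, n))
```

Writing to a separate destination file, rather than truncating the source in place, keeps the raw data intact if something goes wrong part-way through.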
