How to iterate over a space-delimited ASCII file in Python

Time: 2022-10-14 03:34:53

Strange question here.

I have a .txt file that I want to iterate over. I can already read all the words from the file into an array, which is good, but what I want to know is how to iterate over the whole file word by word rather than letter by letter.

I want to go through the array that holds all the text from the file and count how many times a word appears in it.

The only problem is that I don't know how to write the code for it.

I tried using a for loop, but that just iterates over every single letter, when I want whole words.

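A likely cause of the letter-by-letter behaviour is looping directly over a string: iterating a Python string yields one character at a time, whereas calling split() first yields whole words. A minimal sketch, assuming the file contents are already held in a string (the sample text and variable names are made up for illustration):

```python
# Hypothetical sample text standing in for the contents of the .txt file.
text = "the quick brown fox jumps over the lazy dog"

# Looping over the string itself visits single characters.
letters = [ch for ch in text]

# Splitting on whitespace first gives a list of whole words.
words = text.split()

print(letters[:3])  # ['t', 'h', 'e']
print(words[:3])    # ['the', 'quick', 'brown']
```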

4 solutions

#1


10  

This code reads the space-separated file.txt:

# Read the whole file and split it on any whitespace.
with open("file.txt", "r") as f:
    words = f.read().split()
for w in words:
    print(w)

#2


3  

with open("test") as f:
    for line in f:
        # Drop the trailing newline so it does not stick to the last word.
        for word in line.rstrip("\n").split(" "):
            print(word)

#3


1  

Untested:

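def produce_words(file_):
    # Lazily yield each whitespace-separated word, one line at a time.
    for line in file_:
        for word in line.split():
            yield word

def main():
    with open('in.txt', 'r') as file_:
        for word in produce_words(file_):
            print(word)

if __name__ == '__main__':
    main()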

#4


1  

If you want to loop over an entire file, then the sensible thing to do is to iterate over it, taking the lines and splitting them into words. Working line by line is best, as it means we don't read the entire file into memory first (which, for large files, could take a lot of time or cause us to run out of memory):

# input_file avoids shadowing the built-in input().
with open('in.txt') as input_file:
    for line in input_file:
        for word in line.split():
            ...

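Note that line.split() splits on any run of whitespace and discards it, so you never get empty strings. If you want to preserve the exact spacing, you could use line.split(" ") instead, which splits on each single space and so keeps empty strings and the trailing newline.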

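To make the difference concrete, here is a small sketch comparing the two calls on a made-up line containing a double space and a trailing newline (the sample string and variable names are assumptions for illustration only):

```python
# Hypothetical line with a double space and a trailing newline.
line = "alpha  beta gamma\n"

# No argument: split on runs of any whitespace, so no empty strings appear.
default_words = line.split()

# Single-space separator: every space is a boundary, so the double space
# produces an empty string and the newline stays attached to the last item.
space_words = line.split(" ")

print(default_words)  # ['alpha', 'beta', 'gamma']
print(space_words)    # ['alpha', '', 'beta', 'gamma\n']
```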

Also note my use of the with statement to open the file, as it's more readable and handles closing the file, even on exceptions.

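The closing behaviour can be checked with a short experiment. This is only a sketch: it writes a throwaway temporary file (the file, the `handle` name, and the simulated error are all assumptions for the demo) and raises an exception inside the with block on purpose.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("some words here\n")

try:
    with open(path) as handle:
        raise ValueError("simulated failure while reading")
except ValueError:
    pass

# The with block closed the file even though an exception was raised.
closed_after_error = handle.closed
print(closed_after_error)  # True

os.remove(path)
```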

While this is a good solution, if you are not doing anything within the outer loop, the nesting is unnecessary. To reduce this to one loop, we can use itertools.chain.from_iterable and a generator expression:

import itertools

with open('in.txt') as input_file:
    # Flatten the per-line word lists into a single stream of words.
    for word in itertools.chain.from_iterable(line.split() for line in input_file):
        ...
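Since the question asks about counting how often words appear, the flattened word stream can be fed to collections.Counter. The following is a sketch rather than a definitive implementation: `count_words` is a hypothetical helper name, and the sample file is created on the fly purely so the example runs on its own.

```python
import itertools
import os
import tempfile
from collections import Counter

def count_words(path):
    """Count whitespace-separated words in the file at path, line by line."""
    with open(path) as input_file:
        words = itertools.chain.from_iterable(
            line.split() for line in input_file
        )
        # Counter consumes the lazy word stream here, inside the with block,
        # while the file is still open.
        return Counter(words)

# Hypothetical sample file, written only for this demonstration.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("spam eggs spam\nham  spam eggs\n")

counts = count_words(path)
os.remove(path)

print(counts["spam"])  # 3
print(counts["eggs"])  # 2
```

A Counter returns 0 for words it has never seen, so lookups for absent words do not raise KeyError.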
