Split a file and convert it into a dictionary in Python

Time: 2022-04-13 22:57:06
counts = dict()

for word in x:                                  # x is file named "f2.txt"
    words = word.split()
    print words
    counts[words] = counts.get(words,0) + 1
print counts 

I want to split a file into words and then print the word that is used the most times.

But I cannot even build the dictionary: the code above prints an empty dictionary, {}.


P.S. I have left out the first part of the code, which opens the file, counts the total number of lines, and prints all lines in uppercase.

2 solutions

#1


You can use collections.Counter, which takes an iterable (here, the list of words from the file) and returns a dictionary subclass that records how often each word occurs.

sample.txt:

hello this file is good
file is is good excellent

And the code for reading the file and recording the word frequencies:

import collections

# Read the whole file, split it on whitespace, and count each word.
with open("sample.txt", "r") as datafile:
    lines = datafile.read()
    words = lines.split()
    words_hist = collections.Counter(words)
    print(words_hist)

Output (the order of words with equal counts may vary):

Counter({'is': 3, 'file': 2, 'good': 2, 'hello': 1, 'this': 1, 'excellent': 1})
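Since the question asks for the word used the most times, here is a minimal sketch of picking it out of the Counter. It assumes the text of sample.txt is held in a string rather than read from disk, and the names `highest` and `most_used` are only illustrative. Collecting every word that reaches the highest count also covers ties.

```python
import collections

# Same content as sample.txt, kept in a string so the sketch is self-contained.
text = "hello this file is good\nfile is is good excellent\n"

words_hist = collections.Counter(text.split())

# Highest count seen, then every word that reaches it (handles ties).
highest = max(words_hist.values())
most_used = sorted(word for word, count in words_hist.items() if count == highest)

print(most_used, highest)
```

If only one word is needed and ties do not matter, `words_hist.most_common(1)` returns a one-element list holding a (word, count) pair.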

Looking at the code you posted, it seems that you are reading the input file incorrectly, so I have edited your approach a bit:

counts = dict()

with open("sample.txt", "r") as datafile:
    x = datafile.read().split()     # list of all words in the file
    for word in x:
        # Use the word itself (a string) as the dictionary key.
        counts[word] = counts.get(word, 0) + 1
print(counts)
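One plausible explanation for the empty dictionary, assuming the omitted first part of the question's code had already looped over the same file object: a file object can only be read through once, so a second loop over it runs zero times unless the position is reset. The sketch below uses io.StringIO as a stand-in for the opened file, and the variable names are hypothetical.

```python
import io

# Stand-in for the opened file; io.StringIO behaves like a text file object.
datafile = io.StringIO(u"hello this file is good\nfile is is good excellent\n")

# Hypothetical "first part": a full pass over the file, e.g. to count lines.
line_count = sum(1 for _ in datafile)

# A second pass over the same object yields nothing, because the read
# position is already at the end of the file.
counts_before_rewind = {}
for line in datafile:
    for word in line.split():
        counts_before_rewind[word] = counts_before_rewind.get(word, 0) + 1

# Rewinding with seek(0) makes the lines available again.
datafile.seek(0)
counts_after_rewind = {}
for line in datafile:
    for word in line.split():
        counts_after_rewind[word] = counts_after_rewind.get(word, 0) + 1

print(line_count)
print(counts_before_rewind)
print(counts_after_rewind)
```

Reopening the file, or reading it into a list of lines once and reusing that list, would avoid the problem as well.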

#2


You asked about the most common word. I have shown the three most common words.

    In [102]: line
    Out[102]: ' Mom   can   I    have  an  ice cream?Mom I Mom Mom'

    In [103]: li=line.split()

    In [104]: li
    Out[104]: ['Mom', 'can', 'I', 'have', 'an', 'ice', 'cream?Mom', 'I', 'Mom', 'Mom']

    In [105]: collections.Counter(li)
    Out[105]: Counter({'Mom': 3, 'I': 2, 'ice': 1, 'an': 1, 'can': 1, 'have': 1, 'cream?Mom': 1})

    In [106]: collections.Counter(li).most_common(3)
    Out[106]: [('Mom', 3), ('I', 2), ('ice', 1)]
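In the session above, split() only breaks on whitespace, so 'cream?Mom' is counted as one word and one occurrence of 'Mom' is missed. A minimal sketch of one way around this, assuming that a word is any run of ASCII letters or apostrophes (that pattern is an illustrative choice, not something from the original answer):

```python
import collections
import re

line = " Mom   can   I    have  an  ice cream?Mom I Mom Mom"

# Pull out runs of letters/apostrophes, so punctuation also separates words.
tokens = re.findall(r"[A-Za-z']+", line)

word_counts = collections.Counter(tokens)
top_three = word_counts.most_common(3)

print(tokens)
print(top_three)
```

With this tokenization 'Mom' is counted four times instead of three. Lowercasing the tokens first would additionally merge words that differ only in case.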
