How to add values from a file to a dictionary using pandas

Date: 2022-10-29 13:42:38

I have a text file containing integers, e.g.

123
456
678

I want to read them and put them in a dict, so that I can later easily check whether an integer was present, e.g.

{456: True, 123: True, 678: True}

What is the most efficient way to achieve this? I am open to not using a dict if there is some other way to look up values quickly.

At the moment I am using pandas like this:

    import pandas as pd

    df = pd.read_csv(filename, header=None, compression='zip')

    mydict = {}

    # iterrows() yields one (index, row) pair per line of the file
    for index, row in df.iterrows():
        mydict[row[0]] = True

This works, but since the file contains 20 million integers, it takes a while to load it into the dictionary.

3 Solutions

#1


3  

Well this is not a CSV file, so I don't see why you want to parse it as a CSV.

You can use a dictionary comprehension here:

with open(filename) as f:
    mydict = {int(l): True for l in f}
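
As a self-contained illustration of the comprehension above, here is a minimal sketch. The helper name load_int_dict and the temporary sample file are assumptions made for the demo, not part of the original answer. It also skips blank lines, since a trailing empty line would otherwise make int() raise a ValueError.

```python
import os
import tempfile


def load_int_dict(path):
    """Map every integer found in the file (one per line) to True."""
    with open(path) as f:
        # int() tolerates the trailing newline; blank lines are skipped.
        return {int(line): True for line in f if line.strip()}


# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("123\n456\n678\n\n")
    sample_path = tmp.name

mydict = load_int_dict(sample_path)
os.remove(sample_path)

print(mydict)  # {123: True, 456: True, 678: True}
```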

#2


3  

A set might be the most convenient data type here:

with open(filename) as f:  # the with block closes the file when done
    myset = {int(line) for line in f}

And test if an integer was in the file using in:

>>> 123 in myset
True
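
To try the set approach without a real file, something like the following should work. The function name load_int_set is made up for this sketch, and io.StringIO stands in for the opened file, so the data here is sample data only.

```python
import io


def load_int_set(lines):
    """Build a set of ints from an iterable of text lines, ignoring blanks."""
    return {int(line) for line in lines if line.strip()}


# io.StringIO stands in for an opened file; the contents are sample data.
fake_file = io.StringIO("123\n456\n678\n")
myset = load_int_set(fake_file)

print(123 in myset)  # True
print(999 in myset)  # False
```

A set stores only the keys, so it avoids carrying a redundant True value for every integer.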

#3


1  

Option 1

You can add a column to the dataframe that has True in all rows, then use zip to generate a dictionary as follows:

df = pd.read_csv(filename, header=None, compression='zip')
df[1] = True
d = {k: v for k,v in zip(df[0], df[1])}
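
If the extra column is only there to supply the True values, a possible shortcut (an alternative sketch, not what the answer above does) is dict.fromkeys on the first column. The small frame below is sample data standing in for the result of pd.read_csv.

```python
import pandas as pd

# Sample frame standing in for pd.read_csv(filename, header=None, ...).
df = pd.DataFrame([123, 456, 678])

# tolist() converts the NumPy integers to plain Python ints before they
# become dictionary keys; every key is mapped to the same value, True.
d = dict.fromkeys(df[0].tolist(), True)

print(d)  # {123: True, 456: True, 678: True}
```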

Option 2

As you are open to suggestions other than using a dictionary, if you already have the dataframe loaded, you can use it to check if an integer is there as follows:

>>> df = pd.DataFrame([123,456,678]) 
>>> df
     0
0  123
1  456
2  678
>>> df.values == 123 
array([[ True],
       [False],
       [False]], dtype=bool)
>>> (df.values == 123).any() 
True
>>> 

Then in your conditional logic, you can do something like the following:

if (df.values == 123).any():  # if 123 is in the dataframe
    pass  # do something
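
Note that (df.values == 123).any() compares the value against every element each time it runs. If many lookups are needed, one option is to build a set from the column once and test membership against that. The following is only a sketch on the same sample data; the name present is an assumption.

```python
import pandas as pd

df = pd.DataFrame([123, 456, 678])  # sample data, as in the session above

# One pass to build the set; each later lookup is a hash lookup, not a scan.
present = set(df[0].tolist())

for candidate in (123, 999):
    if candidate in present:
        print(candidate, "is in the dataframe")
    else:
        print(candidate, "is missing")
```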
