How to select a random line from a text file

I am trying to make a lottery program for my school (we have an economic system).

My program generates numbers and saves them to a text file. When I "pull" a number out of my generator, I want to ensure that there is a winner.

Q: How do I have Python select a random line out of my text file and give my output as that number?

7 solutions

#1

How do I have python select a random line out of my text file and give my output as that number?

Assuming the file is relatively small, the following is perhaps the easiest way to do it:

import random
with open('data.txt') as f:  # close the file once the lines are read
    line = random.choice(f.readlines())

#2

If the file is very large, you could seek to a random byte offset (chosen using the file size) and then read the next full line:

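import os, random

def get_random_line(file_name):
    total_bytes = os.stat(file_name).st_size
    random_point = random.randrange(total_bytes)  # a valid byte offset in the file
    with open(file_name, 'rb') as f:  # binary mode, so any byte offset is safe to seek to
        f.seek(random_point)
        f.readline()  # skip this line to clear the partial line
        line = f.readline()
        if not line:  # the offset fell in the last line: wrap around to the first
            f.seek(0)
            line = f.readline()
    return line.decode()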
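
One thing to keep in mind with the seek approach is that the lines are not equally likely: a line is returned whenever the random offset lands anywhere in the line before it, so lines that follow long lines are favored. The sketch below illustrates this by enumerating every byte offset of a small in-memory sample. The helper names `line_after_offset` and `offset_counts` are made up for this illustration, and the wrap-around to the first line when the offset falls in the last line is an assumption of this sketch.

```python
import io
from collections import Counter


def line_after_offset(data, offset):
    """Mimic the seek approach on in-memory bytes (hypothetical helper)."""
    buf = io.BytesIO(data)
    buf.seek(offset)
    buf.readline()  # discard the rest of the line we landed in
    line = buf.readline()
    if not line:  # assumption: wrap to the first line after the last one
        buf.seek(0)
        line = buf.readline()
    return line


def offset_counts(data):
    """Count how many byte offsets lead to each line being returned."""
    return Counter(line_after_offset(data, i) for i in range(len(data)))


if __name__ == "__main__":
    sample = b"1\n22222222\n3\n"
    for line, hits in sorted(offset_counts(sample).items()):
        print(line, hits)
```

In this sample the short line after the long one is reached from 9 of the 13 offsets, so for a lottery where every entry must have the same chance, one of the single-pass approaches is probably a safer choice.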

#3

import random

def random_line(filename):
    line_num = 0
    selected_line = ''
    with open(filename) as f:
        for line in f:
            line_num += 1
            # Replace the current pick with probability 1/line_num
            if random.uniform(0, line_num) < 1:
                selected_line = line
    return selected_line.strip()

Most of the approaches given here work, but they tend to load the whole file into memory at once. This one does not, so it works even when the file is big.

The approach is not very intuitive at first glance. The theorem behind it states that after we have seen N lines, each of them is the currently selected line with probability exactly 1/N.

From page 123 of the 'Python Cookbook'.
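
To see why the 1/N claim holds: line n replaces the current pick with probability 1/n, and it then survives each later line k with probability (k-1)/k, so after N lines its chance is (1/n)(n/(n+1))...((N-1)/N) = 1/N. Here is a minimal sketch of the same idea written against any iterable of lines. The name `reservoir_pick` and the optional `rng` parameter are assumptions made for this example; they are not from the Cookbook.

```python
import io
import random


def reservoir_pick(lines, rng=random):
    """Return one item from `lines` in a single pass, each equally likely.

    Returns None when `lines` is empty.
    """
    chosen = None
    for n, line in enumerate(lines, start=1):
        # Replace the current pick with probability 1/n.
        if rng.randrange(n) == 0:
            chosen = line
    return chosen


if __name__ == "__main__":
    # An in-memory stand-in for the lottery file; an open file works the same way.
    fake_file = io.StringIO("11\n22\n33\n")
    print(reservoir_pick(fake_file).strip())
```

Passing a seeded `random.Random` instance as `rng` makes a draw repeatable, which might be handy if the lottery result ever needs to be re-checked.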

#4

With a slight modification to your input file (store the number of items in the first line), you can choose a number uniformly without having to read the entire file into memory first.

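import random

def choose_number(fname):
    with open(fname, "r") as f:
        # The first line holds the number of entries that follow
        count = int(f.readline().strip())
        for line in f:
            # Pick this line with probability 1/(entries remaining)
            if not random.randrange(0, count):
                return int(line.strip())
            count -= 1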

Say you have 100 numbers. The probability of choosing the first number is 1/100. The probability of choosing the second number is (99/100)(1/99) = 1/100. The probability of choosing the third number is (99/100)(98/99)(1/98) = 1/100. I'll skip the formal proof, but the probability of choosing any given one of the 100 numbers is 1/100.

It's not strictly necessary to store the count in the first line, but it saves you the trouble of having to read the entire file just to count the lines. Either way, you don't need to store the entire file in memory to choose any single line with equal probability.
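
The arithmetic above can be checked exactly rather than taken on faith. This small sketch repeats the same product for every position using exact fractions; `selection_probabilities` is a hypothetical helper written for this check and reads no file.

```python
from fractions import Fraction


def selection_probabilities(count):
    """Exact probability that each of `count` lines is the one returned."""
    probabilities = []
    not_chosen_yet = Fraction(1)  # chance that no earlier line was returned
    for remaining in range(count, 0, -1):
        pick_now = Fraction(1, remaining)  # chance that randrange(0, remaining) is 0
        probabilities.append(not_chosen_yet * pick_now)
        not_chosen_yet *= 1 - pick_now
    return probabilities


if __name__ == "__main__":
    probs = selection_probabilities(100)
    print(probs[0], probs[1], probs[2])
    print(len(set(probs)) == 1, sum(probs))
```

For 100 entries every position comes out as 1/100 and the probabilities sum to 1, which matches the hand calculation for the first three numbers.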

#5

Off the top of my head:

import random

def pick_winner():
    with open("file.txt", "r") as f:
        lines = f.readlines()
    # Pick a random index into the list of lines
    random_line_num = random.randrange(0, len(lines))
    return lines[random_line_num]

#6

Another approach:

import random, fileinput

text = None
for line in fileinput.input('data.txt'):
    if random.randrange(fileinput.lineno()) == 0:
        text = line
print(text, end='')  # the line already ends with a newline

Distribution:

$ seq 1 10 > data.txt

# run for 100000 times
$ ./select.py > out.txt

$ wc -l out.txt 
100000 out.txt

$ sort out.txt | uniq -c
  10066 1
  10004 10
  10023 2
   9979 3
   9926 4
   9936 5
   9878 6
  10023 7
  10154 8
  10011 9

I don't see any skewness, but perhaps the dataset is too small...
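
One rough way to put a number on "no skewness" is a Pearson chi-square statistic against a uniform expectation. The sketch below uses the counts from the `uniq -c` output above; the function name `chi_square_uniform` is an assumption for this example, and only the standard library is used.

```python
# Counts copied from the `uniq -c` output above, keyed by line value.
observed = {1: 10066, 10: 10004, 2: 10023, 3: 9979, 4: 9926,
            5: 9936, 6: 9878, 7: 10023, 8: 10154, 9: 10011}


def chi_square_uniform(counts):
    """Pearson chi-square statistic against a uniform expectation."""
    total = sum(counts.values())
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts.values())


if __name__ == "__main__":
    print(round(chi_square_uniform(observed), 4))
```

For these counts the statistic is about 5.42. With 9 degrees of freedom the usual 5% critical value is roughly 16.92, so this run gives no evidence against a uniform distribution.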

#7

I saw a Python tutorial and found this snippet:

import random

def randomLine(filename):
    # Retrieve a random line from a file, reading through the file once
    lineNum = 0
    it = ''
    with open(filename, "r") as fh:
        for aLine in fh:
            lineNum = lineNum + 1
            # Keep this line with probability 1/lineNum
            if random.uniform(0, lineNum) < 1:
                it = aLine
    return it

# Intended usage: pull = randomLine(filename)
