Where do you use the generator feature in your Python code?

Time: 2022-07-14 23:23:40

I have studied the generator feature and I think I understand it, but I would like to know where I could apply it in my own code.

I have in mind the following example, which I read in the "Python Essential Reference" book:

# tail -f: follow a file, yielding new lines as they are appended
import time

def tail(f):
    f.seek(0, 2)              # jump to the end of the file
    while True:
        line = f.readline()
        if not line:          # nothing new yet: wait and retry
            time.sleep(0.1)
            continue
        yield line
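
For completeness, here is one typical way to drive it (my own usage sketch; 'access.log' is just a placeholder filename):

with open('access.log') as f:
    for line in tail(f):
        print(line, end='')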

Do you have any other good examples where generators are the best tool for the job, like tail -f?

How often do you use the generator feature, and in which parts of a program do you usually apply it?

4 Answers

#1


6  

I use them a lot when I implement scanners (tokenizers) or when I iterate over data containers.

Edit: here is a demo tokenizer I used for a C++ syntax highlight program:

whitespace = ' \t\r\n'
operators = '~!%^&*()-+=[]{};:\'"/?.,<>\\|'

def scan(s):
    """Yield (token, state) pairs; state: 0=normal, 1=operator, 2=whitespace."""
    words = {0: '', 1: '', 2: ''}  # current run of characters per state
    state = 2                      # I pick whitespace as the first state
    for c in s:
        if c in operators:
            if state != 1:
                yield (words[state], state)
                words[state] = ''
            state = 1
            words[state] += c
        elif c in whitespace:
            if state != 2:
                yield (words[state], state)
                words[state] = ''
            state = 2
            words[state] += c
        else:
            if state != 0:
                yield (words[state], state)
                words[state] = ''
            state = 0
            words[state] += c
    yield (words[state], state)    # flush the final token

Usage example:

>>> it = scan('foo(); i++')
>>> next(it)
('', 2)
>>> next(it)
('foo', 0)
>>> next(it)
('();', 1)
>>> next(it)
(' ', 2)
>>> next(it)
('i', 0)
>>> next(it)
('++', 1)
>>>
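
Generators are just as handy for the "iterate over data containers" part: a container's __iter__ can itself be a generator. A minimal sketch of that (my own illustration, not part of the answer; assumes Python 3.3+ for yield from):

class Node:
    """A toy binary-tree node whose iterator is a generator."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

    def __iter__(self):
        # in-order traversal: left subtree, this node, right subtree
        if self.left:
            yield from self.left
        yield self.value
        if self.right:
            yield from self.right

tree = Node(2, Node(1), Node(3))
print(list(tree))  # [1, 2, 3]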

#2


4  

Whenever your code would generate an unlimited number of values, or, more generally, whenever building the whole list up front would consume too much memory.

Or when you are unlikely to iterate over the whole generated list (and the list is very large): there is no point in generating every value up front (and waiting for that generation) if most of them are never used.

My latest use of generators was implementing a linear recurrence sequence (LRS), such as the Fibonacci sequence.
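
As a sketch of that idea (my own minimal version, not the answerer's actual code): an infinite Fibonacci generator, consumed lazily so only the values you actually ask for are ever computed:

from itertools import islice

def fib():
    """Yield the Fibonacci sequence 0, 1, 1, 2, 3, 5, ... forever."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Only the first ten values are ever computed:
print(list(islice(fib(), 10)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]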

#3


2  

In all cases where I have algorithms that read anything, I use generators exclusively.

Why?

Layering filtering, mapping, and reduction rules is so much easier in a context of multiple generators.

Example:

def discard_blank(source):
    # drop empty lines
    for line in source:
        if len(line) == 0:
            continue
        yield line

def clean_end(source):
    # strip trailing whitespace (including the newline)
    for line in source:
        yield line.rstrip()

def split_fields(source):
    # split each line into a list of fields
    for line in source:
        yield line.split()

def convert_pos(tuple_source, position):
    # convert the field at the given position to an int
    for line in tuple_source:
        yield line[:position] + [int(line[position])] + line[position+1:]

with open('somefile', 'r') as source:
    data = convert_pos(split_fields(discard_blank(clean_end(source))), 0)
    total = 0
    for l in data:
        print(l)
        total += l[0]
    print(total)

My preference is to use many small generators so that a small change is not disruptive to the entire process chain.

#4


1  

In general, to separate data acquisition (which might be complicated) from consumption. In particular:

  • to concatenate the results of several b-tree queries: the DB part generates and executes the queries, yielding records from each one, while the consumer only sees single data items arriving.

  • buffering (read-ahead): the generator fetches data in blocks and yields single elements from each block, so the consumer is again shielded from the gory details (see the sketch after this list).
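
A minimal sketch of that read-ahead pattern (my own illustration; read_block and block_size are hypothetical names for the underlying block-fetching API):

def buffered(read_block, block_size=4096):
    """Fetch data in blocks; yield single elements from each block.

    read_block(n) is assumed to return up to n items,
    and an empty sequence at end of input.
    """
    while True:
        block = read_block(block_size)
        if not block:  # empty block signals end of input
            return
        for item in block:
            yield item

For example, buffered(f.read, 4096) yields single characters from an open text file while actually reading it 4 KiB at a time.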

Generators can also work as coroutines. You can pass data into them using nextval = g.send(data) on the 'consumer' side and data = yield nextval on the generator side; in that case the generator and its consumer 'swap' values. You can even make yield raise an exception inside the generator: g.throw(exc) does that.
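
A minimal sketch of that send/yield handshake (my own example; running_total is a hypothetical coroutine):

def running_total():
    """Coroutine: receives numbers via send() and yields the running total."""
    total = 0
    while True:
        value = yield total  # hand back the current total, wait for the next value
        total += value

g = running_total()
next(g)            # prime the coroutine: run up to the first yield (returns 0)
print(g.send(3))   # 3
print(g.send(4))   # 7
# g.throw(ValueError) would raise ValueError at the paused yield inside the generator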
