Python 'yield' statement causes a "JSON not serializable" error in an AWS Lambda test case

Time: 2021-07-14 23:06:28

I'm learning how to use Python in the Amazon AWS Lambda service. I'm trying to read characters from an S3 object and write them to another S3 object. I realize I could copy the S3 object to a local tmp file, but I wanted to "stream" the S3 input into the script, process it, and write the output, without the local copy stage if possible. I'm using code from this * (second answer) that suggests a solution for this.

This code contains two "yield" statements which are causing my otherwise working script to throw a "generator is not JSON serializable" error. I'm trying to understand why a "yield" statement would throw this error. Is this a Lambda environment restriction, or is it something specific to my code that creates the serialization issue (likely due to using an S3 file object)?

Here is the code I run in Lambda. If I comment out the two yield statements, it runs, but the output file is empty.

from __future__ import print_function

import json
import urllib
import uuid
import boto3
import re

print('Loading IO function')

s3 = boto3.client('s3')


def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))

    # Get the object from the event and show its content type
    inbucket  = event['Records'][0]['s3']['bucket']['name']
    outbucket = "outlambda"
    inkey     = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode('utf8'))
    outkey    = "out" + inkey
    try:
        infile = s3.get_object(Bucket=inbucket, Key=inkey)
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(inkey, inbucket))
        raise e

    tmp_path = '/tmp/{}{}'.format(uuid.uuid4(), "tmp.txt")

    with open(tmp_path, 'w') as out:
        unfinished_line = ''
        for byte in infile:
            byte = unfinished_line + byte
            # split on whatever, or use a regex with re.split()
            lines = byte.split('\n')
            unfinished_line = lines.pop()
            for line in lines:
                out.write(line)
                yield line          # This line causes the JSON error if uncommented
            yield unfinished_line   # This line causes the JSON error if uncommented

    #
    # Upload the file to S3
    #
    tmp = open(tmp_path, "r")
    try:
        outfile = s3.put_object(Bucket=outbucket, Key=outkey, Body=tmp)
    except Exception as e:
        print(e)
        print('Error putting object {} to bucket {} Body {}. Make sure they exist and your bucket is in the same region as this function.'.format(outkey, outbucket, "tmp.txt"))
        raise e

    tmp.close()

2 Answers

#1


2  

A function that includes yield is actually a generator function, and calling it returns a generator object, whereas the Lambda handler needs to be a plain function that optionally returns a JSON-serializable value.
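A minimal standalone example of the point above (names are illustrative): calling a function whose body contains `yield` does not run its body, it returns a generator object, and `json.dumps` cannot serialize that object.

```python
import json

# A function whose body contains `yield` is a generator function.
# Calling it does not execute the body; it returns a generator object.
def gen():
    yield "line"

g = gen()
print(type(g).__name__)  # -> generator

# json.dumps cannot serialize a generator, which is the same error
# Lambda reports when the handler itself is a generator function.
try:
    json.dumps(g)
except TypeError as err:
    print("not serializable:", err)
```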

#2


0  

Thanks to Lei Shi for answering the specific point I was asking about, and thanks to FujiApple for pointing out a missed coding mistake in the original code. I was able to develop a solution without using yield that copied the input file to the output. With Lei Shi's and FujiApple's comments, I was then able to modify that code to move the logic into a sub-function, called by the lambda handler, which could be a generator.
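A minimal sketch of that structure (names and logic are illustrative, not the poster's exact code): the `yield`-based logic lives in a helper generator, and the handler stays a plain function that consumes it, so the handler's return value remains JSON-serializable.

```python
# Helper generator: contains the yield statements.
def process_lines(lines):
    for line in lines:
        yield line.upper()

# Handler: a plain function that consumes the generator internally
# and returns a JSON-serializable dict.
def lambda_handler(event, context):
    results = list(process_lines(event.get('lines', [])))
    return {'count': len(results)}

print(lambda_handler({'lines': ['a', 'b']}, None))  # -> {'count': 2}
```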

from __future__ import print_function

import json
import urllib
import uuid
import boto3
import re
print('Loading IO function')

s3 = boto3.client('s3')

def processFile(inbucket, inkey, outbucket, outkey):
    try:
        infile = s3.get_object(Bucket=inbucket, Key=inkey)
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(inkey, inbucket))
        raise e

    inbody   = infile['Body']
    tmp_path = '/tmp/{}{}'.format(uuid.uuid4(), "tmp.txt")

    with open(tmp_path, 'w') as out:
        unfinished_line = ''
        bytes = inbody.read(4096)
        while bytes:
            bytes = unfinished_line + bytes
            # split on whatever, or use a regex with re.split()
            lines = bytes.split('\n')
            print("bytes %s" % bytes)
            unfinished_line = lines.pop()
            for line in lines:
                print("line %s" % line)
                out.write(line)
                yield line  # if this line is commented out, uncomment the unfinished_line if clause below
            bytes = inbody.read(4096)
#       if unfinished_line:
#           out.write(unfinished_line)

    #
    # Upload the file to S3
    #
    tmp = open(tmp_path, "r")
    try:
        outfile = s3.put_object(Bucket=outbucket, Key=outkey, Body=tmp)
    except Exception as e:
        print(e)
        print('Error putting object {} to bucket {} Body {}. Make sure they exist and your bucket is in the same region as this function.'.format(outkey, outbucket, "tmp.txt"))
        raise e

    tmp.close()

def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))

    # Get the object from the event and show its content type
    inbucket  = event['Records'][0]['s3']['bucket']['name']
    outbucket = "outlambda"
    inkey     = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode('utf8'))
    outkey    = "out" + inkey

    # processFile is a generator, so it must be consumed for its body to run
    for line in processFile(inbucket, inkey, outbucket, outkey):
        pass

I'm posting the version of the solution that uses yield in a sub "generator" function. Without yielding the unfinished line, the code misses the last line of the file; the commented-out if clause writes that last line instead when the yield is removed.
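The chunk-splitting logic in processFile can be isolated into a small generator. This sketch (the helper name is hypothetical) carries partial lines across chunk boundaries and also emits the final unterminated line, which addresses the "missing last line" issue mentioned above:

```python
def split_lines(chunks):
    """Yield complete lines from an iterable of text chunks,
    carrying partial lines across chunk boundaries."""
    unfinished = ''
    for chunk in chunks:
        chunk = unfinished + chunk
        lines = chunk.split('\n')
        unfinished = lines.pop()  # may be a partial line
        for line in lines:
            yield line
    if unfinished:                # emit the final line (no trailing newline)
        yield unfinished

# 'cd' is split across the first two chunks and is reassembled with 'ef'.
print(list(split_lines(['ab\ncd', 'ef\ngh'])))  # -> ['ab', 'cdef', 'gh']
```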
