Dataflow AvroCoder appears to lose type information when serializing Object

Time: 2022-09-25 12:11:09

I have a custom type that is passed through PCollections and is annotated with @DefaultCoder(AvroCoder.class). The type contains a few primitives along with a Map<String, Object> that is extracted from a JSON string using a reader.

When the JSON is first read, the type information is preserved, so I can cast a value to String or any other applicable type. In later stages of the pipeline, however, casting the same value to String throws java.lang.ClassCastException: java.lang.Object cannot be cast to java.lang.String, which probably means the type information for each object in the Map is not being carried through the pipeline. Is this a restriction of AvroCoder? If so, is there a workaround, or am I doing something wrong? The Map values are of different types (read from JSON), including String, int, and double, so I need to keep using Object as the value type.

To verify the coder's behavior, I built a sample program that reproduces it and ends with the ClassCastException described above.

    String filename = "out.avro";
    AvroCoder<Object> coder = AvroCoder.of(TypeDescriptor.of(Object.class));
    FileOutputStream fos = new FileOutputStream(filename);
    ObjectOutputStream oos = new ObjectOutputStream(fos);
    coder.encode("test", oos, new Coder.Context(true));
    oos.close();
    fos.close();

    FileInputStream fis = new FileInputStream(filename);
    ObjectInputStream ois = new ObjectInputStream(fis);
    System.out.println((String) coder.decode(ois, new Coder.Context(true)));

1 Answer

#1

TL;DR AvroCoder can only be used with a concrete class.

AvroCoder, by nature, uses Avro, which is a schema-based serialization format, rather than a way to serialize arbitrary opaque Java objects.

AvroCoder uses the fields of the given class as the schema: these are the fields that will be serialized when encoding the data and deserialized when decoding it. You're specifying Object.class, which has no fields.
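
To make the "no fields" point concrete, here is a minimal sketch using only plain Java reflection. It is not Avro's actual schema-generation code; it only approximates the idea by listing the non-static, non-transient instance fields of a class and its superclasses. The Event class and the serializableFieldNames helper are hypothetical names invented for this illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class FieldSchemaSketch {
    // Hypothetical record type standing in for a concrete class with real fields.
    static class Event {
        String id;
        int count;
        double score;
    }

    // Collects "type name" entries for every non-static, non-transient,
    // non-synthetic instance field declared by the class or its superclasses.
    // This only approximates what a field-based schema would contain.
    static List<String> serializableFieldNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                int modifiers = f.getModifiers();
                boolean skipped = Modifier.isStatic(modifiers)
                        || Modifier.isTransient(modifiers)
                        || f.isSynthetic();
                if (!skipped) {
                    names.add(f.getType().getSimpleName() + " " + f.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // A concrete class exposes fields that a schema can be built from
        // (reflection does not guarantee the order of the entries).
        System.out.println(serializableFieldNames(Event.class));
        // Object declares no instance fields, so the list is empty.
        System.out.println(serializableFieldNames(Object.class));
    }
}
```

With nothing to put in the schema, there is nothing to write for a value declared as Object, whatever its runtime type happens to be.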

Likewise, the fields will be set on a new instance of the specified class. So in your case, deserialization creates a new Object, and since the Object class doesn't declare any fields, deserialization has nothing to set on it, and you end up with a plain, empty Object instance. That is why the later cast to String fails.

For serializing/deserializing arbitrary objects (though they have to implement Serializable), use SerializableCoder.
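
SerializableCoder relies on standard Java serialization, which records the runtime class of every value it writes. The sketch below uses only the JDK (no Dataflow classes) to show that a Map<String, Object> with mixed value types survives a round trip with its types intact. The class name and the encode and decode helpers are hypothetical and exist only for this illustration; inside a pipeline the coder would do this work for you.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

class SerializationRoundTrip {
    // Writes any Serializable object graph to a byte array.
    static byte[] encode(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Reads the object graph back; runtime classes come from the stream itself.
    static Object decode(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Mixed value types, as they might come out of a JSON document.
        Map<String, Object> attributes = new HashMap<>();
        attributes.put("name", "test");
        attributes.put("count", 42);
        attributes.put("ratio", 0.5);

        @SuppressWarnings("unchecked")
        Map<String, Object> copy = (Map<String, Object>) decode(encode(attributes));

        // The cast succeeds because the value was restored as a real String.
        String name = (String) copy.get("name");
        System.out.println(name
                + " " + copy.get("count").getClass().getSimpleName()
                + " " + copy.get("ratio").getClass().getSimpleName());
        // prints "test Integer Double"
    }
}
```

Every value stored in the Map must itself be Serializable. String, Integer, Double and HashMap all are, and the custom type that holds the Map would need to implement Serializable as well.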
