How do I save a Spark DataFrame to a Hana Vora table?

Time: 2022-06-06 16:53:15

We have a file that we want to split into 3 and that we need to perform some data cleanup on before it can be imported into Hana Vora - otherwise everything has to be typed as String, which is not ideal.


We can import and prepare the DataFrames in Spark just fine, but when I try to write either to the HDFS filesystem or, better, to save as a table using the "com.sap.spark.vora" datasource, I get errors.


Can anyone advise on a reliable way to import the Spark-prepared datasets into Hana Vora? Thanks!


1 solution

#1



Vora currently only officially supports appending data to an existing table (using the APPEND statement). For details, see the SAP HANA Vora Developer Guide, chapter "3.5 Appending Data to Existing Tables".

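If the Vora Spark extensions are installed, that append flow is usually driven through Spark SQL. The following is only a rough sketch: the SapSQLContext class, the table name, the file paths, and the OPTIONS keys (tableName, files) are assumptions here, so check them against the exact syntax in chapter 3.5 of the Developer Guide for your Vora release.

```scala
// Rough sketch only -- verify the context class and the OPTIONS keys against
// chapter 3.5 of the SAP HANA Vora Developer Guide for your Vora version.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SapSQLContext // assumed: shipped with the Vora Spark extensions

val sc = new SparkContext(new SparkConf().setAppName("vora-append-sketch"))
val vc = new SapSQLContext(sc)

// Make the existing Vora table known to this Spark session (names and paths are placeholders).
vc.sql(
  """CREATE TABLE cleaned_data (id INT, name STRING, amount DOUBLE)
    |USING com.sap.spark.vora
    |OPTIONS (tableName "cleaned_data", files "/user/vora/cleaned_part1.csv")
  """.stripMargin)

// Append a further prepared file to the existing table via the APPEND statement.
vc.sql(
  """APPEND TABLE cleaned_data
    |OPTIONS (files "/user/vora/cleaned_part2.csv")
  """.stripMargin)
```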

This means you would have to create an intermediate file. Vora supports reading from CSV, ORC, and Parquet files. A DataFrame can be saved to ORC or Parquet files directly from Spark (see https://spark.apache.org/docs/1.6.1/sql-programming-guide.htm). To write CSV files from Spark, see https://github.com/databricks/spark-csv

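For the intermediate-file route, a minimal Spark 1.6 sketch in Scala could look like the one below. The input read and the HDFS paths are placeholders standing in for the already-cleaned DataFrame, and the CSV write assumes the spark-csv package (for example com.databricks:spark-csv_2.10:1.4.0) is on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("export-cleaned-data"))
val sqlContext = new HiveContext(sc) // HiveContext so the ORC writer is available in Spark 1.6

// Placeholder: load and clean the source data here; df stands for the cleaned DataFrame.
val df = sqlContext.read.json("hdfs:///data/raw/input.json")

// Parquet and ORC are supported natively by the DataFrame writer.
df.write.mode("overwrite").parquet("hdfs:///data/cleaned/parquet")
df.write.mode("overwrite").orc("hdfs:///data/cleaned/orc")

// CSV via the databricks spark-csv package.
df.write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .save("hdfs:///data/cleaned/csv")
```

The files written this way can then serve as the intermediate files that the Vora table is created from or appended with.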
