Spark 1.5.1 is not working with hive jdbc 1.2.0

Time: 2023-02-04 23:10:56

I am trying to execute hive query using spark 1.5.1 in standalone mode and hive 1.2.0 jdbc version.

Here is my piece of code:

import java.util.HashMap;
import java.util.Map;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class SparkHiveSqlTest {
    private static final String HIVE_DRIVER = "org.apache.hive.jdbc.HiveDriver";
    private static final String HIVE_CONNECTION_URL = "jdbc:hive2://localhost:10000/idw";
    private static final SparkConf sparkconf = new SparkConf()
            .set("spark.master", "spark://impetus-i0248u:7077").set("spark.app.name", "sparkhivesqltest")
            .set("spark.cores.max", "1").set("spark.executor.memory", "512m");
    private static final JavaSparkContext sc = new JavaSparkContext(sparkconf);
    private static final SQLContext sqlContext = new SQLContext(sc);

    public static void main(String[] args) {
        // Data source options for the jdbc relation
        Map<String, String> options = new HashMap<String, String>();
        options.put("driver", HIVE_DRIVER);
        options.put("url", HIVE_CONNECTION_URL);
        options.put("dbtable", "(select * from idw.emp) as employees_name");
        // The exception below is reported on this line
        DataFrame jdbcDF = sqlContext.read().format("jdbc").options(options).load();
    }
}

I am getting the following error at sqlContext.read().format("jdbc").options(options).load():

Exception in thread "main" java.sql.SQLException: Method not supported
    at org.apache.hive.jdbc.HiveResultSetMetaData.isSigned(HiveResultSetMetaData.java:143)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:135)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
    at org.apache.spark.sql.execution.datasources.jdbc.DefaultSource.createRelation(DefaultSource.scala:60)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)

I am running Spark 1.5.1 in standalone mode. The Hadoop version is 2.6 and the Hive version is 1.2.0.

Here are the dependencies that I have added to the Java project's pom.xml:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.5.1</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.5.1</version>
</dependency>

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>1.2.0</version>
    <exclusions>
        <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>servlet-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.0</version>
</dependency>

Can anyone help me out with this? If somebody has used Spark 1.5.1 with the Hive JDBC driver, could you please tell me which version of Hive is compatible with Spark 1.5.1?

Thank you in advance!

1 Solution

#1

As far as I can tell, you're unfortunately out of luck in terms of using the jdbc connector until it's fixed upstream; the "Method not supported" in this case is not just a version mismatch, but is explicitly not implemented in the Hive JDBC library's branch-1.2, and even if you look at the Hive JDBC master branch or branch-2.0 it's still not implemented:

// org.apache.hive.jdbc.HiveResultSetMetaData (the same in branch-1.2, branch-2.0, and master)
public boolean isSigned(int column) throws SQLException {
  throw new SQLException("Method not supported");
}

Looking at the Spark call site, isSigned is called during resolveTable in Spark 1.5 as well as on master.
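
To see the failure outside of Spark, the same call can be reproduced with plain JDBC against HiveServer2. The sketch below is illustrative only (it reuses the connection URL and table from the question and assumes anonymous login); it probes the ResultSetMetaData roughly the way Spark's schema resolution does and fails on the same isSigned call:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class HiveJdbcIsSignedCheck {
    public static void main(String[] args) throws Exception {
        // Illustrative check only; URL and table are taken from the question,
        // and anonymous login to HiveServer2 is assumed.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/idw", "", "");
        Statement stmt = conn.createStatement();
        // Fetch no rows -- only the result set metadata matters here.
        ResultSet rs = stmt.executeQuery("SELECT * FROM idw.emp WHERE 1=0");
        ResultSetMetaData md = rs.getMetaData();
        for (int i = 1; i <= md.getColumnCount(); i++) {
            System.out.println(md.getColumnName(i) + " : " + md.getColumnTypeName(i));
            // This is the metadata call Spark's resolveTable also makes; the Hive
            // driver throws java.sql.SQLException: Method not supported here.
            System.out.println("signed? " + md.isSigned(i));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}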

That said, most likely the real reason this "issue" remains is that when interacting with Hive, you're expected to connect to the Hive metastore directly rather than needing to mess around with JDBC connectors; see the Hive Tables section of the Spark documentation for how to do this. Essentially, you want to think of Spark as a peer/replacement of Hive rather than as a consumer of Hive.

This way, pretty much all you do is add hive-site.xml to your Spark conf/ directory and make sure the datanucleus jars under lib_managed/jars are available to all Spark executors; Spark then talks directly to the Hive metastore for schema info and fetches data directly from HDFS in a way amenable to nicely parallelized RDDs.
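
For what it's worth, a minimal sketch of that approach for the setup in the question might look like the following. It assumes hive-site.xml is in Spark's conf/ directory and that the spark-hive_2.10 artifact (version 1.5.1, alongside spark-core and spark-sql) is on the classpath; the class name and query are illustrative:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class SparkHiveMetastoreTest {
    public static void main(String[] args) {
        // Sketch only: assumes spark-hive_2.10:1.5.1 on the classpath and
        // hive-site.xml in Spark's conf/ directory.
        SparkConf conf = new SparkConf()
                .set("spark.master", "spark://impetus-i0248u:7077")
                .set("spark.app.name", "sparkhivemetastoretest");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // HiveContext reads schema information from the Hive metastore directly,
        // so neither the Hive JDBC driver nor HiveServer2 is involved.
        HiveContext hiveContext = new HiveContext(sc.sc());
        DataFrame employees = hiveContext.sql("SELECT * FROM idw.emp");
        employees.show();

        sc.stop();
    }
}

With that in place, the jdbc data source, and with it HiveResultSetMetaData.isSigned, drops out of the picture entirely.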
