How do I set the EMR classpath?

Time: 2023-01-29 21:12:56

I am running a job on an AWS EMR cluster and am hitting a Jackson library conflict. Based on the article here, I tried to add a bootstrap step that sets my classpath with the following script:

#!/bin/bash
export HADOOP_USER_CLASSPATH_FIRST=true;
echo "HADOOP_CLASSPATH=s3n://bucket/myjar.jar" > /home/hadoop/conf/hadoop-user-env.sh

I built my jar so that all of its dependencies are bundled with it. The first problem I hit when I do this is that my enable-debugging step dies with the following error:

Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2427)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2440)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2479)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2461)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:372)
at com.amazon.elasticmapreduce.scriptrunner.ScriptRunner.fetchFile(ScriptRunner.java:39)
at com.amazon.elasticmapreduce.scriptrunner.ScriptRunner.main(ScriptRunner.java:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
... 13 more

So I have two questions. First, what is going wrong with the enable-debugging step? Second, is it valid to give my classpath as an S3 location? If not, what should the value of:

/path/to/my.jar

be in the example on the page indicated above?

1 Answer

#1


Looking at your bootstrap action, there might be a mistake in the echo command. The script should look like the following:

#!/bin/bash
export HADOOP_USER_CLASSPATH_FIRST=true
echo "HADOOP_CLASSPATH=/path/to/my.jar" >> /home/hadoop/conf/hadoop-user-env.sh

Note the '>>' characters. A single '>' replaces the entire file with the output of the 'echo' command, whereas a double '>>' appends that line to the end of the hadoop-user-env.sh script, leaving its existing contents in place. Additionally, the trailing semicolon isn't needed in a Bash script.
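
To make the difference between '>' and '>>' concrete, here is a minimal sketch that runs both against a scratch file. The names demo_file and EXISTING_SETTING are placeholders made up for the illustration; nothing here touches the real hadoop-user-env.sh.

```shell
#!/bin/bash
# Demonstrates truncating (>) versus appending (>>) on a scratch file.
# demo_file and EXISTING_SETTING are hypothetical stand-ins for the real
# hadoop-user-env.sh and whatever it already contains.
demo_file="$(mktemp)"

# Start with one pre-existing line.
echo "EXISTING_SETTING=1" > "$demo_file"

# '>>' appends: the pre-existing line is kept, so the file has two lines.
echo "HADOOP_CLASSPATH=/path/to/my.jar" >> "$demo_file"
appended_lines="$(grep -c '' "$demo_file")"

# '>' truncates first: EXISTING_SETTING is gone and only one line remains.
echo "HADOOP_CLASSPATH=/path/to/my.jar" > "$demo_file"
overwritten_lines="$(grep -c '' "$demo_file")"

echo "after '>>': $appended_lines line(s); after '>': $overwritten_lines line(s)"
rm -f "$demo_file"
```

If the env file already holds settings that other steps rely on, the '>' form silently discards them, which is why the appending form is the safer default in a bootstrap action.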

Reference: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-config_hadoop-user-env.sh.html
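
On the S3 part of the question: as far as I know, classpath entries are read from the local filesystem, so an s3n:// URI is unlikely to work there. The jar would first have to be copied onto the node (for example with hadoop fs -copyToLocal) and the local copy referenced instead. The sketch below assumes that copy has already happened. append_classpath_line is a hypothetical helper written for this illustration, not part of EMR or Hadoop, and the demo writes to a scratch file instead of /home/hadoop/conf/hadoop-user-env.sh.

```shell
#!/bin/bash
# Sketch only: append a HADOOP_CLASSPATH line for a jar already on local disk.
# append_classpath_line is a hypothetical helper, not an EMR or Hadoop command.
append_classpath_line() {
  local jar_path="$1" env_file="$2"
  case "$jar_path" in
    *://*) echo "refusing URI-style path: $jar_path" >&2; return 1 ;;  # e.g. s3n://bucket/myjar.jar
    /*)    ;;                                                           # absolute local path: accepted
    *)     echo "expected an absolute path: $jar_path" >&2; return 1 ;;
  esac
  # '>>' appends, so whatever the env file already contains is preserved.
  echo "HADOOP_CLASSPATH=$jar_path" >> "$env_file"
}

# Demo against a scratch file standing in for hadoop-user-env.sh.
env_file="$(mktemp)"

local_status=0
append_classpath_line "/path/to/my.jar" "$env_file" || local_status=$?

s3_status=0
append_classpath_line "s3n://bucket/myjar.jar" "$env_file" 2>/dev/null || s3_status=$?

written="$(cat "$env_file")"
echo "local: $local_status, s3: $s3_status, file: $written"
rm -f "$env_file"
```

The check on the path is deliberately crude. It only rejects anything that looks like a URI or a relative path, and it does not verify that the jar actually exists on the node.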

PS: Amazon's awesome support team found this question and replied to me by email, although I was not the one who asked it. Credit for this answer goes to its author, an AWS Support Engineer named Rendy O.
