java.io.FileNotFoundException when running Spark in cluster mode with YARN

Time: 2023-02-04 23:11:14

I have a Spark application that runs as expected on one node.

I am now using YARN to run it across multiple nodes. However, it fails with a file-not-found exception. I first changed the file path from relative to absolute, but the error persisted. I then read here that it may be necessary to prefix the path with file:// in case the default file system is HDFS. The file in question is a JSON file.

Despite using the absolute path and prefixing it with file://, this error persists:

16/11/10 10:19:56 INFO yarn.Client: client token: N/A diagnostics: User class threw exception: java.io.FileNotFoundException: file://absolute/dir/file.json (No such file or directory)

Why does this work correctly with one node but not in cluster mode with yarn?

1 solution

#1


You're missing a slash /. Try:

file:///absolute/dir/file.json

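The file:// prefix here specifies the local file system, not NFS. The absolute path that follows it must itself begin with a forward slash, which gives three forward slashes in total. With only two slashes, the first path component (absolute) is parsed as the host part of the URI instead of as part of the path.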
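The effect of the missing slash can be checked with a generic URI parser. The following is a minimal sketch in Python; the language choice and the helper name `describe` are assumptions for illustration and do not come from the original post. It only shows how the two strings are split into scheme, host and path, and does not touch Spark or YARN.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def describe(uri):
    """Return (scheme, host, path) for a URI string (hypothetical helper)."""
    parts = urlparse(uri)
    return parts.scheme, parts.netloc, parts.path


# Two slashes: "absolute" is read as the host, so the path loses its first component.
two_slashes = describe("file://absolute/dir/file.json")

# Three slashes: the host is empty and the full absolute path is preserved.
three_slashes = describe("file:///absolute/dir/file.json")

# Building the URI from an absolute POSIX path avoids counting slashes by hand.
built = PurePosixPath("/absolute/dir/file.json").as_uri()

print(two_slashes)
print(three_slashes)
print(built)
```

Generating the URI from the path, as in the last step, is one way to avoid this kind of typo when the path is assembled in code rather than typed by hand.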
