Unable to launch SparkR in RStudio

Time: 2023-01-21 00:18:49

After a long and difficult SparkR installation process, I am running into new problems launching SparkR.

My Settings

R 3.2.0    
RStudio 0.98.1103    
Rtools 3.3    
Spark 1.4.0
Java Version 8
SparkR 1.4.0
Windows 7 SP1 64-bit

Now I try to use the following code in R:

library(devtools)
library(SparkR)
Sys.setenv(SPARK_MEM="1g")
Sys.setenv(SPARK_HOME="C:/spark-1.4.0")
sc <- sparkR.init(master="local")

I receive the following:

JVM is not ready after 10 seconds

I also tried adding some system variables, such as the Spark path and the Java path.

Do you have any advice for fixing this problem?

After testing on localhost, my next step would be to run tests on my running Hadoop cluster.

7 Answers

#1


5  

I think it was a bug that has now been resolved. Try the following:

Sys.setenv(SPARK_HOME="C:\\spark-1.4.0")

.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

library("SparkR", lib.loc="C:\\spark-1.4.0\\lib") # The use of \\ is for windows environment.

library(SparkR)

sc=sparkR.init(master="local")

Launching java with spark-submit command C:\spark-1.4.0/bin/spark-submit.cmd sparkr-shell

C:\Users\Ashish\AppData\Local\Temp\RtmpWqFsOB\backend_portbdc329477c6

Hope this helps.

#2


2  

I had the same issue, and my spark-submit.cmd file was also not executing from the command line. The following steps worked for me:

Go to your environment variables, and under the system variables select the variable named PATH. Along with the other values, add c:/Windows/System32/ separated by a semicolon. This made my spark-submit.cmd run from the command line and eventually from RStudio.

I have realized that we get the above issue only when not all of the required path values are specified. Ensure all your path values (R, Rtools) are specified in the environment variables. For instance, my Rtools path was c:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin. A quick way to check this from R is sketched below.
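As a sanity check from within R, you can inspect and extend PATH for the current session before retrying. This is only a sketch; the directories are the ones mentioned in this answer, so adjust them to your machine:

# Inspect the current PATH; System32, R, and Rtools should all appear here.
Sys.getenv("PATH")

# Extend PATH for this R session only (illustrative directories).
Sys.setenv(PATH = paste(Sys.getenv("PATH"),
                        "c:\\Windows\\System32",
                        "c:\\Rtools\\bin",
                        "c:\\Rtools\\gcc-4.6.3\\bin",
                        sep = ";"))

# spark-submit.cmd should now run; shell() is available in R on Windows.
shell("C:\\spark-1.4.0\\bin\\spark-submit.cmd --version")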

I hope this helps.

#3


1  

That didn't work for me. If anyone has the same problem, try giving execute permissions to c:/sparkpath/bin/spark-submit.cmd. You can test for the permission from R first, as sketched below.
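A minimal check with base R's file.access(), assuming the placeholder path from this answer (note that on Windows the execute-permission check is only approximate):

# file.access() returns 0 when the requested mode is permitted;
# mode = 1 tests execute permission. The path is illustrative.
spark_submit <- "c:/sparkpath/bin/spark-submit.cmd"
if (file.access(spark_submit, mode = 1) != 0) {
  message("No execute permission on ", spark_submit)
}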

#4


0  

I had the exact same issue. I could start SparkR from the command line, but not from RStudio on Windows. Here is the solution that worked for me.

  1. Clean up all the paths you set while trying to fix this issue. This includes the paths you set in the Windows environment from the Control Panel; also use Sys.unsetenv("SPARK_HOME") to unset SPARK_HOME.

  2. Find your RStudio default working directory by calling getwd() in RStudio. Then create a .Rprofile file in this directory and put the following line in it: .libPaths("C:/Apache/Spark-1.5.1/R/lib")

  3. In Control Panel -> System -> Advanced system settings -> Environment Variables, add ";C:\Apache\Spark-1.5.1\bin" at the end of your existing PATH variable.

  4. Start RStudio; if you type .libPaths(), you can see that the SparkR library path is already in the library path.

  5. Use library(SparkR) to load the SparkR library.

  6. sc=sparkR.init(master="local")

I tried this on both Spark 1.4.1 and 1.5.1, and both work fine. I hope this can help whoever is still having issues after all the suggestions above. A consolidated sketch of the resulting setup follows.
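Putting the steps together, the .Rprofile and the RStudio session look roughly like this (a sketch using the Spark 1.5.1 paths from the steps above; adjust to your install):

# --- .Rprofile in the RStudio default working directory (step 2) ---
.libPaths(c("C:/Apache/Spark-1.5.1/R/lib", .libPaths()))

# --- then, in the RStudio session (steps 4-6) ---
.libPaths()                        # the SparkR library path should be listed
library(SparkR)                    # load the SparkR library
sc = sparkR.init(master = "local")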

#5


0  

I had a similar issue. In my case the problem was with the hyphen ('-'). Changing the code from:

sc <- sparkR.init(master = "local[*]",sparkPackages = c("com.databricks:spark-csv_2.11-1.4.0"))

to:

sc <- sparkR.init(master = "local[*]",sparkPackages = c("com.databricks:spark-csv_2.11:1.4.0"))

worked for me. Do you notice the change? sparkPackages takes Maven coordinates in group:artifact:version form, so the separator before the version number must be a colon, not a hyphen.

P.S.: Do copy the jar into your SPARK_HOME\lib folder; a sketch of doing this from R follows.
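This can be done with file.copy(). The source path below is a hypothetical example of where --packages typically caches downloaded jars; adjust it to wherever your spark-csv jar actually landed:

# Copy the spark-csv jar into SPARK_HOME\lib. The source path is a
# hypothetical ivy cache location; adjust to your environment.
jar <- file.path(Sys.getenv("USERPROFILE"), ".ivy2", "jars",
                 "com.databricks_spark-csv_2.11-1.4.0.jar")
file.copy(jar, file.path(Sys.getenv("SPARK_HOME"), "lib"))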

Edit 1: Also check that you have configured your HADOOP_HOME.


Hope this helps.

#6


0  

The following solution works for Mac OS.

After installing Hadoop followed by Spark:

spark_path <- strsplit(system("brew info apache-spark", intern = TRUE)[4], ' ')[[1]][1]  # Get your Spark path
.libPaths(c(file.path(spark_path, "libexec", "R", "lib"), .libPaths()))
library(SparkR)
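If parsing the fourth line of brew info output feels fragile, brew --prefix gives the install root more directly (a sketch; assumes a Homebrew install of apache-spark):

# brew --prefix prints the formula's install prefix directly,
# avoiding positional parsing of `brew info` output.
spark_path <- system("brew --prefix apache-spark", intern = TRUE)
.libPaths(c(file.path(spark_path, "libexec", "R", "lib"), .libPaths()))
library(SparkR)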

#7


0  

I also had this error, from a different cause. Under the hood, Spark calls:

system2(sparkSubmitBin, combinedArgs, wait = F)

There are many ways this can go wrong. In my case the underlying error (invisible until I called system2 directly as an experiment) was "UNC paths are not supported." I had to change my working directory in RStudio to a directory that was not part of a network share, and then it started working.
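To reproduce that diagnosis, you can run the spark-submit call yourself with wait = TRUE so the error actually prints, and then check whether the working directory is a UNC share. This is a sketch; it assumes SPARK_HOME is already set, and C:/temp is just an illustrative local directory:

# Call spark-submit directly with wait = TRUE so stderr becomes visible
# instead of being swallowed by the background launch.
spark_submit <- file.path(Sys.getenv("SPARK_HOME"), "bin", "spark-submit.cmd")
system2(spark_submit, "sparkr-shell", wait = TRUE)

# A UNC working directory starts with a doubled separator, e.g. \\server\share
# (R may print it with either slash style). Move to a local drive first.
if (grepl("^(\\\\\\\\|//)", getwd())) setwd("C:/temp")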
