Big Data Fundamentals (8): Installing and Configuring IPython and Notebook on Spark 2.0.0

Date: 2022-10-07 09:19:38

Environment:

spark 2.0.0,anaconda2

1. Installing and configuring IPython and Notebook for Spark

Method 1:

With this method you open the IPython notebook in a browser, and you can still enter pyspark from a separate terminal.
If Anaconda is installed, you can get the IPython interface directly as follows; without Anaconda, see the reference link at the end of this section and install the IPython packages yourself.
vi ~/.bashrc
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8880"
source ~/.bashrc
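After re-sourcing ~/.bashrc, you can verify in the current shell that pyspark will pick up the overrides (a quick check; the values mirror the exports above):

```shell
# Re-apply the exports in the current shell (same values as in ~/.bashrc)
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8880"

# Confirm the variables are set before launching pyspark
echo "$PYSPARK_DRIVER_PYTHON"
echo "$PYSPARK_DRIVER_PYTHON_OPTS"
```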


Restart pyspark.
You should see the behavior described in the Cloudera documentation section "Starting a Notebook with PySpark":
On the driver host, choose a directory notebook_directory to run the Notebook. notebook_directory contains the .ipynb files that represent the different notebooks that can be served.
In notebook_directory, run pyspark with your desired runtime options. You should see output like the following:
Reference:
IPython and Jupyter on Spark 2.0.0
http://www.cloudera.com/documentation/enterprise/5-5-x/topics/spark_ipython.html


Method 2:
Method 2 works with ipython, but jupyter gave me trouble; I am not sure whether that is specific to my setup (one likely cause is launching jupyter without PYSPARK_DRIVER_PYTHON_OPTS=notebook, which leaves the jupyter command with no subcommand).
It is also possible to launch the PySpark shell in IPython, the enhanced Python interpreter. PySpark works with IPython 1.0.0 and later. To use IPython, set the PYSPARK_DRIVER_PYTHON variable to ipython when running bin/pyspark:


$ PYSPARK_DRIVER_PYTHON=ipython ./bin/pyspark
To use the Jupyter notebook (previously known as the IPython notebook), also set PYSPARK_DRIVER_PYTHON_OPTS to notebook:


$ PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS=notebook ./bin/pyspark
You can further customize the ipython or jupyter commands by setting PYSPARK_DRIVER_PYTHON_OPTS.


root@py-server:/server/bin# PYSPARK_DRIVER_PYTHON=ipython $SPARK_HOME/bin/pyspark
Python 2.7.12 |Anaconda 4.1.1 (64-bit)| (default, Jul  2 2016, 17:42:40) 
Type "copyright", "credits" or "license" for more information.


IPython 4.2.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/08/03 22:24:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/


Using Python version 2.7.12 (default, Jul  2 2016 17:42:40)
SparkSession available as 'spark'.


In [1]: 






2. Usage:


Open http://notebook_host:8880/ in a browser.
For example: http://spark01:8880/
Choose New -> Python to open a Python notebook.
Press Shift+Enter (or Shift+Return) to run a cell.
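A quick first cell to confirm the notebook is actually wired to Spark (a sketch; `sc` is the SparkContext that pyspark predefines, so this only runs inside the notebook, not in plain Python):

```python
# Sanity check in a notebook cell; `sc` is predefined by pyspark.
rdd = sc.parallelize(range(1, 101))
print(sc.version)   # the Spark version, e.g. 2.0.0
print(rdd.count())  # 100 elements
print(rdd.sum())    # 5050
```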


Note:

Once PYSPARK_DRIVER_PYTHON is set to ipython, pyspark will always start IPython, until you restore the environment variables.
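To get the plain pyspark REPL back, unset the overrides in the current session (and remove the export lines from ~/.bashrc to make the change permanent):

```shell
# Remove the driver-Python overrides so pyspark starts its default REPL again
unset PYSPARK_DRIVER_PYTHON
unset PYSPARK_DRIVER_PYTHON_OPTS
```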


3. Test example

Source: "Spark for Python Developers"

Replace file_in with your own file: for a local file, use the commented-out line; for HDFS, keep the default and just adjust the concrete address.
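The book's code is not reproduced here; the following is a minimal word-count sketch in the same spirit. It assumes the notebook was started via pyspark (so `sc` is predefined), and the file paths are hypothetical placeholders to replace with your own:

```python
# Minimal word-count sketch (not the book's exact code).
# `sc` is the SparkContext predefined by the pyspark shell/notebook.

# file_in = "file:///home/user/words.txt"   # local file: uncomment for the local FS
file_in = "hdfs:///user/words.txt"          # HDFS: hypothetical path, adjust to yours

lines = sc.textFile(file_in)
counts = (lines.flatMap(lambda line: line.split())   # split lines into words
               .map(lambda word: (word, 1))          # pair each word with 1
               .reduceByKey(lambda a, b: a + b))     # sum the counts per word

# Print the ten most frequent words
for word, n in counts.takeOrdered(10, key=lambda kv: -kv[1]):
    print(word, n)
```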
