Running Spark WordCount in IDEA and in Eclipse, plus running SparkPi locally

Date: 2021-06-03 09:18:14

I. Running WordCount in IDEA

1. Download the free IDEA Community Edition

http://www.jetbrains.com/idea/download/

2. Install the Scala plugin

File --> Settings... --> Plugins


In the Plugins dialog, search for scala and click the Install button on the right to install the Scala plugin.

3. Create a Scala project

File --> New Project... --> Scala


Select the appropriate SDK and finish creating the project.

4. Create the package and class structure as follows

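The screenshot of the structure is lost, but the fully qualified class name passed to spark-submit below (main.scala.com.spark.firstapp.WordCount) implies roughly this layout, with src as the source root; the project name here is an assumption taken from the jar name:

FirstSparkApp/
  src/
    main/scala/com/spark/firstapp/
      WordCount.scala        <- package main.scala.com.spark.firstapp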

package main.scala.com.spark.firstapp

import org.apache.spark.{SparkConf, SparkContext}

/**
 * Count how many times each word occurs
 */
object WordCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: WordCount <input file> <output dir>")
      System.exit(1)
    }

    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)
    val line = sc.textFile(args(0))
    // Write the word counts to args(1)
    line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).saveAsTextFile(args(1))
    // Also print the word counts to the screen
    line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect().foreach(println)
    sc.stop()
  }
}
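Before packaging, you can sanity-check the transformation chain inside the IDE with a master of local and a tiny in-memory dataset. This is my own sketch, not part of the original project; it assumes the Spark jar from the next step is already on the classpath:

import org.apache.spark.{SparkConf, SparkContext}

object WordCountLocal {
  def main(args: Array[String]) {
    // "local" runs Spark inside the current JVM, so no cluster is needed
    val sc = new SparkContext(new SparkConf().setAppName("WordCountLocal").setMaster("local"))
    // Two hard-coded "lines" stand in for the HDFS input file
    val lines = sc.parallelize(Seq("a b a", "b c"))
    // Same chain as WordCount above: split into words, pair each word with 1, sum per word
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect().foreach(println)
    // Prints (a,2), (b,2), (c,1), in no particular order
    sc.stop()
  }
}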

Add the Spark jar to the project:

File --> Project Structure... --> Libraries

Click "+" and choose Java.

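As an alternative to attaching the jar by hand, an sbt build definition can declare the same dependency. The versions below are guesses consistent with the Spark 1.x log output later in this post, so match them to your cluster:

// build.sbt (hypothetical versions; adjust to your environment)
name := "FirstSparkApp"

scalaVersion := "2.10.4"

// "provided" because spark-submit supplies Spark at runtime
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"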

5. Package the project

File --> Project Structure... --> Artifacts


Next, build the artifact:

Build --> Build Artifacts... --> Build

The first time, choose Build; on later builds, choose Rebuild.

6. Run

Upload the jar to the Linux machine.

Make sure the Spark cluster is running.

The command is as follows (--class names the fully qualified main class; after the application jar come the program's two arguments, the input file and the output directory):

[hadoop@master bin]$ ./spark-submit \
  --master spark://192.168.189.136:7077 \
  --class main.scala.com.spark.firstapp.WordCount \
  --executor-memory 1g \
  /opt/testspark/FirstSparkApp2.jar \
  hdfs://master:9000/user/hadoop/input/README.txt \
  hdfs://master:9000/user/hadoop/output

7. Results

(if,1)
(Commerce,,1)
(or,2)
(another,1)
(software.,2)
(laws,,1)
(BEFORE,1)
(source,1)
(Hadoop,,1)
(to,2)
(written,1)
(code,1)
(software,,2)
(Regulations,,1)
(more,2)
(regulations,1)
(see,1)
(of,5)
(libraries,1)
(by,1)
(exception,1)
(Control,1)
(Government,1)
(code.,1)
(eligible,1)
(both,1)
(License,1)
(Foundation,1)
(functions,1)
(and,6)
(software:,1)
(5D002.C.1,,1)
((TSU),1)
(Hadoop,1)
15/03/13 14:34:17 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
15/03/13 14:34:17 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}

The output printed to the screen is shown above; you can also check the result files on HDFS.
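To inspect the copy written by saveAsTextFile, cat the part files on HDFS (note that the output directory must not already exist before the job runs, or saveAsTextFile fails):

[hadoop@master bin]$ hdfs dfs -cat hdfs://master:9000/user/hadoop/output/part-*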


II. Creating a Scala project in Eclipse and running WordCount


1. Download the Scala IDE for Eclipse

http://scala-ide.org/download/sdk.html

2. Create a Scala project

File --> New --> Scala Project


The project structure is as follows:



The code is the same as above.

Add the compiled Spark jar to the build path.


3. Package the project

File --> Export --> Java --> JAR file



4. Run

Upload the jar to the Linux machine.

The command and the results are the same as in Part I:

[hadoop@master bin]$ ./spark-submit \
  --master spark://192.168.189.136:7077 \
  --class main.scala.com.spark.firstapp.WordCount \
  --executor-memory 1g \
  /opt/testspark/FirstSparkApp2.jar \
  hdfs://master:9000/user/hadoop/input/README.txt \
  hdfs://master:9000/user/hadoop/output



III. Running SparkPi locally (works in both IDEA and Eclipse)


The code is as follows:


import org.apache.spark.{SparkConf, SparkContext}
import scala.math.random

object SparkPi {
  def main(args: Array[String]) {
    //val conf = new SparkConf().setAppName("Spark Pi").setMaster("spark://192.168.189.136:7077").setJars(List("D:\\scala\\sparkjar\\sparktest.jar"))
    //val spark = new SparkContext("spark://master:7070", "Spark Pi", "F:\\soft\\spark\\spark-1.1.0-bin-hadoop2.4", List("out\\artifacts\\sparkTest_jar\\sparkTest.jar"))
    // setMaster("local") is the key line: it runs Spark inside the IDE process
    val conf = new SparkConf().setAppName("Spark Pi").setMaster("local")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 100000 * slices
    // Sample n random points in [-1, 1] x [-1, 1] and count those inside the unit circle
    val count = spark.parallelize(1 to n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
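Why this estimates pi: each point (x, y) is uniform over the square [-1, 1] x [-1, 1], whose area is 4, and the unit circle inside it has area pi, so the fraction of points landing inside the circle converges to pi/4; multiplying by 4 recovers pi. The plain-Scala sketch below (my own addition, no Spark involved) shows the same arithmetic; to use more cores in the Spark version, setMaster("local[4]") runs four worker threads:

import scala.math.random

object PiCheck {
  def main(args: Array[String]) {
    val n = 1000000
    // Count samples that land inside the unit circle x*x + y*y < 1
    val count = (1 to n).count { _ =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      x * x + y * y < 1
    }
    println("Pi is roughly " + 4.0 * count / n)  // typically prints a value near 3.14
  }
}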