Spark configuration properties for YARN

Date: 2023-03-09 08:08:34

Spark 1.2.0


These are configuration properties specific to Spark on YARN.
Property Name  Default  Meaning
 spark.yarn.applicationMaster.waitTries  10  Number of times the ApplicationMaster waits for the Spark master, and also the number of tries it waits for the SparkContext to be initialized
 spark.yarn.submit.file.replication  3  HDFS replication factor for the files the application uploads to HDFS, such as the Spark jar and the app jar
 spark.yarn.preserve.staging.files  false  Whether to keep the staged files (Spark jar, app jar, etc.) when the job ends; set to true to preserve them instead of deleting them
 spark.yarn.scheduler.heartbeat.interval-ms  5000  Interval, in milliseconds, at which the Spark ApplicationMaster sends heartbeats to the YARN ResourceManager
 spark.yarn.max.executor.failures  numExecutors * 2, with a minimum of 3  Maximum number of executor failures before the application is marked as failed
 spark.yarn.historyServer.address  (none)  Address of the Spark history server, in the form host.com:port without the http:// scheme. Optional and unset by default
 spark.yarn.dist.archives  (none)  Comma-separated list of archives to be extracted into the working directory of each executor.
 spark.yarn.dist.files  (none)  Comma-separated list of files to be placed in the working directory of each executor.
 spark.yarn.executor.memoryOverhead  executorMemory * 0.07, with minimum of 384  The amount of off-heap memory (in megabytes) to be allocated per executor. This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc. This tends to grow with the executor size (typically 6-10%).
 spark.yarn.driver.memoryOverhead  driverMemory * 0.07, with minimum of 384  The amount of off-heap memory (in megabytes) to be allocated per driver. This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc. This tends to grow with the container size (typically 6-10%).
 spark.yarn.queue  default  Name of the YARN queue the application is submitted to
 spark.yarn.jar  (none)  Location of the Spark jar file. By default the Spark jar from the local Spark installation is used; the jar can also be placed on HDFS
 spark.yarn.access.namenodes  (none)  Comma-separated list of secure HDFS namenodes the application will access, for example: spark.yarn.access.namenodes=hdfs://nn1.com:8032,hdfs://nn2.com:8032
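 spark.yarn.appMasterEnv.[EnvironmentVariableName]  (none)  Sets the environment variable EnvironmentVariableName for the ApplicationMaster process launched on YARN; specify several such properties to set several variables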
 spark.yarn.containerLauncherMaxThreads  25  Maximum number of threads the ApplicationMaster uses to launch executor containers
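
The memoryOverhead defaults in the table determine how much memory YARN is asked for per container. The Python sketch below reproduces that arithmetic under two assumptions: the 7% value is rounded down to whole megabytes, and the container request is simply heap plus overhead. The function names are hypothetical and not part of Spark, and YARN may additionally round a request up to its own allocation increment, which this sketch ignores.

```python
from typing import Optional

# Defaults from the table: 7% of the heap, but never less than 384 MB.
OVERHEAD_FACTOR = 0.07
OVERHEAD_MIN_MB = 384


def default_memory_overhead(heap_mb: int) -> int:
    """Default memoryOverhead (MB) for a heap of heap_mb megabytes (hypothetical helper)."""
    # Assumption: the fractional part is dropped, i.e. rounded down to whole MB.
    return max(int(OVERHEAD_FACTOR * heap_mb), OVERHEAD_MIN_MB)


def container_request_mb(heap_mb: int, overhead_mb: Optional[int] = None) -> int:
    """Memory asked of YARN: heap plus an explicit or default overhead (hypothetical helper)."""
    if overhead_mb is None:
        overhead_mb = default_memory_overhead(heap_mb)
    return heap_mb + overhead_mb


if __name__ == "__main__":
    for heap in (1024, 4096, 8192):
        print(heap, default_memory_overhead(heap), container_request_mb(heap))
```

For example, an 8192 MB executor heap gets a default overhead of 573 MB, so about 8765 MB is requested, while any heap below about 5486 MB falls back to the 384 MB floor. The same formula applies to spark.yarn.driver.memoryOverhead with driverMemory as the heap size.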
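
The default for spark.yarn.max.executor.failures (numExecutors * 2, with a minimum of 3) can be sketched the same way. The helper names are hypothetical, and treating the application as failed once the failure count reaches the limit, rather than exceeds it, is an assumption of this sketch.

```python
from typing import Optional


def default_max_executor_failures(num_executors: int) -> int:
    """Default limit: twice the number of executors, but never below 3."""
    return max(num_executors * 2, 3)


def application_failed(failed: int, num_executors: int, configured: Optional[int] = None) -> bool:
    """True once the executor failure count reaches the effective limit.

    `configured` stands in for an explicit spark.yarn.max.executor.failures value.
    Assumption: the check is "failed >= limit".
    """
    limit = configured if configured is not None else default_max_executor_failures(num_executors)
    return failed >= limit


if __name__ == "__main__":
    print(default_max_executor_failures(1))   # 3: small jobs hit the floor
    print(default_max_executor_failures(10))  # 20
```

The floor of 3 matters mainly for jobs with one executor, where a plain "* 2" rule would tolerate only two failures.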
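
Several properties in the table take structured values: spark.yarn.dist.files, spark.yarn.dist.archives and spark.yarn.access.namenodes are comma-separated lists, spark.yarn.historyServer.address must omit the http:// scheme, and spark.yarn.appMasterEnv.[EnvironmentVariableName] embeds the variable name in the property key. The Python sketch below turns a dictionary into spark-submit --conf arguments while applying those rules. The function name yarn_conf_args, the JAVA_HOME path and the port 18080 are hypothetical examples, not values from the original table.

```python
import shlex
from typing import Dict, List, Sequence, Union

PropValue = Union[str, Sequence[str]]


def yarn_conf_args(props: Dict[str, PropValue], am_env: Dict[str, str]) -> List[str]:
    """Build ["--conf", "key=value", ...] arguments for spark-submit (hypothetical helper)."""
    merged: Dict[str, str] = {}
    for key, value in props.items():
        # List-valued properties (dist.files, dist.archives, access.namenodes) are comma-joined.
        text = value if isinstance(value, str) else ",".join(value)
        if key == "spark.yarn.historyServer.address" and "://" in text:
            raise ValueError("spark.yarn.historyServer.address must not include a scheme such as http://")
        merged[key] = text
    # Each ApplicationMaster environment variable becomes its own property.
    for name, value in am_env.items():
        merged[f"spark.yarn.appMasterEnv.{name}"] = value
    args: List[str] = []
    for key in sorted(merged):  # sorted only to make the output deterministic
        args += ["--conf", f"{key}={merged[key]}"]
    return args


if __name__ == "__main__":
    args = yarn_conf_args(
        {
            "spark.yarn.queue": "default",
            "spark.yarn.historyServer.address": "host.com:18080",
            "spark.yarn.access.namenodes": ["hdfs://nn1.com:8032", "hdfs://nn2.com:8032"],
        },
        {"JAVA_HOME": "/opt/jdk"},  # hypothetical JDK path
    )
    print("spark-submit " + " ".join(shlex.quote(a) for a in args))
```

The order of distinct --conf options does not matter to spark-submit. The same key/value pairs could instead go into conf/spark-defaults.conf, one per line with whitespace between key and value.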