Notes from translating the official Flume documentation: selected points from the Flume 1.7.0 User Guide (unreleased version)

Posted: 2021-07-22 11:14:46

flume.called.from.service (no default)
If this property is specified then the Flume agent will continue polling for the config file even if the config file is not found at the expected location. Otherwise, the Flume agent will terminate if the config doesn't exist at the expected location. No property value is needed when setting this property (e.g., just specifying -Dflume.called.from.service is enough).


Property: flume.called.from.service

Flume periodically polls, every 30 seconds, for changes to the specified config file. A Flume agent loads a new configuration from the config file if either an existing file is polled for the first time, or if an existing file's modification date has changed since the last time it was polled. Renaming or moving a file does not change its modification time. When a Flume agent polls a non-existent file, one of two things happens:

1. When the agent polls a non-existent config file for the first time, the agent behaves according to the flume.called.from.service property. If the property is set, the agent continues polling (always at the same period, every 30 seconds). If the property is not set, the agent terminates immediately.

2. When the agent polls a non-existent config file and this is not the first time the file is polled, the agent makes no config changes for this polling period, and continues polling rather than terminating.

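Since the flag is a JVM system property, it can be passed straight on the flume-ng command line; the agent name and config paths below are just placeholders:

$ bin/flume-ng agent -n a1 -c conf -f conf/flume-conf.properties -Dflume.called.from.service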

Log4J Appender

Appends Log4j events to a flume agent's avro source. A client using this appender must have the flume-ng-sdk in the classpath (e.g., flume-ng-sdk-1.8.0-SNAPSHOT.jar). Required properties are marked (required) below.

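As a sketch of the classpath requirement, a client application might be launched like this (the application jar and class name are placeholders; the SDK jar's transitive dependencies also need to be on the classpath):

$ java -cp myapp.jar:log4j-1.2.17.jar:flume-ng-sdk-1.8.0-SNAPSHOT.jar org.example.MyClass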

Hostname (required, no default)
The hostname on which a remote Flume agent is running with an avro source.

Port (required, no default)
The port at which the remote Flume agent's avro source is listening.

UnsafeMode (default: false)
If true, the appender will not throw exceptions on failure to send the events.

AvroReflectionEnabled (default: false)
Use Avro Reflection to serialize Log4j events. (Do not use when users log strings.)

AvroSchemaUrl (no default)
A URL from which the Avro schema can be retrieved.

Sample log4j.properties file:

#...
log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname = example.com
log4j.appender.flume.Port = 41414
log4j.appender.flume.UnsafeMode = true

# configure a class's logger to output to the flume appender
log4j.logger.org.example.MyClass = DEBUG,flume
#...

By default each event is converted to a string by calling toString(), or by using the Log4j layout, if specified.

If the event is an instance of org.apache.avro.generic.GenericRecord, org.apache.avro.specific.SpecificRecord, or if the property AvroReflectionEnabled is set to true, then the event will be serialized using Avro serialization.

Serializing every event with its Avro schema is inefficient, so it is good practice to provide a schema URL from which the schema can be retrieved by the downstream sink, typically the HDFS sink. If AvroSchemaUrl is not specified, then the schema will be included as a Flume header.

Sample log4j.properties file configured to use Avro serialization:


#...
log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname = example.com
log4j.appender.flume.Port = 41414
log4j.appender.flume.AvroReflectionEnabled = true
log4j.appender.flume.AvroSchemaUrl = hdfs://namenode/path/to/schema.avsc

# configure a class's logger to output to the flume appender
log4j.logger.org.example.MyClass = DEBUG,flume
#...
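To make the flow concrete, here is a minimal client sketch. The package and class match the logger configured above, but the value class and its fields are hypothetical. With the plain configuration earlier, the logged object would reach the agent as its toString() text; with AvroReflectionEnabled = true it is serialized with Avro reflection instead:

package org.example;

import org.apache.log4j.Logger;

public class MyClass {

    private static final Logger logger = Logger.getLogger(MyClass.class);

    // Hypothetical value class; Avro reflection derives a schema from its fields.
    public static class AccessEvent {
        public String user;
        public long timestampMillis;
    }

    public static void main(String[] args) {
        AccessEvent event = new AccessEvent();
        event.user = "alice";
        event.timestampMillis = System.currentTimeMillis();
        // Routed to the flume appender by the log4j.logger.org.example.MyClass setting.
        logger.debug(event);
    }
}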

Load Balancing Log4J Appender

Appends Log4j events to a list of flume agents' avro sources. A client using this appender must have the flume-ng-sdk in the classpath (e.g., flume-ng-sdk-1.8.0-SNAPSHOT.jar). This appender supports a round-robin and a random scheme for performing the load balancing. It also supports a configurable backoff timeout so that down agents are removed temporarily from the set of hosts. Required properties are marked (required) below.


Hosts (required, no default)
A space-separated list of host:port at which Flume (through an AvroSource) is listening for events.

Selector (default: ROUND_ROBIN)
Selection mechanism. Must be ROUND_ROBIN, RANDOM, or the fully qualified class name of a custom class that inherits from LoadBalancingSelector.

MaxBackoff (no default)
A long value representing the maximum amount of time in milliseconds the load balancing client will back off from a node that has failed to consume an event. Defaults to no backoff.

UnsafeMode (default: false)
If true, the appender will not throw exceptions on failure to send the events.

AvroReflectionEnabled (default: false)
Use Avro Reflection to serialize Log4j events.

AvroSchemaUrl (no default)
A URL from which the Avro schema can be retrieved.

Sample log4j.properties file configured using defaults:

#...
log4j.appender.out2 = org.apache.flume.clients.log4jappender.LoadBalancingLog4jAppender
log4j.appender.out2.Hosts = localhost:25430 localhost:25431

# configure a class's logger to output to the flume appender
log4j.logger.org.example.MyClass = DEBUG,out2
#...

Sample log4j.properties file configured using RANDOM load balancing:

#...
log4j.appender.out2 = org.apache.flume.clients.log4jappender.LoadBalancingLog4jAppender
log4j.appender.out2.Hosts = localhost:25430 localhost:25431
log4j.appender.out2.Selector = RANDOM

# configure a class's logger to output to the flume appender
log4j.logger.org.example.MyClass = DEBUG,out2
#...

Sample log4j.properties file configured using backoff:

#...
log4j.appender.out2 = org.apache.flume.clients.log4jappender.LoadBalancingLog4jAppender
log4j.appender.out2.Hosts = localhost:25430 localhost:25431 localhost:25432
log4j.appender.out2.Selector = ROUND_ROBIN
log4j.appender.out2.MaxBackoff = 30000

# configure a class's logger to output to the flume appender
log4j.logger.org.example.MyClass = DEBUG,out2
#...

Security

The HDFS sink, HBase sink, Thrift source, Thrift sink and Kite Dataset sink all support Kerberos authentication. Please refer to the corresponding sections for configuring the Kerberos-related options.

The Flume agent will authenticate to the Kerberos KDC as a single principal, which will be used by the different components that require Kerberos authentication. The principal and keytab configured for Thrift source, Thrift sink, HDFS sink, HBase sink and DataSet sink should be the same, otherwise the component will fail to start.

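As a sketch of what this looks like for the HDFS sink, the shared principal and keytab are set through the sink's Kerberos options (the agent and sink names, HDFS path, principal, and keytab location below are placeholders):

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events
a1.sinks.k1.hdfs.kerberosPrincipal = flume/fully.qualified.domain.name@YOUR-REALM.COM
a1.sinks.k1.hdfs.kerberosKeytab = /etc/security/keytabs/flume.keytab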