Hadoop Ecosystem - Flume Components: Custom Interceptors

Posted: 2023-03-08 19:00:55

Author: 尹正杰 (yinzhengjie)

Copyright notice: this is an original work; reposting without permission is forbidden and will be pursued legally.

  This post walks through a single example of a custom interceptor: a rate limiter used to test (and throttle) the byte transfer speed.

1>. Writing a custom interceptor

/*
@author :yinzhengjie
Blog:http://www.cnblogs.com/yinzhengjie/tag/Hadoop%E7%94%9F%E6%80%81%E5%9C%88/
EMAIL:y1053419035@qq.com
*/
package cn.org.yinzhengjie.interceptor;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.util.List;

/**
 * Rate-limiting interceptor.
 * <p>
 * If bytes arrive too fast (bytes per unit of time exceeds the configured
 * speed), sleep for a while before passing the event on.
 */
public class MyInterceptor implements Interceptor {

    // Maximum allowed throughput, in bytes per second
    private int speed;

    // Timestamp (ms) and body size (bytes) of the previous event
    private long lastTime = -1;
    private long lastBodySize = 0;

    // Constructor
    private MyInterceptor(int speed) {
        this.speed = speed;
    }

    // Do nothing
    public void initialize() {
    }

    /**
     * 1. Subtract the previous event's timestamp from the current time to get the interval since the previous event.
     * 2. Take the previous event's body size in bytes.
     * 3. Divide the two to get the previous event's throughput; if it exceeds the limit, pause first, then return the event.
     *
     * @param event the event being intercepted
     * @return the same event, possibly after a delay
     */
    public Event intercept(Event event) {
        byte[] body = event.getBody();
        int len = body.length;
        long current = System.currentTimeMillis();

        if (lastTime == -1) {
            // First event: just record its time and size
            lastTime = current;
            lastBodySize = len;
        } else {
            // Time elapsed since the previous event (ms)
            long interval = current - lastTime;
            System.out.println("=========================" + current + "/" + lastTime + "/" + interval + "=========================");

            // Throughput of the previous event, in bytes per second
            int nowSpeed = (int) ((double) lastBodySize / interval * 1000);
            if (nowSpeed > speed) {
                System.out.println("=========================" + nowSpeed + "=========================");
                // Sleep time = how long the previous event's bytes should have taken
                // at the configured speed, minus the time actually elapsed
                long sleepMs = (long) ((double) lastBodySize / speed * 1000) - interval;
                try {
                    if (sleepMs > 0) {
                        Thread.sleep(sleepMs);
                    }
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
            lastBodySize = len;
            lastTime = System.currentTimeMillis();
        }
        return event;
    }

    // Iterate over the List<Event>, handing each Event to intercept(Event)
    public List<Event> intercept(List<Event> events) {
        for (Event event : events) {
            intercept(event);
        }
        return events;
    }

    // Do nothing
    public void close() {
    }

    public static class Builder implements Interceptor.Builder {
        private int speed;

        public void configure(Context context) {
            speed = context.getInteger(Constants.SPEED, Constants.DEFAULT_SPEED);
        }

        public Interceptor build() {
            return new MyInterceptor(speed);
        }
    }

    public static class Constants {
        public static final String SPEED = "speed";
        public static final int DEFAULT_SPEED = 1;
    }
}
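
  Outside of a running agent, the interceptor can also be exercised directly to watch the throttling at work. The following is a minimal sketch and is not part of the original post: the class name MyInterceptorDemo, the limit of 100 bytes per second, and the 50-byte payloads are assumptions chosen for illustration; it only relies on Flume's Context, EventBuilder, and the Interceptor.Builder contract.

package cn.org.yinzhengjie.interceptor;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.interceptor.Interceptor;

// Hypothetical driver class (not from the original post): builds MyInterceptor
// through its Builder and pushes a few events through it to observe the throttling.
public class MyInterceptorDemo {
    public static void main(String[] args) {
        // Assumed limit of 100 bytes per second for this illustration
        Context context = new Context();
        context.put(MyInterceptor.Constants.SPEED, "100");

        Interceptor.Builder builder = new MyInterceptor.Builder();
        builder.configure(context);
        Interceptor interceptor = builder.build();
        interceptor.initialize();

        long start = System.currentTimeMillis();
        for (int i = 0; i < 5; i++) {
            // 50-byte bodies arriving back-to-back exceed 100 bytes/s,
            // so the interceptor sleeps roughly 500 ms per event
            Event event = EventBuilder.withBody(new byte[50]);
            interceptor.intercept(event);
        }
        System.out.println("Processed 5 events in " + (System.currentTimeMillis() - start) + " ms");

        interceptor.close();
    }
}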

2>. Package the jar and copy it to /soft/flume/lib

[yinzhengjie@s101 ~]$ cd /soft/flume/lib/
[yinzhengjie@s101 lib]$
[yinzhengjie@s101 lib]$ ll | grep MyFlume
-rw-r--r-- 1 yinzhengjie yinzhengjie 5231 Jun 20 18:53 MyFlume-1.0-SNAPSHOT.jar
[yinzhengjie@s101 lib]$
[yinzhengjie@s101 lib]$ rm -rf MyFlume-1.0-SNAPSHOT.jar
[yinzhengjie@s101 lib]$
[yinzhengjie@s101 lib]$ rz
[yinzhengjie@s101 lib]$
[yinzhengjie@s101 lib]$ ll | grep MyFlume
-rw-r--r-- 1 yinzhengjie yinzhengjie 8667 Jun 20 21:02 MyFlume-1.0-SNAPSHOT.jar
[yinzhengjie@s101 lib]$
[yinzhengjie@s101 lib]$
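
  The build step itself is not shown. Assuming the interceptor lives in an ordinary Maven project, the project would declare org.apache.flume:flume-ng-core as a dependency (provided scope is a reasonable choice, since the Flume installation already ships those classes), and mvn clean package would then produce target/MyFlume-1.0-SNAPSHOT.jar, the file uploaded above with rz. The scope and the exact Flume version are assumptions; match them to your installation.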

3>. Write the agent configuration file

[yinzhengjie@s101 ~]$ more /soft/flume/conf/yinzhengjie_myInterceptor.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Define the source: seq
a1.sources.r1.type = seq
# Number of events produced per RPC batch
a1.sources.r1.batchSize =

# Attach the interceptor
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = cn.org.yinzhengjie.interceptor.MyInterceptor$Builder
a1.sources.r1.interceptors.i1.speed =

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity =
a1.channels.c1.transactionCapacity =

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
[yinzhengjie@s101 ~]$
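
  The numeric fields in the listing above are left blank and must be filled in before the agent will start. One possible set of values, assumed here purely for illustration (not necessarily what the original test used), is:

a1.sources.r1.batchSize = 100
a1.sources.r1.interceptors.i1.speed = 10
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000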

4>. Start Flume and test

[yinzhengjie@s101 ~]$ flume-ng agent -f /soft/flume/conf/yinzhengjie_myInterceptor.conf -n a1
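
  The interceptor's println statements go straight to the terminal; the events written by the logger sink, however, go through log4j and may end up in Flume's log file instead. Appending -Dflume.root.logger=INFO,console to the flume-ng command above is the usual way to send them to the console as well.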

  Below is part of the output produced by the running agent.

[Screenshot: partial console output of the running agent]