22. Hadoop Study Notes: Kafka Basics in Practice, Consumer and Producer Examples

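Kafka provides clients for other languages as well. Here we mainly cover the Python and Java implementations, since these two languages are the most mainstream and popular.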

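The figure shows four partitions, and each shape corresponds to one consumer. Any one-to-one pairing of partitions and consumers works.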


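In partition mode, first get the number of partitions of the topic, then create one process per partition to consume that partition's data.

In each process, first create an instance that connects to Kafka, then specify which topic and which partition to read from.

Next, set the Kafka offset. Every message in Kafka has an offset, so if a consumer crashes suddenly, it can resume consuming from the last saved offset.

The client commits offsets by default, so committing them yourself is optional. This holds for the high-level consumer. With the low-level SimpleConsumer used in the partition-mode code later in this post, the application has to keep track of offsets itself.

The programs later in this post follow this pseudocode.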
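The steps above can be tried out without a broker. The following is a minimal in-memory simulation, not Kafka code. It assumes each partition is a `List<String>` and that saved offsets live in a thread-safe `Map`. Threads stand in for the per-partition processes. The class `PartitionWorkers` and its method names are hypothetical. The sketch shows the same flow: one worker per partition, start from the saved offset, and save the next offset after each message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory simulation of partition-mode consumption:
// one worker thread per partition, each resuming from its saved offset.
final class PartitionWorkers {

    // savedOffsets must be thread-safe: every worker writes its own partition's entry.
    static List<String> consumeAll(final List<List<String>> partitions,
                                   final Map<Integer, Integer> savedOffsets) {
        final List<String> consumed = Collections.synchronizedList(new ArrayList<String>());
        List<Thread> workers = new ArrayList<Thread>();
        for (int p = 0; p < partitions.size(); p++) {
            final int partition = p;
            Thread t = new Thread(new Runnable() {
                public void run() {
                    List<String> log = partitions.get(partition);
                    // Start from the saved offset (0 if this partition was never consumed).
                    Integer start = savedOffsets.get(partition);
                    int offset = start == null ? 0 : start;
                    while (offset < log.size()) {
                        consumed.add("p" + partition + "@" + offset + ":" + log.get(offset));
                        offset++;
                        // "Commit": remember the next offset to read after a restart.
                        savedOffsets.put(partition, offset);
                    }
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return consumed;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("c"), Arrays.asList("d", "e", "f"));
        Map<Integer, Integer> offsets = new ConcurrentHashMap<Integer, Integer>();
        offsets.put(2, 1); // pretend partition 2 had already processed "d" before a crash
        List<String> out = consumeAll(partitions, offsets);
        System.out.println(out.size() + " messages consumed, offsets = " + offsets);
    }
}
```

Because partition 2 restarts at offset 1, message "d" is not read again. The other two partitions are read from the beginning.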


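Group A and Group B each receive all the data in the topic. Group consumption can duplicate consumption: Kafka delivers every message to Group A and, separately, to Group B.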
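A minimal Java model of this fan-out follows. It assumes a single in-memory partition. It also shows what "copy" means in practice: the log is stored once, and each group keeps its own offset, so each group reads every message independently. In Kafka 0.8 these per-group offsets are kept in ZooKeeper. The class `GroupFanOut` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of one topic (single partition) read by several groups.
// Every group tracks its own offset, so each group sees the whole topic.
final class GroupFanOut {
    private final List<String> log = new ArrayList<String>();
    private final Map<String, Integer> groupOffsets = new HashMap<String, Integer>();

    void publish(String message) {
        log.add(message);
    }

    // Returns the messages this group has not read yet and advances only this group's offset.
    List<String> poll(String groupId) {
        Integer saved = groupOffsets.get(groupId);
        int from = saved == null ? 0 : saved;
        List<String> batch = new ArrayList<String>(log.subList(from, log.size()));
        groupOffsets.put(groupId, log.size());
        return batch;
    }

    public static void main(String[] args) {
        GroupFanOut topic = new GroupFanOut();
        for (String m : Arrays.asList("m1", "m2", "m3")) {
            topic.publish(m);
        }
        System.out.println("GroupA: " + topic.poll("GroupA"));
        System.out.println("GroupB: " + topic.poll("GroupB"));
        System.out.println("GroupA again: " + topic.poll("GroupA"));
    }
}
```

Reading as Group A does not advance Group B's offset, which is why both groups get m1 to m3. A second poll by Group A returns nothing new.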


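The stream count N is the number of consumers in each group. In the figure above, Group A has 2 streams and Group B has 4.

Each consumer instance also has to create a connection to Kafka and specify which topic and partition to read.

Offsets can be set here too, just as in partition mode.

In group mode, you can choose whether to consume from the beginning or only from the latest messages.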
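In the Kafka 0.8 high-level consumer, this choice is the `auto.offset.reset` property. It takes `smallest` (from the beginning) or `largest` (only new messages, the default). It only applies when the group has no committed offset for a partition, or when the committed offset is out of range. The hypothetical helper below models that decision with plain numbers; it is a sketch of the semantics, not Kafka's implementation.

```java
// Hypothetical model of how a group consumer picks its starting offset
// for one partition, mirroring the 0.8 "auto.offset.reset" semantics.
final class StartOffset {

    /**
     * @param committed committed offset for this group, or null if none exists
     * @param earliest  smallest offset still stored in the partition
     * @param latest    offset that the next produced message will get
     * @param reset     "smallest" (read from the beginning) or "largest" (only new messages)
     */
    static long choose(Long committed, long earliest, long latest, String reset) {
        // A valid committed offset always wins: the group resumes where it stopped.
        if (committed != null && committed >= earliest && committed <= latest) {
            return committed;
        }
        if ("smallest".equals(reset)) {
            return earliest;
        }
        if ("largest".equals(reset)) {
            return latest;
        }
        throw new IllegalArgumentException("unknown auto.offset.reset value: " + reset);
    }

    public static void main(String[] args) {
        System.out.println(choose(null, 100L, 250L, "smallest")); // new group, read from the start
        System.out.println(choose(null, 100L, 250L, "largest"));  // new group, only new messages
        System.out.println(choose(180L, 100L, 250L, "largest"));  // existing group resumes at 180
    }
}
```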


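PT denotes all the partitions of topic T, and CG denotes the consumer instances in the group.

Sort the partitions and sort the consumers, then let N = size(PT) / size(CG). The consumer at position i in the sorted consumer list takes partitions i*N through (i+1)*N - 1.

For Group A in the earlier example, PT = 4 and CG = 2, so N = 2.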
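The assignment rule can be written out directly. This is a minimal sketch of range-style assignment: sort PT and CG, let N = size(PT) / size(CG), and give consumer i partitions i*N to (i+1)*N - 1. When size(PT) is not divisible by size(CG), the sketch gives one extra partition each to the first size(PT) mod size(CG) consumers. As far as we can tell, this matches the 0.8 consumer's rebalancing code, but treat that remainder handling as an assumption. The class `RangeAssignment` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of range assignment: partitions and consumers are both
// sorted, and each consumer takes one contiguous block of partition ids.
final class RangeAssignment {

    /** Returns the partition ids assigned to the consumer at index i (0-based, after sorting). */
    static List<Integer> assign(int numPartitions, int numConsumers, int i) {
        int n = numPartitions / numConsumers;      // N = PT / CG
        int extra = numPartitions % numConsumers;  // leftovers go to the first consumers (assumption)
        int start = i * n + Math.min(i, extra);
        int count = n + (i < extra ? 1 : 0);
        List<Integer> result = new ArrayList<Integer>();
        for (int p = start; p < start + count; p++) {
            result.add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        // Group A from the example: PT = 4, CG = 2, so N = 2.
        for (int i = 0; i < 2; i++) {
            System.out.println("GroupA consumer " + i + " -> " + assign(4, 2, i));
        }
        // Group B: PT = 4, CG = 4, so N = 1.
        for (int i = 0; i < 4; i++) {
            System.out.println("GroupB consumer " + i + " -> " + assign(4, 4, i));
        }
    }
}
```

If a group has more consumers than the topic has partitions, the surplus consumers get an empty list and stay idle. That is why running more streams than partitions does not increase parallelism.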


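In partition mode, messages are also delivered at least once by default. This can be customized to exactly once (sent once and received once) or at most once (sent once, whether or not it is received).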
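On the consumer side, at-least-once and at-most-once differ only in when the position is saved relative to processing. The hypothetical Java simulation below assumes a consumer that crashes exactly once, between its two steps for one message, and then restarts from the last saved offset:

- Saving the offset after processing replays that message (at-least-once, a duplicate).
- Saving it before processing skips the message (at-most-once, a loss).

Exactly-once needs more than step ordering, for example storing results and offsets atomically, so it is not modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation: the consumer crashes once, between its two steps for
// the message at offset crashAt, then restarts from the last saved offset.
final class DeliverySemantics {

    static List<String> run(List<String> log, int crashAt, boolean commitAfterProcessing) {
        List<String> processed = new ArrayList<String>();
        int savedOffset = 0;
        boolean crashed = false;
        int offset = savedOffset;
        while (offset < log.size()) {
            boolean crashNow = !crashed && offset == crashAt;
            if (commitAfterProcessing) {
                processed.add(log.get(offset));   // 1. process
                if (crashNow) {                   //    crash before the commit
                    crashed = true;
                    offset = savedOffset;         //    restart from the saved offset
                    continue;
                }
                savedOffset = offset + 1;         // 2. commit
            } else {
                savedOffset = offset + 1;         // 1. commit
                if (crashNow) {                   //    crash before processing
                    crashed = true;
                    offset = savedOffset;         //    restart from the saved offset
                    continue;
                }
                processed.add(log.get(offset));   // 2. process
            }
            offset++;
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("m0", "m1", "m2");
        System.out.println("at-least-once: " + run(log, 1, true));
        System.out.println("at-most-once:  " + run(log, 1, false));
    }
}
```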


The Kafka client library version only needs to match the Kafka version running on the server.


The pom file is as follows:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.jike.kafkatest</groupId>
  <artifactId>JikeKafka</artifactId>
  <version>1.0</version>
  <packaging>jar</packaging>
  <name>JikeKafka</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka_2.9.2</artifactId>
      <version>0.8.1.1</version>
      <exclusions>
        <exclusion>
          <artifactId>jmxri</artifactId>
          <groupId>com.sun.jmx</groupId>
        </exclusion>
        <exclusion>
          <artifactId>jms</artifactId>
          <groupId>javax.jms</groupId>
        </exclusion>
        <exclusion>
          <artifactId>jmxtools</artifactId>
          <groupId>com.sun.jdmk</groupId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
      <version>1.7.3</version>
    </dependency>
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-ipc</artifactId>
      <version>1.7.3</version>
    </dependency>
  </dependencies>
  <build>
    <sourceDirectory>src/main/java</sourceDirectory>
    <testSourceDirectory>src/test/java</testSourceDirectory>
    <plugins>
      <!--
        Bind the maven-assembly-plugin to the package phase.
        This creates an additional jar that bundles all dependencies
        (jar-with-dependencies), suitable for deployment to a cluster.
      -->
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
          <archive>
            <manifest>
              <mainClass></mainClass>
            </manifest>
          </archive>
        </configuration>
        <executions>
          <execution>
            <id>make-assembly</id>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

The Java code for group mode is as follows:

package kafka.consumer.group;

import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;

// Worker that drains one KafkaStream; GroupConsumerTest runs one instance per thread.
public class ConsumerTest implements Runnable {
    private final KafkaStream<byte[], byte[]> m_stream;
    private final int m_threadNumber;

    public ConsumerTest(KafkaStream<byte[], byte[]> a_stream, int a_threadNumber) {
        m_threadNumber = a_threadNumber;
        m_stream = a_stream;
    }

    public void run() {
        ConsumerIterator<byte[], byte[]> it = m_stream.iterator();
        // hasNext() blocks until a message arrives and returns false once the connector is shut down
        while (it.hasNext()) {
            System.out.println("Thread " + m_threadNumber + ": " + new String(it.next().message()));
        }
        System.out.println("Shutting down Thread: " + m_threadNumber);
    }
}

package kafka.consumer.group;

import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GroupConsumerTest {
    private final ConsumerConnector consumer;
    private final String topic;
    private ExecutorService executor;

    public GroupConsumerTest(String a_zookeeper, String a_groupId, String a_topic) {
        consumer = kafka.consumer.Consumer.createJavaConsumerConnector(
                createConsumerConfig(a_zookeeper, a_groupId));
        this.topic = a_topic;
    }

    public void shutdown() {
        if (consumer != null) consumer.shutdown();
        if (executor == null) return; // run() was never called, no worker threads to stop
        executor.shutdown();
        try {
            // give the worker threads up to 5 seconds to finish
            if (!executor.awaitTermination(5000, TimeUnit.MILLISECONDS)) {
                System.out.println("Timed out waiting for consumer threads to shut down, exiting uncleanly");
            }
        } catch (InterruptedException e) {
            System.out.println("Interrupted during shutdown, exiting uncleanly");
        }
    }

    public void run(int a_numThreads) {
        // Ask for a_numThreads streams on the topic; Kafka spreads the topic's partitions over them
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put(topic, a_numThreads);
        Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
        List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(topic);

        // Launch one thread per stream, each with its own ConsumerTest worker
        executor = Executors.newFixedThreadPool(a_numThreads);
        int threadNumber = 0;
        for (final KafkaStream<byte[], byte[]> stream : streams) {
            executor.submit(new ConsumerTest(stream, threadNumber));
            threadNumber++;
        }
    }

    private static ConsumerConfig createConsumerConfig(String a_zookeeper, String a_groupId) {
        Properties props = new Properties();
        props.put("zookeeper.connect", a_zookeeper);   // group membership and offsets are kept in ZooKeeper
        props.put("group.id", a_groupId);              // consumers with the same group.id share the partitions
        props.put("zookeeper.session.timeout.ms", "40000");
        props.put("zookeeper.sync.time.ms", "2000");
        props.put("auto.commit.interval.ms", "1000");  // commit consumed offsets automatically every second
        return new ConsumerConfig(props);
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.out.println("Please assign the number of consumer threads.");
            return;
        }
        String zooKeeper = "10.206.216.13:12181,10.206.212.14:12181,10.206.209.25:12181";
        String groupId = "jikegrouptest";
        String topic = "jiketest";
        int threads = Integer.parseInt(args[0]);

        final GroupConsumerTest example = new GroupConsumerTest(zooKeeper, groupId, topic);
        example.run(threads);

        // The worker threads keep the JVM alive; shut the consumer down cleanly on Ctrl+C or kill
        Runtime.getRuntime().addShutdownHook(new Thread() {
            public void run() {
                example.shutdown();
            }
        });
    }
}

The Java code for partition mode is as follows:

package kafka.consumer.partition;

import kafka.api.FetchRequest;
import kafka.api.FetchRequestBuilder;
import kafka.api.PartitionOffsetRequestInfo;
import kafka.common.ErrorMapping;
import kafka.common.TopicAndPartition;
import kafka.javaapi.*;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.message.MessageAndOffset;

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionConsumerTest {
    public static void main(String[] args) {
        if (args.length < 1) {
            System.out.println("Please assign partition number.");
            return;
        }
        final long maxReads = Long.MAX_VALUE;
        final String topic = "jiketest";
        final List<String> seeds = new ArrayList<String>();
        String hosts = "10.206.216.13,10.206.212.14,10.206.209.25";
        for (String host : hosts.split(",")) {
            seeds.add(host.trim());
        }
        final int port = 19092;
        int partLen = Integer.parseInt(args[0]);
        // run() keeps reading until maxReads messages have been consumed, so a plain loop would
        // never get past partition 0; start one thread per partition instead.
        for (int index = 0; index < partLen; index++) {
            final int partition = index;
            new Thread(new Runnable() {
                public void run() {
                    try {
                        // one instance per thread: m_replicaBrokers holds per-partition state
                        new PartitionConsumerTest().run(maxReads, topic, partition, seeds, port);
                    } catch (Exception e) {
                        System.out.println("Oops:" + e);
                        e.printStackTrace();
                    }
                }
            }, "partition-" + partition).start();
        }
    }

    private List<String> m_replicaBrokers = new ArrayList<String>();

    public PartitionConsumerTest() {
        m_replicaBrokers = new ArrayList<String>();
    }

    public void run(long a_maxReads, String a_topic, int a_partition, List<String> a_seedBrokers, int a_port) throws Exception {
        // find the meta data about the topic and partition we are interested in
        PartitionMetadata metadata = findLeader(a_seedBrokers, a_port, a_topic, a_partition);
        if (metadata == null) {
            System.out.println("Can't find metadata for Topic and Partition. Exiting");
            return;
        }
        if (metadata.leader() == null) {
            System.out.println("Can't find Leader for Topic and Partition. Exiting");
            return;
        }
        String leadBroker = metadata.leader().host();
        String clientName = "Client_" + a_topic + "_" + a_partition;

        SimpleConsumer consumer = new SimpleConsumer(leadBroker, a_port, 100000, 64 * 1024, clientName);
        long readOffset = getLastOffset(consumer, a_topic, a_partition, kafka.api.OffsetRequest.EarliestTime(), clientName);

        int numErrors = 0;
        while (a_maxReads > 0) {
            if (consumer == null) {
                consumer = new SimpleConsumer(leadBroker, a_port, 100000, 64 * 1024, clientName);
            }
            FetchRequest req = new FetchRequestBuilder()
                    .clientId(clientName)
                    .addFetch(a_topic, a_partition, readOffset, 100000) // Note: this fetchSize of 100000 might need to be increased if large batches are written to Kafka
                    .build();
            FetchResponse fetchResponse = consumer.fetch(req);

            if (fetchResponse.hasError()) {
                numErrors++;
                // Something went wrong!
                short code = fetchResponse.errorCode(a_topic, a_partition);
                System.out.println("Error fetching data from the Broker:" + leadBroker + " Reason: " + code);
                if (numErrors > 5) break;
                if (code == ErrorMapping.OffsetOutOfRangeCode()) {
                    // We asked for an invalid offset. For simple case ask for the last element to reset
                    readOffset = getLastOffset(consumer, a_topic, a_partition, kafka.api.OffsetRequest.LatestTime(), clientName);
                    continue;
                }
                consumer.close();
                consumer = null;
                leadBroker = findNewLeader(leadBroker, a_topic, a_partition, a_port);
                continue;
            }
            numErrors = 0;

            long numRead = 0;
            for (MessageAndOffset messageAndOffset : fetchResponse.messageSet(a_topic, a_partition)) {
                long currentOffset = messageAndOffset.offset();
                if (currentOffset < readOffset) {
                    System.out.println("Found an old offset: " + currentOffset + " Expecting: " + readOffset);
                    continue;
                }
                readOffset = messageAndOffset.nextOffset();
                ByteBuffer payload = messageAndOffset.message().payload();

                byte[] bytes = new byte[payload.limit()];
                payload.get(bytes);
                System.out.println(String.valueOf(messageAndOffset.offset()) + ": " + new String(bytes, "UTF-8"));
                numRead++;
                a_maxReads--;
            }

            if (numRead == 0) {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException ie) {
                }
            }
        }
        if (consumer != null) consumer.close();
    }

    public static long getLastOffset(SimpleConsumer consumer, String topic, int partition,
                                     long whichTime, String clientName) {
        TopicAndPartition topicAndPartition = new TopicAndPartition(topic, partition);
        Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo = new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
        requestInfo.put(topicAndPartition, new PartitionOffsetRequestInfo(whichTime, 1));
        kafka.javaapi.OffsetRequest request = new kafka.javaapi.OffsetRequest(
                requestInfo, kafka.api.OffsetRequest.CurrentVersion(), clientName);
        OffsetResponse response = consumer.getOffsetsBefore(request);

        if (response.hasError()) {
            System.out.println("Error fetching data Offset Data the Broker. Reason: " + response.errorCode(topic, partition));
            return 0;
        }
        long[] offsets = response.offsets(topic, partition);
        return offsets[0];
    }

    private String findNewLeader(String a_oldLeader, String a_topic, int a_partition, int a_port) throws Exception {
        for (int i = 0; i < 3; i++) {
            boolean goToSleep = false;
            PartitionMetadata metadata = findLeader(m_replicaBrokers, a_port, a_topic, a_partition);
            if (metadata == null) {
                goToSleep = true;
            } else if (metadata.leader() == null) {
                goToSleep = true;
            } else if (a_oldLeader.equalsIgnoreCase(metadata.leader().host()) && i == 0) {
                // first time through if the leader hasn't changed give ZooKeeper a second to recover
                // second time, assume the broker did recover before failover, or it was a non-Broker issue
                goToSleep = true;
            } else {
                return metadata.leader().host();
            }
            if (goToSleep) {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException ie) {
                }
            }
        }
        System.out.println("Unable to find new leader after Broker failure. Exiting");
        throw new Exception("Unable to find new leader after Broker failure. Exiting");
    }

    private PartitionMetadata findLeader(List<String> a_seedBrokers, int a_port, String a_topic, int a_partition) {
        PartitionMetadata returnMetaData = null;
        loop:
        for (String seed : a_seedBrokers) {
            SimpleConsumer consumer = null;
            try {
                consumer = new SimpleConsumer(seed, a_port, 100000, 64 * 1024, "leaderLookup");
                List<String> topics = Collections.singletonList(a_topic);
                TopicMetadataRequest req = new TopicMetadataRequest(topics);
                kafka.javaapi.TopicMetadataResponse resp = consumer.send(req);

                List<TopicMetadata> metaData = resp.topicsMetadata();
                for (TopicMetadata item : metaData) {
                    for (PartitionMetadata part : item.partitionsMetadata()) {
                        if (part.partitionId() == a_partition) {
                            returnMetaData = part;
                            break loop;
                        }
                    }
                }
            } catch (Exception e) {
                System.out.println("Error communicating with Broker [" + seed + "] to find Leader for [" + a_topic
                        + ", " + a_partition + "] Reason: " + e);
            } finally {
                if (consumer != null) consumer.close();
            }
        }
        if (returnMetaData != null) {
            m_replicaBrokers.clear();
            for (kafka.cluster.Broker replica : returnMetaData.replicas()) {
                m_replicaBrokers.add(replica.host());
            }
        }
        return returnMetaData;
    }
}

Parameter tuning


Next, we implement the producer, which sends messages to Kafka.


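After sending a message, the producer keeps checking whether the Kafka cluster has received it. If not, it resends the message. Once the maximum number of retries is reached, it stops producing.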
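The retry loop can be sketched without a broker by hiding the actual send behind a hypothetical `Attempt` interface. In the 0.8 producer, the corresponding settings are `message.send.max.retries` (default 3) and `retry.backoff.ms` (default 100). The loop below only imitates that behavior and is not Kafka's code; `RetryingSender` is a made-up name.

```java
// Hypothetical sketch of "send, wait for the ack, resend until the retry budget is used up".
final class RetryingSender {

    /** One send attempt; returns true if the cluster acknowledged the message. */
    interface Attempt {
        boolean send(String message);
    }

    /** Returns the number of attempts used, or -1 if every attempt failed. */
    static int sendWithRetries(Attempt attempt, String message, int maxRetries, long backoffMs) {
        // One initial try plus up to maxRetries resends.
        for (int tries = 1; tries <= maxRetries + 1; tries++) {
            if (attempt.send(message)) {
                return tries;
            }
            if (tries <= maxRetries) {
                try {
                    Thread.sleep(backoffMs); // back off before resending
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1; // retries exhausted: the caller decides whether to stop producing
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Fake cluster that only acknowledges the third attempt.
        Attempt flaky = new Attempt() {
            public boolean send(String message) {
                return ++calls[0] >= 3;
            }
        };
        System.out.println("acked after " + sendWithRetries(flaky, "hello", 3, 10) + " attempts");
    }
}
```

Retrying can deliver the same message twice if an ack is lost after the broker has already written the message. This is the producer-side source of at-least-once delivery.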


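In asynchronous mode, messages are first buffered on the client. You can set a maximum number of buffered messages or a maximum buffering time. Once either threshold is reached, the buffered messages are packed into a batch and sent to the Kafka server.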
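A minimal sketch of that size-or-time flush rule follows. In the 0.8 producer, the related settings are `batch.num.messages` (default 200) and `queue.buffering.max.ms` (default 5000). `queue.buffering.max.messages` caps the whole queue. The `BatchingBuffer` class below is hypothetical. The caller passes in the current time so the behavior is deterministic, and "sending" just records the batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side buffer for async sending: messages are collected and
// handed over as one batch once either the size or the time threshold is reached.
final class BatchingBuffer {
    private final int maxMessages;
    private final long maxDelayMs;
    private final List<String> pending = new ArrayList<String>();
    private final List<List<String>> sentBatches = new ArrayList<List<String>>();
    private long firstBufferedAt = -1;

    BatchingBuffer(int maxMessages, long maxDelayMs) {
        this.maxMessages = maxMessages;
        this.maxDelayMs = maxDelayMs;
    }

    /** Buffers a message; nowMs is passed in so the sketch is deterministic. */
    void add(String message, long nowMs) {
        if (pending.isEmpty()) {
            firstBufferedAt = nowMs;
        }
        pending.add(message);
        maybeFlush(nowMs);
    }

    /** Flushes if the batch is full or the oldest buffered message has waited long enough. */
    void maybeFlush(long nowMs) {
        boolean full = pending.size() >= maxMessages;
        boolean expired = !pending.isEmpty() && nowMs - firstBufferedAt >= maxDelayMs;
        if (full || expired) {
            // A real producer would send this batch to the broker in one request.
            sentBatches.add(new ArrayList<String>(pending));
            pending.clear();
        }
    }

    List<List<String>> sentBatches() {
        return sentBatches;
    }

    public static void main(String[] args) {
        BatchingBuffer buf = new BatchingBuffer(3, 1000);
        buf.add("a", 0);
        buf.add("b", 10);
        buf.add("c", 20);     // third message: size threshold reached
        buf.add("d", 100);
        buf.maybeFlush(1200); // "d" has waited 1100 ms: time threshold reached
        System.out.println(buf.sentBatches());
    }
}
```

The trade-off is visible here: messages still in `pending` when the client process dies are lost. This is why the async model gives up some reliability in exchange for throughput.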


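The pseudocode for the two models is nearly identical, so one description covers both.

First create the connection instance, then configure load balancing (how messages are spread across partitions).

Whether the producer is synchronous or asynchronous is decided when its parameters are set.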
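To show that only the configuration decides between the two models, the hypothetical helper below builds the `java.util.Properties` for the 0.8 producer used in this post. The property names (`metadata.broker.list`, `serializer.class`, `partitioner.class`, `request.required.acks`, `producer.type`, `queue.buffering.max.ms`, `batch.num.messages`) are 0.8 producer settings. The values are only examples, and `ProducerProps` is a made-up name.

```java
import java.util.Properties;

// Hypothetical helper: the same producer code becomes sync or async purely through its configuration.
final class ProducerProps {

    static Properties build(String brokerList, boolean async) {
        Properties props = new Properties();
        props.put("metadata.broker.list", brokerList);                    // connection: seed brokers
        props.put("serializer.class", "kafka.serializer.StringEncoder");  // message encoding
        props.put("partitioner.class", "kafka.producer.partiton.SimplePartitioner"); // load balancing
        props.put("request.required.acks", "1");                          // wait for the leader's ack
        if (async) {
            props.put("producer.type", "async");
            props.put("queue.buffering.max.ms", "5000"); // flush at least every 5 s ...
            props.put("batch.num.messages", "200");      // ... or every 200 messages
        } else {
            props.put("producer.type", "sync");          // the 0.8 default
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build("broker1:9092,broker2:9092", true));
    }
}
```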


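Because the synchronous model waits for each send to be confirmed, its message loss rate is essentially zero.

In the asynchronous model, each partition can send up to 500,000 messages per second.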


Next, we write the Java client programs:


The pom file is the same as the one above.

The synchronous model code is as follows:

package kafka.producer.sync;

import java.util.*;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class SyncProduce {
    public static void main(String[] args) {
        long events = Long.MAX_VALUE;
        Random rnd = new Random();

        Properties props = new Properties();
        props.put("metadata.broker.list", "10.206.216.13:19092,10.206.212.14:19092,10.206.209.25:19092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // alternative: kafka.serializer.DefaultEncoder (passes byte[] through unchanged)
        props.put("partitioner.class", "kafka.producer.partiton.SimplePartitioner");
        // kafka.producer.DefaultPartitioner: based on the hash of the key
        props.put("request.required.acks", "1");
        // 0: never wait for an acknowledgement
        // 1: wait until the leader replica has received the message and acknowledged it
        // -1: wait until all in-sync replicas have received the message
        ProducerConfig config = new ProducerConfig(props);

        Producer<String, String> producer = new Producer<String, String>(config);

        for (long nEvents = 0; nEvents < events; nEvents++) {
            long runtime = new Date().getTime();
            String ip = "192.168.2." + rnd.nextInt(255);
            String msg = runtime + ",www.example.com," + ip;
            // The key (eventKey) must be set, even if the custom partitioner ignores it: with a null key
            // the producer picks a partition itself and never calls the custom partitioner.
            // arguments: eventTopic, eventKey, eventBody
            KeyedMessage<String, String> data = new KeyedMessage<String, String>("jiketest", ip, msg);
            producer.send(data);
            try {
                Thread.sleep(1000);
            } catch (InterruptedException ie) {
            }
        }
        producer.close();
    }
}

The asynchronous model code is as follows:

package kafka.producer.async;

import java.util.*;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class ASyncProduce {
    public static void main(String[] args) {
        long events = Long.MAX_VALUE;
        Random rnd = new Random();

        Properties props = new Properties();
        props.put("metadata.broker.list", "10.206.216.13:19092,10.206.212.14:19092,10.206.209.25:19092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // alternative: kafka.serializer.DefaultEncoder (passes byte[] through unchanged)
        props.put("partitioner.class", "kafka.producer.partiton.SimplePartitioner");
        // kafka.producer.DefaultPartitioner: based on the hash of the key
        // request.required.acks is left at its default here (0: do not wait for acknowledgements)
        //props.put("request.required.acks", "1");
        props.put("producer.type", "async");
        // producer.type: "sync" (the default) or "async"
        ProducerConfig config = new ProducerConfig(props);

        Producer<String, String> producer = new Producer<String, String>(config);

        for (long nEvents = 0; nEvents < events; nEvents++) {
            long runtime = new Date().getTime();
            String ip = "192.168.2." + rnd.nextInt(255);
            String msg = runtime + ",www.example.com," + ip;
            // arguments: eventTopic, eventKey (used by SimplePartitioner), eventBody
            KeyedMessage<String, String> data = new KeyedMessage<String, String>("jiketest", ip, msg);
            // in async mode send() only queues the message; a background thread sends it in batches
            producer.send(data);
            try {
                Thread.sleep(1000);
            } catch (InterruptedException ie) {
            }
        }
        producer.close();
    }
}

The partitioning algorithm:

package kafka.producer.partiton;

import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;

// Routes a message by the last octet of its IP key, e.g. "192.168.2.7" -> 7 % numPartitions.
public class SimplePartitioner implements Partitioner {
    // The producer instantiates the partitioner reflectively and passes its properties to this constructor.
    public SimplePartitioner(VerifiableProperties props) {
    }

    public int partition(Object key, int a_numPartitions) {
        int partition = 0;
        String stringKey = (String) key;
        int offset = stringKey.lastIndexOf('.');
        if (offset > 0) {
            partition = Integer.parseInt(stringKey.substring(offset + 1)) % a_numPartitions;
        }
        return partition;
    }
}
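To check where keys land without a cluster, the partitioning rule can be exercised as a plain function. This hypothetical sketch copies SimplePartitioner's last-octet logic. For comparison, it adds a hash-based rule in the spirit of the `DefaultPartitioner` mentioned in the producer comments. That rule masks the sign bit so the result is never negative. The hash variant is an illustration, not Kafka's exact code, and `PartitionRules` is a made-up name.

```java
// Hypothetical standalone versions of two partitioning rules, for quick experiments.
final class PartitionRules {

    /** Same rule as SimplePartitioner: last octet of the IP key modulo the partition count. */
    static int byLastOctet(String ipKey, int numPartitions) {
        int dot = ipKey.lastIndexOf('.');
        if (dot > 0) {
            return Integer.parseInt(ipKey.substring(dot + 1)) % numPartitions;
        }
        return 0; // keys without a dot all go to partition 0
    }

    /** Hash-based rule: spreads arbitrary keys; the mask keeps the value non-negative. */
    static int byHash(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        String[] keys = {"192.168.2.7", "192.168.2.10", "192.168.2.254", "no-dot"};
        for (String k : keys) {
            System.out.println(k + " -> octet rule: " + byLastOctet(k, 4) + ", hash rule: " + byHash(k, 4));
        }
    }
}
```

The last-octet rule keeps all messages from one IP in the same partition, so their order is preserved. It only works for keys shaped like IP addresses, whereas the hash rule accepts any string.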
