Hadoop Source Code Analysis: RPC Communication from Client to Server

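RPC is the foundation of Hadoop's distributed communication. Client and NameNode, NameNode and DataNode, and the components of the newer YARN framework all talk to each other over RPC.

Below is an overview of RPC in Hadoop 2.

Hadoop's communication follows the C/S model, that is, client and server.

The client sends messages to the server over a conventional socket and waits for the server's reply.

The server handles client requests and sends responses using the Reactor pattern (Java NIO).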

I. Communication from Client to Server

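We start with the client-to-server direction.

Before anything can be exchanged, a connection must be established, and establishing a connection means sending a connection header first.

The client code lives in the ipc package of Hadoop Common, and the main class is Client.java. Communication is handled by the inner class Client.Connection, which has the following fields: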

private InetSocketAddress server; // address of the server to connect to

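private final ConnectionId remoteId; // key that allows the connection to be reused

ConnectionId exists so that connections can be reused. Client has a connection pool field, Hashtable<ConnectionId, Connection> connections: when several calls made through the same Client carry the same ConnectionId and the matching connection has not been closed, they all reuse that one connection. How does Client decide that two ConnectionIds are the same? See the code below.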

/**
 * ConnectionId overrides equals
 */
@Override
public boolean equals(Object obj) {
  if (obj == this) {
    return true;
  }
  if (obj instanceof ConnectionId) {
    ConnectionId that = (ConnectionId) obj;
    // same remote address, i.e. connecting to the same server
    return isEqual(this.address, that.address)
        && this.doPing == that.doPing
        && this.maxIdleTime == that.maxIdleTime
        && isEqual(this.connectionRetryPolicy, that.connectionRetryPolicy)
        && this.pingInterval == that.pingInterval
        // same RPC protocol: DataNode-NameNode, client-NameNode, etc. each use
        // their own protocol, and different protocols use different connections
        && isEqual(this.protocol, that.protocol)
        && this.rpcTimeout == that.rpcTimeout
        && this.tcpNoDelay == that.tcpNoDelay
        // same user (UserGroupInformation)
        && isEqual(this.ticket, that.ticket);
  }
  return false;
}

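private DataInputStream in;   // input stream (server to client)

private DataOutputStream out; // output stream (client to server)

private Hashtable<Integer, Call> calls = new Hashtable<Integer, Call>(); // calls sent on this connection that are still waiting for a response

Call is an inner class of Client that wraps a client request together with the server's response; it is analyzed in detail later. The calls field exists because a connection, once established, may carry many requests during its lifetime; each request waits in this table, keyed by its call id, until its response arrives.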
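Because connections is a Hashtable keyed by ConnectionId, reuse only works if equal ids also produce equal hash codes. The sketch below uses a hypothetical, trimmed-down key (ConnKey, which compares only the address, protocol name and timeout instead of all the fields above) and a pool lookup in the style of getConnection. It is an illustration, not Hadoop's class.

```java
import java.net.InetSocketAddress;
import java.util.Hashtable;
import java.util.Objects;

// Hypothetical, trimmed-down stand-in for ConnectionId: compares only three fields.
final class ConnKey {
    final InetSocketAddress address;
    final String protocol;
    final int rpcTimeout;

    ConnKey(InetSocketAddress address, String protocol, int rpcTimeout) {
        this.address = address;
        this.protocol = protocol;
        this.rpcTimeout = rpcTimeout;
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == this) {
            return true;
        }
        if (!(obj instanceof ConnKey)) {
            return false;
        }
        ConnKey that = (ConnKey) obj;
        return Objects.equals(address, that.address)
            && Objects.equals(protocol, that.protocol)
            && rpcTimeout == that.rpcTimeout;
    }

    // Hashtable locates the bucket via hashCode, so it must agree with equals.
    @Override
    public int hashCode() {
        return Objects.hash(address, protocol, rpcTimeout);
    }
}

final class ConnPool {
    // Stand-in for Client.connections; a String label plays the role of a Connection.
    static final Hashtable<ConnKey, String> connections = new Hashtable<>();

    static String getOrCreate(ConnKey key) {
        synchronized (connections) {
            String conn = connections.get(key);
            if (conn == null) {
                conn = "conn-" + connections.size();
                connections.put(key, conn);
            }
            return conn;
        }
    }
}
```

Two keys built from the same host, port and protocol map to one pooled entry; changing only the protocol yields a second entry, matching the rule that different protocols use different connections.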

With the meaning of these fields in mind, how does the client actually connect to the server? See the code below.

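private Connection getConnection(ConnectionId remoteId,
    Call call, int serviceClass, AtomicBoolean fallbackToSimpleAuth)
    throws IOException {
  // running is a field of Client (an AtomicBoolean) that says whether the
  // client is still running; once the client has been stopped it is false
  if (!running.get()) {
    // the client is stopped
    throw new IOException("The client is stopped");
  }
  Connection connection;
  do {
    synchronized (connections) {
      // look up the connection for this id; create one if none exists
      connection = connections.get(remoteId);
      if (connection == null) {
        connection = new Connection(remoteId, serviceClass);
        connections.put(remoteId, connection);
      }
    }
    // addCall fails if the connection obtained is already closing; the call must
    // not be added to a closing connection, so loop and obtain another one
  } while (!connection.addCall(call));
  // initialize the input and output streams
  connection.setupIOstreams(fallbackToSimpleAuth);
  return connection;
}

private synchronized boolean addCall(Call call) {
  // shouldCloseConnection is also a field of Connection. It becomes true when the
  // connection hits an error or the client is disconnecting, meaning the
  // connection is being torn down and must not be used; addCall then returns false.
  if (shouldCloseConnection.get())
    return false;
  calls.put(call.id, call);
  // wake the connection thread waiting for work
  notify();
  return true;
}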

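The get-or-create loop in getConnection, together with addCall, can be modelled in a few lines. In this sketch MiniClient and MiniConnection are hypothetical names, calls are plain strings, and a closing connection is dropped from the pool inside the loop as a simplification (in Hadoop that removal happens in Connection.close(), on the connection's own thread).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical miniature of Client.Connection: a pending-call table plus a close flag.
final class MiniConnection {
    final AtomicBoolean shouldCloseConnection = new AtomicBoolean(false);
    final Map<Integer, String> calls = new HashMap<>();

    // Mirrors addCall: refuse new calls once the connection is being torn down.
    synchronized boolean addCall(int id, String call) {
        if (shouldCloseConnection.get()) {
            return false;
        }
        calls.put(id, call);
        notify(); // wake a reader thread waiting for work, if any
        return true;
    }
}

final class MiniClient {
    final Map<String, MiniConnection> connections = new HashMap<>();

    MiniConnection getConnection(String remoteId, int callId, String call) {
        MiniConnection connection;
        do {
            synchronized (connections) {
                connection = connections.get(remoteId);
                // Simplification: drop a closing connection here so the loop can progress.
                if (connection != null && connection.shouldCloseConnection.get()) {
                    connections.remove(remoteId);
                    connection = null;
                }
                if (connection == null) {
                    connection = new MiniConnection();
                    connections.put(remoteId, connection);
                }
            }
        } while (!connection.addCall(callId, call));
        return connection;
    }
}
```

Calls to the same remote id share a connection until it is marked for closing; the next call then lands on a fresh connection instead of being lost.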

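getConnection only initializes the Connection object and places the call to be sent into it; no communication with the server has happened yet. Communication starts in setupIOstreams, which not only creates the input and output streams used to talk to the server but also sends the connection header, which determines whether the two sides can talk at all. Hadoop has many versions, and not every pair of versions can interoperate, so incompatible versions must be rejected up front; the header (magic number and version) is what makes this check possible. The header, together with the connection context sent right after it, also tells the server which protocol the connection will use, that is, who is talking to whom (client and NameNode, DataNode and NameNode, the ResourceManager and NodeManager in YARN, and so on).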

// part of the code is omitted

private synchronized void setupIOstreams(
    AtomicBoolean fallbackToSimpleAuth) {
  // If socket is already non-null, an existing connection is being reused and
  // nothing below needs to be initialized again. If shouldCloseConnection is
  // true, the connection is about to close, so initializing it is pointless;
  // a new connection has to be obtained instead.
  if (socket != null || shouldCloseConnection.get()) {
    return;
  }
  try {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Connecting to "+server);
    }
    if (Trace.isTracing()) {
      Trace.addTimelineAnnotation("IPC client connecting to " + server);
    }
    short numRetries = 0;
    Random rand = null;
    while (true) {
      // initialize the connection: create and configure the socket, etc.
      setupConnection();
      // input stream
      InputStream inStream = NetUtils.getInputStream(socket);
      // output stream
      OutputStream outStream = NetUtils.getOutputStream(socket);
      // write the connection header to the server
      writeConnectionHeader(outStream);
      . . . . . .
      // write the connection context to the server; see the analysis below
      writeConnectionContext(remoteId, authMethod);
      // connections are subject to an idle timeout; touch() records the
      // current time as the connection's last activity
      touch();
      if (Trace.isTracing()) {
        Trace.addTimelineAnnotation("IPC client connected to " + server);
      }
      // Connection extends Thread; its run method receives the server's
      // responses (see run below)
      start();
      return;
    }
  } catch (Throwable t) {
    if (t instanceof IOException) {
      markClosed((IOException)t);
    } else {
      markClosed(new IOException("Couldn't set up IO streams", t));
    }
    // on error, close the connection
    close();
  }
}

First, let's look at what the client sends in the connection header and in the connection context.

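writeConnectionHeader(OutputStream outStream) sends the Hadoop magic number ("hrpc"), the current RPC version, the service class, and the authentication protocol (written as that AuthProtocol's callId). The protocol name is not part of this header; it travels in the connection context sent next. The code is simple and needs little explanation.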

/**
 * Write the connection header - this is sent when connection is established
 * +----------------------------------+
 * |  "hrpc" 4 bytes                  |
 * +----------------------------------+
 * |  Version (1 byte)                |
 * +----------------------------------+
 * |  Service Class (1 byte)          |
 * +----------------------------------+
 * |  AuthProtocol (1 byte)           |
 * +----------------------------------+
 */
private void writeConnectionHeader(OutputStream outStream)
    throws IOException {
  DataOutputStream out = new DataOutputStream(new BufferedOutputStream(outStream));
  // Write out the header, version and authentication method
  out.write(RpcConstants.HEADER.array());
  out.write(RpcConstants.CURRENT_VERSION);
  out.write(serviceClass);
  out.write(authProtocol.callId);
  out.flush();
}
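To make the 7-byte layout concrete, the sketch below writes the same four fields into a byte array. The version byte (9) and the value for an unauthenticated connection (0, AuthProtocol.NONE) are assumptions about Hadoop 2.x; check RpcConstants and AuthProtocol in the version you are reading. ConnectionHeaderSketch is a hypothetical name.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ConnectionHeaderSketch {
    // Assumed constants: RPC version 9 (Hadoop 2.x) and callId 0 for AuthProtocol.NONE.
    static final int VERSION = 9;
    static final int AUTH_NONE = 0;

    // Same field order as writeConnectionHeader: magic, version, service class, auth protocol.
    static void writeConnectionHeader(OutputStream outStream, int serviceClass, int authProtocol)
            throws IOException {
        DataOutputStream out = new DataOutputStream(new BufferedOutputStream(outStream));
        out.write("hrpc".getBytes(StandardCharsets.US_ASCII)); // 4-byte magic number
        out.write(VERSION);      // write(int) emits only the low 8 bits
        out.write(serviceClass);
        out.write(authProtocol);
        out.flush();             // push the buffered bytes to the underlying stream
    }

    static byte[] headerBytes(int serviceClass, int authProtocol) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            writeConnectionHeader(sink, serviceClass, authProtocol);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with an in-memory sink
        }
        return sink.toByteArray();
    }
}
```

Under these assumptions, service class 0 with no authentication produces the bytes 68 72 70 63 09 00 00.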

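After the header comes the connection context. It tells the server which protocol this connection will use and which user it acts for; the request header wrapped around it carries the client's clientId and a special call id marking the message as a context rather than a real call. This lets the server associate the connection with the right client and send responses back to it. See the code below.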

private void writeConnectionContext(ConnectionId remoteId,
                                    AuthMethod authMethod)
                                        throws IOException {
  // Build the context from the protocol name, the user (ticket) of the
  // ConnectionId, and the authentication method
  IpcConnectionContextProto message = ProtoUtil.makeIpcConnectionContext(
      RPC.getProtocolName(remoteId.getProtocol()),
      remoteId.getTicket(),
      authMethod);
  // Build the request header for the context:
  // - RpcKind.RPC_PROTOCOL_BUFFER: the serialization used for the message
  // - CONNECTION_CONTEXT_CALL_ID: a special call id (-3) marking this as a
  //   connection context, with no request to process
  // - RpcConstants.INVALID_RETRY_COUNT (-1): the retry count. Remote calls can
  //   fail (network problems, etc.), so real calls may be retried until a
  //   result comes back; the context carries no request, so no retry is needed
  // - clientId: identifies the client sending the request
  RpcRequestHeaderProto connectionContextHeader = ProtoUtil
      .makeRpcRequestHeader(RpcKind.RPC_PROTOCOL_BUFFER,
          OperationProto.RPC_FINAL_PACKET, CONNECTION_CONTEXT_CALL_ID,
          RpcConstants.INVALID_RETRY_COUNT, clientId);
  RpcRequestMessageWrapper request =
      new RpcRequestMessageWrapper(connectionContextHeader, message);
  // Write the message to the server: first its length, then its content.
  // Messages always use this framing.
  out.writeInt(request.getLength());
  request.write(out);
}
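The "length first, then content" rule is ordinary length-prefixed framing: the reader learns from the first 4 bytes exactly how many bytes belong to this message. A minimal sketch with plain java.io streams (the Framing class is hypothetical and carries arbitrary byte arrays rather than protobuf messages):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class Framing {
    // One frame = 4-byte big-endian length + exactly that many bytes.
    static void writeFrame(DataOutputStream out, byte[] body) throws IOException {
        out.writeInt(body.length);
        out.write(body);
    }

    // The length prefix tells the reader where this message ends and the next begins.
    static byte[] readFrame(DataInputStream in) throws IOException {
        int len = in.readInt();
        if (len < 0) {
            throw new IOException("negative frame length " + len);
        }
        byte[] body = new byte[len];
        in.readFully(body); // blocks until the whole body is available
        return body;
    }

    // Writes several messages back to back on one stream, then reads them again.
    static List<String> roundTrip(String... messages) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(wire);
            for (String m : messages) {
                writeFrame(out, m.getBytes(StandardCharsets.UTF_8));
            }
            out.flush();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
            List<String> result = new ArrayList<>();
            for (int i = 0; i < messages.length; i++) {
                result.add(new String(readFrame(in), StandardCharsets.UTF_8));
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Even an empty message survives the round trip, because its frame is simply the 4-byte length 0.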

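Once a request has been sent, the client waits for the reply. Replies are received in run, which is the receiving end for the responses to all requests sent over this connection during its lifetime.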

public void run() {
  if (LOG.isDebugEnabled())
    LOG.debug(getName() + ": starting, having connections "
        + connections.size());
  try {
    // waitForWork decides whether the connection still has work to do;
    // it returns false once the connection should be closed
    while (waitForWork()) {//wait here for work - read or close connection
      // receive a response
      receiveRpcResponse();
    }
  } catch (Throwable t) {
    // This truly is unexpected, since we catch IOException in receiveResponse
    // -- this is only to be really sure that we don't leave a client hanging
    // forever.
    LOG.warn("Unexpected error reading responses on connection " + this, t);
    markClosed(new IOException("Error reading responses", t));
  }
  // The connection is closed: clean up, including releasing the streams and
  // removing the connection from the pool
  close();
  if (LOG.isDebugEnabled())
    LOG.debug(getName() + ": stopped, remaining connections "
        + connections.size());
}
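The reader-thread pattern in run can be sketched as follows. This is a simplified model under stated assumptions: responses arrive through an in-memory queue instead of a socket, and waitForWork has no idle timeout (maxIdleTime) or ping handling. As in the original, the loop stops as soon as the connection is marked closed. ReaderLoop is a hypothetical name, and it implements Runnable rather than extending Thread so that its monitor is not shared with Thread.join.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model of Connection.run(): sleep while there is nothing to read,
// exit once the connection is marked closed.
final class ReaderLoop implements Runnable {
    private final Queue<String> incoming = new ArrayDeque<>(); // stands in for the socket
    private final List<String> received = new ArrayList<>();
    private boolean closed = false;

    synchronized void deliver(String response) {
        incoming.add(response);
        notifyAll();
    }

    synchronized void markClosed() {
        closed = true;
        notifyAll();
    }

    synchronized List<String> received() {
        return new ArrayList<>(received);
    }

    // true: a response is ready to read; false: the connection is closed, stop.
    private synchronized boolean waitForWork() throws InterruptedException {
        while (incoming.isEmpty() && !closed) {
            wait();
        }
        return !closed;
    }

    private synchronized void receiveResponse() {
        received.add(incoming.poll());
    }

    @Override
    public void run() {
        try {
            while (waitForWork()) {
                receiveResponse();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Hadoop calls close() at this point to release streams and leave the pool.
    }

    // Demo: deliver two responses, wait until both are consumed, then close.
    static List<String> demo() {
        ReaderLoop loop = new ReaderLoop();
        Thread reader = new Thread(loop, "reader");
        reader.start();
        loop.deliver("resp#1");
        loop.deliver("resp#2");
        try {
            long deadline = System.currentTimeMillis() + 5000;
            while (loop.received().size() < 2 && System.currentTimeMillis() < deadline) {
                Thread.sleep(10);
            }
            loop.markClosed();
            reader.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (reader.isAlive()) {
            throw new IllegalStateException("reader thread did not stop");
        }
        return loop.received();
    }
}
```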

// receive a response from the server
private void receiveRpcResponse() {
  if (shouldCloseConnection.get()) {
    return;
  }
  touch();
  try {
    // Distributed systems frame messages in several ways. Fixed-length formats
    // are easy to parse; variable-length formats put the message length at the
    // front, so the reader can easily find the exact length and process the
    // message. Hadoop uses the latter: a 4-byte total length comes first.
    int totalLen = in.readInt();
    RpcResponseHeaderProto header =
        RpcResponseHeaderProto.parseDelimitedFrom(in);
    checkResponse(header);
    int headerLen = header.getSerializedSize();
    // the header is written delimited, so add the size of its varint length prefix
    headerLen += CodedOutputStream.computeRawVarint32Size(headerLen);
    // A connection carries many calls. Each Call has a callId that is unique
    // within the client, which lets client and server match each response to
    // its request.
    int callId = header.getCallId();
    if (LOG.isDebugEnabled())
      LOG.debug(getName() + " got value #" + callId);
    // look up the call this response belongs to
    Call call = calls.get(callId);
    // RpcStatusProto is an enum with three states: SUCCESS, ERROR, and FATAL
    // (a fatal error, after which the connection is closed)
    RpcStatusProto status = header.getStatus();
    if (status == RpcStatusProto.SUCCESS) {
      // create an instance of the response type via reflection, then read the value
      Writable value = ReflectionUtils.newInstance(valueClass, conf);
      value.readFields(in);                 // read value
      // the call is finished, so remove it from calls
      calls.remove(callId);
      // store the response in the Call (this also wakes the waiting caller)
      call.setRpcResponse(value);
      // verify that length was correct
      // only for ProtobufEngine where len can be verified easily
      if (call.getRpcResponse() instanceof ProtobufRpcEngine.RpcWrapper) {
        ProtobufRpcEngine.RpcWrapper resWrapper =
            (ProtobufRpcEngine.RpcWrapper) call.getRpcResponse();
        if (totalLen != headerLen + resWrapper.getLength()) {
          throw new RpcClientException(
              "RPC response length mismatch on rpc success");
        }
      }
    } else { // Rpc Request failed
      // Verify that length was correct
      if (totalLen != headerLen) {
        throw new RpcClientException(
            "RPC response length mismatch on rpc error");
      }
      final String exceptionClassName = header.hasExceptionClassName() ?
            header.getExceptionClassName() :
              "ServerDidNotSetExceptionClassName";
      final String errorMsg = header.hasErrorMsg() ?
            header.getErrorMsg() : "ServerDidNotSetErrorMsg" ;
      final RpcErrorCodeProto erCode =
                (header.hasErrorDetail() ? header.getErrorDetail() : null);
      if (erCode == null) {
         LOG.warn("Detailed error code not set by server on rpc error");
      }
      RemoteException re =
          ( (erCode == null) ?
              new RemoteException(exceptionClassName, errorMsg) :
          new RemoteException(exceptionClassName, errorMsg, erCode));
      if (status == RpcStatusProto.ERROR) {
        calls.remove(callId);
        call.setException(re);
      } else if (status == RpcStatusProto.FATAL) {
        // Close the connection
        markClosed(re);
      }
    }
  } catch (IOException e) {
    markClosed(e);
  }
}
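The two length checks compare the leading totalLen with the bytes actually consumed. Because the response header is written in protobuf's delimited form, its length prefix is a varint32, which is why computeRawVarint32Size is added to headerLen. The sketch below reimplements that size calculation (7 payload bits per byte) so the arithmetic can be checked without the protobuf library; ResponseLengthCheck is a hypothetical name.

```java
final class ResponseLengthCheck {
    // Bytes needed to encode an int as a protobuf varint32; negative values take 5,
    // as with CodedOutputStream.computeRawVarint32Size.
    static int varint32Size(int value) {
        if ((value & (~0 << 7)) == 0) return 1;
        if ((value & (~0 << 14)) == 0) return 2;
        if ((value & (~0 << 21)) == 0) return 3;
        if ((value & (~0 << 28)) == 0) return 4;
        return 5;
    }

    // totalLen must equal the delimited header (varint prefix + header bytes)
    // plus the response body; on an error response the body length is 0.
    static boolean lengthMatches(int totalLen, int headerSerializedSize, int bodyLen) {
        int headerLen = headerSerializedSize + varint32Size(headerSerializedSize);
        return totalLen == headerLen + bodyLen;
    }
}
```

For example, a 20-byte header needs a 1-byte prefix, so a successful response with a 100-byte body must arrive with totalLen = 121; a 200-byte header needs a 2-byte prefix.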

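// this method belongs to Call
public synchronized void setRpcResponse(Writable rpcResponse) {
  // store the result as the response value
  this.rpcResponse = rpcResponse;
  // this call is now complete
  callComplete();
}

// this method belongs to Call
protected synchronized void callComplete() {
  // done = true marks this call as complete
  this.done = true;
  // the caller blocks on this Call object while waiting (see Client.call),
  // so wake it up once processing is finished
  notify();                                 // notify caller
}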
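setRpcResponse/callComplete and the waiting loop in Client.call form a classic wait/notify handoff between the connection thread and the caller thread. A minimal, self-contained sketch of that handoff, assuming a hypothetical MiniCall whose response is a plain Object instead of a Writable:

```java
// Hypothetical stand-in for Client.Call: the connection thread completes it,
// the caller thread blocks until it is done.
final class MiniCall {
    final int id;
    private Object rpcResponse;
    private Exception error;
    private boolean done;

    MiniCall(int id) {
        this.id = id;
    }

    synchronized void setRpcResponse(Object response) {
        this.rpcResponse = response;
        callComplete();
    }

    synchronized void setException(Exception e) {
        this.error = e;
        callComplete();
    }

    private synchronized void callComplete() {
        done = true;
        notify(); // wake the caller blocked in await()
    }

    // Caller side, like the loop in Client.call: wait until done, remember interrupts.
    synchronized Object await() throws Exception {
        boolean interrupted = false;
        while (!done) {
            try {
                wait();
            } catch (InterruptedException ie) {
                interrupted = true;
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt(); // restore the flag once waiting is over
        }
        if (error != null) {
            throw error;
        }
        return rpcResponse;
    }

    // Demo helpers: another thread plays the role of the connection thread.
    static Object completeFromAnotherThread(Object value) {
        MiniCall call = new MiniCall(7);
        new Thread(() -> call.setRpcResponse(value)).start();
        try {
            return call.await();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static Exception failFromAnotherThread(String message) {
        MiniCall call = new MiniCall(8);
        new Thread(() -> call.setException(new java.io.IOException(message))).start();
        try {
            call.await();
            return null;
        } catch (Exception e) {
            return e;
        }
    }
}
```

The while (!done) loop matters: it protects against spurious wakeups and also handles the case where the response arrives before the caller starts waiting.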

Next, let's look at the Client.Call class.

Call wraps a single message exchange and has the following fields:

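final int id;               // call id
final int retry;            // number of times this call has been retried
final Writable rpcRequest;  // the serialized RPC request
Writable rpcResponse;       // the serialized response; null if an error occurred
IOException error;          // the exception raised during processing, if any
final RPC.RpcKind rpcKind;  // the kind of RPC engine used
boolean done;               // true once the call has completed; this is how completion is checked

rpcKind identifies the RPC engine. The two main kinds are the Writable engine and the Protocol Buffers engine, which differ in both serialization and RPC message handling. Writable is the mechanism Hadoop has shipped with since its beginning; Protocol Buffers is a serialization format developed by Google. Hadoop now uses Protocol Buffers by default, mainly because it is more portable across platforms and languages and faster than Writable.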
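The section does not show where id comes from. A common approach, and as far as we know roughly what Hadoop 2's Client does, is a shared atomic counter masked to stay non-negative, so ordinary calls never collide with reserved negative ids such as CONNECTION_CONTEXT_CALL_ID (-3). Treat the CallIds class below as a hypothetical sketch, not Hadoop's code.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CallIds {
    // Hypothetical generator shared by all calls in the process.
    private static final AtomicInteger counter = new AtomicInteger();

    // Masking with 0x7FFFFFFF keeps ids non-negative even after the counter overflows.
    static int nextCallId() {
        return counter.getAndIncrement() & 0x7FFFFFFF;
    }

    // For the demo only: position the counter, e.g. just before overflow.
    static void resetTo(int value) {
        counter.set(value);
    }
}
```

After Integer.MAX_VALUE the raw counter wraps to Integer.MIN_VALUE, and the mask turns that into 0 instead of a negative id.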

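The main methods of Call were already covered above.

So far we have seen how the client side handles connections. But when is the client object created, and how is the actual request content sent? Let's continue the analysis.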

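Communication between client and server is built on Java's dynamic proxy mechanism.

What gets proxied is the protocol. Hadoop's RPC protocols have traditionally extended the VersionedProtocol interface, which has two methods: getProtocolVersion, which returns the protocol's version, and getProtocolSignature, which verifies the protocol, that is, lets the client check whether the server has an implementation of the protocol it wants to use.

Every request the client sends through a protocol goes through the proxy object. In the Writable engine, the proxy's invoke method creates an Invocation object for each request (the Protocol Buffers engine is more involved); the request is packaged into this object and sent to the server, which converts the received message back into the corresponding Invocation object for processing. See the code below.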

// When a client, a DataNode, etc. starts making RPC requests, it calls getProxy
// in the RPC class. That method has many overloads, all of which end up calling
// the method below.
public static <T> ProtocolProxy<T> getProtocolProxy(Class<T> protocol,
                                long clientVersion,
                                InetSocketAddress addr,
                                UserGroupInformation ticket,
                                Configuration conf,
                                SocketFactory factory,
                                int rpcTimeout,
                                RetryPolicy connectionRetryPolicy,
                                AtomicBoolean fallbackToSimpleAuth)
     throws IOException {
  if (UserGroupInformation.isSecurityEnabled()) {
    SaslRpcServer.init(conf);
  }
  // finally, obtain the proxy object from the RPC engine configured for this protocol
  return getProtocolEngine(protocol, conf).getProxy(protocol, clientVersion,
      addr, ticket, conf, factory, rpcTimeout, connectionRetryPolicy,
      fallbackToSimpleAuth);
}

// The implementation in ProtobufRpcEngine:
public <T> ProtocolProxy<T> getProxy(Class<T> protocol, long clientVersion,
    InetSocketAddress addr, UserGroupInformation ticket, Configuration conf,
    SocketFactory factory, int rpcTimeout, RetryPolicy connectionRetryPolicy,
    AtomicBoolean fallbackToSimpleAuth) throws IOException {
  // Invoker implements InvocationHandler; the invoke method that finally
  // handles each call lives in this class
  final Invoker invoker = new Invoker(protocol, addr, ticket, conf, factory,
      rpcTimeout, connectionRetryPolicy, fallbackToSimpleAuth);
  // the familiar way of creating a dynamic proxy
  return new ProtocolProxy<T>(protocol, (T) Proxy.newProxyInstance(
      protocol.getClassLoader(), new Class[]{protocol}, invoker), false);
}
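The proxy mechanism itself can be shown without any Hadoop classes. In the sketch below, EchoProtocol and RecordingInvoker are hypothetical; the handler packs the method name and arguments into a string, roughly the role Invocation plays in the Writable engine, and returns a canned reply instead of calling Client.call().

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical protocol interface, standing in for something like ClientProtocol.
interface EchoProtocol {
    String echo(String msg);
}

// Every call on the proxy lands in invoke(); a real engine would serialize the
// request here, send it with Client.call() and deserialize the reply.
final class RecordingInvoker implements InvocationHandler {
    final List<String> sent = new ArrayList<>();

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) {
        String wire = method.getName() + Arrays.toString(args); // e.g. "echo[hi]"
        sent.add(wire);
        return "reply to " + wire;
    }

    static EchoProtocol newProxy(RecordingInvoker invoker) {
        return (EchoProtocol) Proxy.newProxyInstance(
            EchoProtocol.class.getClassLoader(),
            new Class<?>[] {EchoProtocol.class},
            invoker);
    }
}
```

The caller only sees an EchoProtocol; that it is really a proxy forwarding to a handler is invisible, which is exactly how Hadoop hides the network behind protocol interfaces.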

Client requests are handled in Invoker's invoke method:

@Override
public Object invoke(Object proxy, Method method, Object[] args)
    throws ServiceException {
  ...
  // build the request header
  RequestHeaderProto rpcRequestHeader = constructRpcRequestHeader(method);
  ...
  // the request message is passed in the arguments (args[0] is the RpcController)
  Message theRequest = (Message) args[1];
  final RpcResponseWrapper val;
  try {
    // hand the request to the client side of the C/S model: Client.call
    val = (RpcResponseWrapper) client.call(RPC.RpcKind.RPC_PROTOCOL_BUFFER,
        new RpcRequestWrapper(rpcRequestHeader, theRequest), remoteId,
        fallbackToSimpleAuth);
  } catch (Throwable e) {
    if (LOG.isTraceEnabled()) {
      LOG.trace(Thread.currentThread().getId() + ": Exception <- " +
          remoteId + ": " + method.getName() +
            " {" + e + "}");
    }
    if (Trace.isTracing()) {
      traceScope.getSpan().addTimelineAnnotation(
          "Call got exception: " + e.getMessage());
    }
    throw new ServiceException(e);
  } finally {
    if (traceScope != null) traceScope.close();
  }
  if (LOG.isDebugEnabled()) {
    long callTime = Time.now() - startTime;
    LOG.debug("Call: " + method.getName() + " took " + callTime + "ms");
  }
  Message prototype = null;
  try {
    // get the prototype of the method's return message type
    prototype = getReturnProtoType(method);
  } catch (Exception e) {
    throw new ServiceException(e);
  }
  Message returnMessage;
  try {
    // build the final return value from the result returned by client.call
    returnMessage = prototype.newBuilderForType()
        .mergeFrom(val.theResponseRead).build();
    if (LOG.isTraceEnabled()) {
      LOG.trace(Thread.currentThread().getId() + ": Response <- " +
          remoteId + ": " + method.getName() +
            " {" + TextFormat.shortDebugString(returnMessage) + "}");
    }
  } catch (Throwable e) {
    throw new ServiceException(e);
  }
  return returnMessage;
}

Now let's see what Client's call method does.

public Writable call(RPC.RpcKind rpcKind, Writable rpcRequest,
    ConnectionId remoteId, int serviceClass,
    AtomicBoolean fallbackToSimpleAuth) throws IOException {
  // create a Call from the engine kind and the request message
  final Call call = createCall(rpcKind, rpcRequest);
  // the getConnection method analyzed above: obtain a connection
  Connection connection = getConnection(remoteId, call, serviceClass,
    fallbackToSimpleAuth);
  try {
    // send the message to the server over the connection; see sendRpcRequest below
    connection.sendRpcRequest(call);                 // send the rpc request
  } catch (RejectedExecutionException e) {
    throw new IOException("connection has been closed", e);
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    LOG.warn("interrupted waiting to send rpc request to server", e);
    throw new IOException(e);
  }
  boolean interrupted = false;
  // block synchronously until this call has received its response, then
  // process the response
  synchronized (call) {
    while (!call.done) {
      try {
        // matches the notify() in callComplete
        call.wait();                           // wait for the result
      } catch (InterruptedException ie) {
        // save the fact that we were interrupted
        interrupted = true;
      }
    }
    if (interrupted) {
      // set the interrupt flag now that we are done waiting
      Thread.currentThread().interrupt();
    }
    if (call.error != null) {
      if (call.error instanceof RemoteException) {
        call.error.fillInStackTrace();
        throw call.error;
      } else { // local exception
        InetSocketAddress address = connection.getRemoteAddress();
        throw NetUtils.wrapException(address.getHostName(),
                address.getPort(),
                NetUtils.getHostname(),
                0,
                call.error);
      }
    } else {
      // success: return the response to the invoke method shown above
      return call.getRpcResponse();
    }
  }
}
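Because many threads can be inside call() on the same connection at once, responses may come back in any order; the calls table keyed by call id is what routes each response to the right waiter. A minimal sketch of that demultiplexing (CallTable is hypothetical, and unlike receiveRpcResponse it silently ignores responses for unknown ids):

```java
import java.util.Hashtable;

final class CallTable {
    // A pending call and the slot its response will be written into.
    static final class PendingCall {
        final int id;
        volatile String response;

        PendingCall(int id) {
            this.id = id;
        }
    }

    // Pending calls keyed by id, like Connection.calls.
    private final Hashtable<Integer, PendingCall> calls = new Hashtable<>();

    PendingCall register(int id) {
        PendingCall c = new PendingCall(id);
        calls.put(id, c);
        return c;
    }

    // Reader side: the response header carries the call id, so arrival order does not matter.
    void onResponse(int callId, String value) {
        PendingCall c = calls.remove(callId);
        if (c != null) {
            c.response = value;
        }
    }

    int pending() {
        return calls.size();
    }
}
```

Responses for calls 2 and 1 can arrive in that order and still reach the right callers, and each completed call leaves the table.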

Here is how the request message is sent:

public void sendRpcRequest(final Call call)
    throws InterruptedException, IOException {
  // check whether the connection is closing
  if (shouldCloseConnection.get()) {
    return;
  }
  // Serialize the call to be sent. This is done from the actual
  // caller thread, rather than the sendParamsExecutor thread,
  // so that if the serialization throws an error, it is reported
  // properly. This also parallelizes the serialization.
  //
  // Format of a call on the wire:
  // 0) Length of rest below (1 + 2)
  // 1) RpcRequestHeader  - is serialized Delimited hence contains length
  // 2) RpcRequest
  //
  // Items '1' and '2' are prepared here.

  // create an output buffer to hold the request before it goes to the server
  final DataOutputBuffer d = new DataOutputBuffer();
  // build the request header
  RpcRequestHeaderProto header = ProtoUtil.makeRpcRequestHeader(
      call.rpcKind, OperationProto.RPC_FINAL_PACKET, call.id, call.retry,
      clientId);
  // write the header into the buffer
  header.writeDelimitedTo(d);
  // write the request body into the buffer
  call.rpcRequest.write(d);

  // send under a lock; otherwise messages could interleave and become unreadable
  synchronized (sendRpcRequestLock) {
    // hand the send to the sender thread; senderFuture tracks its completion
    Future<?> senderFuture = sendParamsExecutor.submit(new Runnable() {
      @Override
      public void run() {
        try {
          synchronized (Connection.this.out) {
            if (shouldCloseConnection.get()) {
              return;
            }
            if (LOG.isDebugEnabled())
              LOG.debug(getName() + " sending #" + call.id);
            byte[] data = d.getData();
            int totalLength = d.getLength();
            // write the message length first, then the message content
            out.writeInt(totalLength); // Total Length
            out.write(data, 0, totalLength);// RpcRequestHeader + RpcRequest
            out.flush();
          }
        } catch (IOException e) {
          // exception at this point would leave the connection in an
          // unrecoverable state (eg half a call left on the wire).
          // So, close the connection, killing any outstanding calls
          markClosed(e);
        } finally {
          //the buffer is just an in-memory buffer, but it is still polite to
          // close early
          IOUtils.closeStream(d);
        }
      }
    });

    try {
      // Wait for the send to finish. The actual result is stored in the Call's
      // rpcResponse field by the connection's run method (analyzed above), which
      // keeps polling until the connection closes or fails; get() here only
      // blocks until the message has been sent.
      senderFuture.get();
    } catch (ExecutionException e) {
      Throwable cause = e.getCause();
      // cause should only be a RuntimeException as the Runnable above
      // catches IOException
      if (cause instanceof RuntimeException) {
        throw (RuntimeException) cause;
      } else {
        throw new RuntimeException("unexpected checked exception", cause);
      }
    }
  }
}
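The send path combines three ideas: serialize in the caller thread, write each frame under a lock so frames never interleave, and block on the Future only until the bytes are written. The sketch below reproduces that structure with a single-thread executor and an in-memory stream. Sender is a hypothetical class, and unlike Hadoop it reports write failures through the Future instead of calling markClosed.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sender modelled on Connection.sendRpcRequest.
final class Sender {
    private final ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stands in for the socket
    private final DataOutputStream out = new DataOutputStream(wire);
    private final Object sendRpcRequestLock = new Object();
    private final ExecutorService sendParamsExecutor = Executors.newSingleThreadExecutor();

    // serializedCall was already built in the caller thread (header + request).
    void send(byte[] serializedCall) throws InterruptedException, ExecutionException {
        synchronized (sendRpcRequestLock) {
            Future<?> senderFuture = sendParamsExecutor.submit(() -> {
                synchronized (out) {
                    try {
                        out.writeInt(serializedCall.length); // total length first
                        out.write(serializedCall);           // then header + request
                        out.flush();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);   // surfaces as ExecutionException
                    }
                }
            });
            senderFuture.get(); // waits until the bytes are written, not for the reply
        }
    }

    byte[] wireBytes() {
        synchronized (out) {
            return wire.toByteArray();
        }
    }

    void shutdown() {
        sendParamsExecutor.shutdown();
    }

    // Re-parses length-prefixed frames; a corrupted stream throws or miscounts.
    static int countFrames(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int frames = 0;
        while (buf.hasRemaining()) {
            int len = buf.getInt();
            buf.position(buf.position() + len);
            frames++;
        }
        return frames;
    }

    // Demo: several caller threads share one Sender concurrently.
    static byte[] demo(int threads, int callsPerThread) {
        Sender sender = new Sender();
        Thread[] callers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            callers[t] = new Thread(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    try {
                        sender.send(("call-" + i).getBytes(StandardCharsets.UTF_8));
                    } catch (InterruptedException | ExecutionException e) {
                        throw new RuntimeException(e);
                    }
                }
            });
            callers[t].start();
        }
        try {
            for (Thread caller : callers) {
                caller.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            sender.shutdown();
        }
        return sender.wireBytes();
    }
}
```

The demo lets several threads send through one Sender and then re-parses the stream: if two frames had interleaved, the length prefixes would no longer line up, and parsing would fail or the frame count would be off.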

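That covers how the client connects to the server over RPC and sends messages. The next part analyzes the server side: how it accepts RPC connections, and how it receives, processes, and sends messages back to the client.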