DNS host address resolution in Chromium: the system-function-based lookup path

Date: 2022-09-03 21:52:32

Usage scenario

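An earlier article covered how network resources are loaded over the FTP protocol. Loading an FTP resource begins with address resolution; for FTP, the state that handles this step is STATE_CTRL_RESOLVE_HOST.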

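This article looks at how Chromium resolves host addresses. The code lives mainly in net\dns.
The main external interface of the dns module is HostResolverImpl::Resolve, so we start from this interface and follow the resolution process.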

Example call context

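We start windbg to debug Chromium, load the debug symbols, set a breakpoint on HostResolverImpl::Resolve, and run until any host address resolution triggers the break. Network resources are requested and loaded in the browser process, so the breakpoint is set in the main process.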

0:000> bp chrome_7feee500000!net::HostResolverImpl::Resolve

The call stack at the break looks like this.

0:029> kc
# Call Site
00 chrome_7feee500000!net::HostResolverImpl::Resolve
01 chrome_7feee500000!net::SingleRequestHostResolver::Resolve
02 chrome_7feee500000!chrome_browser_net::Predictor::LookupRequest::Start
03 chrome_7feee500000!chrome_browser_net::Predictor::StartSomeQueuedResolutions
04 chrome_7feee500000!chrome_browser_net::Predictor::AppendToResolutionQueue
05 chrome_7feee500000!chrome_browser_net::Predictor::ResolveList
06 chrome_7feee500000!chrome_browser_net::Predictor::DnsPrefetchMotivatedList
07 chrome_7feee500000!chrome_browser_net::Predictor::FinalizeInitializationOnIOThread
08 chrome_7feee500000!base::internal::RunnableAdapter<void (__cdecl disk_cache::SimpleSynchronousEntry::*)(disk_cache::SimpleSynchronousEntry::EntryOperationData const &,net::IOBuffer *,base::Time *,int *)>::Run
09 chrome_7feee500000!base::internal::InvokeHelper<0,void,base::internal::RunnableAdapter<void (__cdecl disk_cache::SimpleSynchronousEntry::*)(disk_cache::SimpleSynchronousEntry::EntryOperationData const &,net::IOBuffer *,base::Time *,int *)> >::MakeItSo
0a chrome_7feee500000!base::internal::Invoker<base::IndexSequence<0,1,2,3,4>,base::internal::BindState<base::internal::RunnableAdapter<void (__cdecl disk_cache::SimpleSynchronousEntry::*)(disk_cache::SimpleSynchronousEntry::EntryOperationData const & __ptr64,net::IOBuffer * __ptr64,base::Time * __ptr64,int * __ptr64) __ptr64>,void __cdecl(disk_cache::SimpleSynchronousEntry * __ptr64,disk_cache::SimpleSynchronousEntry::EntryOperationData const & __ptr64,net::IOBuffer * __ptr64,base::Time * __ptr64,int * __ptr64),base::internal::UnretainedWrapper<disk_cache::SimpleSynchronousEntry>,disk_cache::SimpleSynchronousEntry::EntryOperationData,scoped_refptr<net::IOBuffer>,base::Time * __ptr64,int * __ptr64>,base::internal::InvokeHelper<0,void,base::internal::RunnableAdapter<void (__cdecl disk_cache::SimpleSynchronousEntry::*)(disk_cache::SimpleSynchronousEntry::EntryOperationData const & __ptr64,net::IOBuffer * __ptr64,base::Time * __ptr64,int * __ptr64) __ptr64> >,void __cdecl(void)>::Run
0b chrome_7feee500000!base::Callback<void __cdecl(void)>::Run
0c chrome_7feee500000!base::debug::TaskAnnotator::RunTask
0d chrome_7feee500000!base::MessageLoop::RunTask
0e chrome_7feee500000!base::MessageLoop::DeferOrRunPendingTask
0f chrome_7feee500000!base::MessageLoop::DoWork
10 chrome_7feee500000!base::MessagePumpForIO::DoRunLoop
11 chrome_7feee500000!base::MessagePumpWin::Run
12 chrome_7feee500000!base::MessageLoop::RunHandler
13 chrome_7feee500000!base::RunLoop::Run
14 chrome_7feee500000!base::MessageLoop::Run
15 chrome_7feee500000!base::Thread::Run
16 chrome_7feee500000!content::BrowserThreadImpl::IOThreadRun
17 chrome_7feee500000!content::BrowserThreadImpl::Run
18 chrome_7feee500000!base::Thread::ThreadMain
19 chrome_7feee500000!base::`anonymous namespace'::ThreadFunc
1a kernel32!BaseThreadInitThunk
1b ntdll!RtlUserThreadStart

Next we display the local variables of frame 0 to inspect the arguments. Some redundant output is omitted here.

0:029> ~~[b2c]s;.frame 0n0;dv /t /v
chrome_7feee500000!net::HostResolverImpl::Resolve:
000007fe`ef17b5f4 4055 push rbp
00 00000000`0920dcc8 000007fe`ef1c4ae7 chrome_7feee500000!net::HostResolverImpl::Resolve [c:\b\build\slave\win64\build\src\net\dns\host_resolver_impl.cc @ 1930]
@rcx class net::HostResolverImpl * this = 0x00000000`07225280
@rdx class net::HostResolver::RequestInfo * info = 0x00000000`0920dde0
@r8d net::RequestPriority priority = LOWEST (0n1)
@r9 class net::AddressList * addresses = 0x00000000`0a7a2030
.......

0:029> dx -id 0,0 -r1 (*((chrome_7feee500000!net::HostResolver::RequestInfo *)0x920dde0))
(*((chrome_7feee500000!net::HostResolver::RequestInfo *)0x920dde0)) [Type: net::HostResolver::RequestInfo]
[+0x000] host_port_pair_ [Type: net::HostPortPair]
[+0x028] address_family_ : ADDRESS_FAMILY_UNSPECIFIED (0) [Type: net::AddressFamily]
[+0x02c] host_resolver_flags_ : 0
[+0x030] allow_cached_response_ : true
[+0x031] is_speculative_ : true
[+0x032] is_my_ip_address_ : false
0:029> dx -id 0,0 -r1 (*((chrome_7feee500000!net::HostPortPair *)0x920dde0))
(*((chrome_7feee500000!net::HostPortPair *)0x920dde0)) [Type: net::HostPortPair]
[+0x000] host_ : "www.baidu.com" [Type: std::basic_string<char,std::char_traits<char>,std::allocator<char> >]
[+0x020] port_ : 0x50

From this output we can see that RequestInfo describes the target we are asking DNS to resolve: the host is www.baidu.com and the port is 0x50, that is, 80 in decimal.

Source code analysis

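Now for the source of this function. It takes several parameters. The first is the request target, which we saw above. Next come the request priority and the output address list. Because resolution is asynchronous, a completion callback follows. out_req is an output handle that lets the caller operate on this request later, and the last parameter is the net log used for logging.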

Resolve

Main function

int HostResolverImpl::Resolve(const RequestInfo& info,
                              RequestPriority priority,
                              AddressList* addresses,
                              const CompletionCallback& callback,
                              RequestHandle* out_req,
                              const BoundNetLog& source_net_log) {

  .....
  Key key = GetEffectiveKeyForRequest(info, ip_number_ptr, source_net_log);

  int rv = ResolveHelper(key, info, ip_number_ptr, addresses, source_net_log);
  if (rv != ERR_DNS_CACHE_MISS) {
    LogFinishRequest(source_net_log, info, rv);
    RecordTotalTime(HaveDnsConfig(), info.is_speculative(), base::TimeDelta());
    return rv;
  }

  // Next we need to attach our request to a "job". This job is responsible for
  // calling "getaddrinfo(hostname)" on a worker thread.

  JobMap::iterator jobit = jobs_.find(key);
  Job* job;
  if (jobit == jobs_.end()) {
    job =
        new Job(weak_ptr_factory_.GetWeakPtr(), key, priority, source_net_log);
    job->Schedule(false);

    // Check for queue overflow.
    if (dispatcher_->num_queued_jobs() > max_queued_jobs_) {
      Job* evicted = static_cast<Job*>(dispatcher_->EvictOldestLowest());
      DCHECK(evicted);
      evicted->OnEvicted();  // Deletes |evicted|.
      if (evicted == job) {
        rv = ERR_HOST_RESOLVER_QUEUE_TOO_LARGE;
        LogFinishRequest(source_net_log, info, rv);
        return rv;
      }
    }
    jobs_.insert(jobit, std::make_pair(key, job));
  } else {
    job = jobit->second;
  }

  // Can't complete synchronously. Create and attach request.
  scoped_ptr<Request> req(new Request(
      source_net_log, info, priority, callback, addresses));
  if (out_req)
    *out_req = reinterpret_cast<RequestHandle>(req.get());

  job->AddRequest(std::move(req));
  // Completion happens during Job::CompleteRequests().
  return ERR_IO_PENDING;
}

Key

The function first computes a key from the request target. The definition of this key is fairly simple.

typedef HostCache::Key Key;

struct Key {
  Key(const std::string& hostname, AddressFamily address_family,
      HostResolverFlags host_resolver_flags)
      : hostname(hostname),
        address_family(address_family),
        host_resolver_flags(host_resolver_flags) {}

  bool operator<(const Key& other) const {
    // The order of comparisons of |Key| fields is arbitrary, thus
    // |address_family| and |host_resolver_flags| are compared before
    // |hostname| under assumption that integer comparisons are faster than
    // string comparisons.
    return std::tie(address_family, host_resolver_flags, hostname) <
           std::tie(other.address_family, other.host_resolver_flags,
                    other.hostname);
  }

  std::string hostname;
  AddressFamily address_family;
  HostResolverFlags host_resolver_flags;
};

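Key is a struct defined in HostCache. It wraps the essential parts of a request: the hostname, the address family, and the resolver flags. The key uniquely identifies one kind of host resolution job in the dns module. We can inspect the data in the debugger.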

0:029> dx -id 0,0 -r1 (*((chrome_7feee500000!net::HostCache::Key *)0x920dbf8))
(*((chrome_7feee500000!net::HostCache::Key *)0x920dbf8)) [Type: net::HostCache::Key]
[+0x000] hostname : "www.baidu.com" [Type: std::basic_string<char,std::char_traits<char>,std::allocator<char> >]
[+0x020] address_family : ADDRESS_FAMILY_UNSPECIFIED (0) [Type: net::AddressFamily]
[+0x024] host_resolver_flags : 0
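To make the role of operator< concrete, here is a minimal standalone sketch of the same ordering idea. SketchKey, SketchAddressFamily, SketchFlags, and SameJob are hypothetical names introduced for illustration only; they are not Chromium types. Two keys that compare equivalent under this ordering would land on the same entry of a std::map, which is how two requests can end up sharing one job.

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Hypothetical stand-ins for net::AddressFamily and net::HostResolverFlags.
enum SketchAddressFamily { kFamilyUnspecified = 0, kFamilyIPv4 = 1, kFamilyIPv6 = 2 };
using SketchFlags = int;

struct SketchKey {
  std::string hostname;
  SketchAddressFamily address_family;
  SketchFlags host_resolver_flags;

  // Same idea as HostCache::Key::operator<: compare the cheap integer fields
  // first and the string last.
  bool operator<(const SketchKey& other) const {
    return std::tie(address_family, host_resolver_flags, hostname) <
           std::tie(other.address_family, other.host_resolver_flags,
                    other.hostname);
  }
};

// Two keys are equivalent for an ordered map when neither sorts before the
// other. Equivalent keys would select the same map entry.
inline bool SameJob(const SketchKey& a, const SketchKey& b) {
  return !(a < b) && !(b < a);
}
```

In this sketch, the same hostname with a different address family or different flags yields a distinct key and therefore a distinct map entry.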

ResolveHelper

This is a helper function for resolution.

int HostResolverImpl::ResolveHelper(const Key& key,
                                    const RequestInfo& info,
                                    const IPAddressNumber* ip_number,
                                    AddressList* addresses,
                                    const BoundNetLog& source_net_log) {
  // The result of |getaddrinfo| for empty hosts is inconsistent across systems.
  // On Windows it gives the default interface's address, whereas on Linux it
  // gives an error. We will make it fail on all platforms for consistency.
  if (info.hostname().empty() || info.hostname().size() > kMaxHostLength)
    return ERR_NAME_NOT_RESOLVED;

  int net_error = ERR_UNEXPECTED;
  if (ResolveAsIP(key, info, ip_number, &net_error, addresses))
    return net_error;
  if (ServeFromCache(key, info, &net_error, addresses)) {
    source_net_log.AddEvent(NetLog::TYPE_HOST_RESOLVER_IMPL_CACHE_HIT);
    return net_error;
  }
  // TODO(szym): Do not do this if nsswitch.conf instructs not to.
  // http://crbug.com/117655
  if (ServeFromHosts(key, info, addresses)) {
    source_net_log.AddEvent(NetLog::TYPE_HOST_RESOLVER_IMPL_HOSTS_HIT);
    return OK;
  }

  if (ServeLocalhost(key, info, addresses))
    return OK;

  return ERR_DNS_CACHE_MISS;
}

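The function first tries to treat the request as an IP address literal, then looks in the cache, then in the hosts file, and finally checks whether the name refers to the local host.
If none of these produces a result, it returns ERR_DNS_CACHE_MISS and we continue in Resolve.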
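The "try each local source in order, fall through on a miss" shape described above can be sketched as follows. This is a simplified illustration, not the Chromium code: LocalSource, FromTable, ResolveLocally, and the k* constants are hypothetical names, and each source is reduced to a plain table lookup that returns a single address string.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result codes standing in for OK, ERR_DNS_CACHE_MISS and
// ERR_NAME_NOT_RESOLVED.
constexpr int kOk = 0;
constexpr int kCacheMiss = -1;
constexpr int kNameNotResolved = -2;

// A local source either answers synchronously or reports a miss.
using LocalSource =
    std::function<std::optional<std::string>(const std::string&)>;

// Builds a source backed by a fixed table (think: cache or hosts file).
inline LocalSource FromTable(std::map<std::string, std::string> table) {
  return [table = std::move(table)](
             const std::string& host) -> std::optional<std::string> {
    auto it = table.find(host);
    if (it == table.end())
      return std::nullopt;
    return it->second;
  };
}

// Tries every source in order; the first hit wins. A miss everywhere means
// the caller has to fall back to an asynchronous lookup.
inline int ResolveLocally(const std::string& host,
                          const std::vector<LocalSource>& sources,
                          std::string* address) {
  if (host.empty())
    return kNameNotResolved;  // Empty hostnames fail up front.
  for (const LocalSource& source : sources) {
    if (std::optional<std::string> hit = source(host)) {
      *address = *hit;
      return kOk;
    }
  }
  return kCacheMiss;
}
```

Keeping the sources in an ordered list makes the priority between them explicit: an entry in an earlier source shadows the same name in a later one.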

HostResolverImpl::Job

Creating a job

We look up a Job by key to see whether this resolution task already exists.

// Map from HostCache::Key to a Job.
JobMap jobs_;

typedef std::map<Key, Job*> JobMap;

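jobs_ is a map from Key to Job.
* If a Job is found, we take its pointer and add the new request to it.
* If no Job is found, we create a new one, passing it the key, the priority, and other essential information, and then schedule it.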
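The two cases above amount to "find or create, then attach". A minimal sketch of that pattern, under the assumption that the key is reduced to a hostname string and a request to an integer id; SketchJob and SketchResolver are hypothetical names and the real Job carries far more state:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical job: just the list of requests waiting on one lookup.
struct SketchJob {
  std::vector<int> request_ids;
};

class SketchResolver {
 public:
  // Attaches |request_id| to the job for |key|, creating the job on first
  // use. Returns true if a new job had to be created.
  bool Attach(const std::string& key, int request_id) {
    auto it = jobs_.find(key);
    bool created = false;
    if (it == jobs_.end()) {
      it = jobs_.emplace(key, std::make_unique<SketchJob>()).first;
      created = true;
    }
    it->second->request_ids.push_back(request_id);
    return created;
  }

  std::size_t job_count() const { return jobs_.size(); }

  std::size_t request_count(const std::string& key) const {
    auto it = jobs_.find(key);
    return it == jobs_.end() ? 0 : it->second->request_ids.size();
  }

 private:
  std::map<std::string, std::unique_ptr<SketchJob>> jobs_;
};
```

The benefit of this design is that several concurrent requests for the same host trigger only one underlying lookup, and all of them are completed when that lookup finishes.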

Scheduling a job

void Schedule(bool at_head) {
  DCHECK(!is_queued());
  PrioritizedDispatcher::Handle handle;
  if (!at_head) {
    handle = resolver_->dispatcher_->Add(this, priority());
  } else {
    handle = resolver_->dispatcher_->AddAtHead(this, priority());
  }
  // The dispatcher could have started |this| in the above call to Add, which
  // could have called Schedule again. In that case |handle| will be null,
  // but |handle_| may have been set by the other nested call to Schedule.
  if (!handle.is_null()) {
    DCHECK(handle_.is_null());
    handle_ = handle;
  }
}

PrioritizedDispatcher::Handle PrioritizedDispatcher::Add(
    Job* job, Priority priority) {
  DCHECK(job);
  DCHECK_LT(priority, num_priorities());
  if (num_running_jobs_ < max_running_jobs_[priority]) {
    ++num_running_jobs_;
    job->Start();
    return Handle();
  }
  return queue_.Insert(job, priority);
}

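From this function we can see that if the number of running jobs is below the limit for the given priority, the job starts right away; otherwise it is inserted into the queue.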

0:029> dv /t 
class net::PrioritizedDispatcher * this = 0x00000000`07225370
class net::PrioritizedDispatcher::Job * job = 0x00000000`0a7a3dc0
unsigned int priority = 1
0:029> dx -id 0,0 -r1 (*((chrome_7feee500000!net::PrioritizedDispatcher *)0x7225370))
(*((chrome_7feee500000!net::PrioritizedDispatcher *)0x7225370)) [Type: net::PrioritizedDispatcher]
[+0x000] queue_ [Type: net::PriorityQueue<net::PrioritizedDispatcher::Job *>]
[+0x020] max_running_jobs_ : { size=5 } [Type: std::vector<unsigned __int64,std::allocator<unsigned __int64> >]
[+0x038] num_running_jobs_ : 0x1
0:029> dx -id 0,0 -r1 (*((chrome_7feee500000!std::vector<unsigned __int64,std::allocator<unsigned __int64> > *)0x7225390))
(*((chrome_7feee500000!std::vector<unsigned __int64,std::allocator<unsigned __int64> > *)0x7225390)) : { size=5 } [Type: std::vector<unsigned __int64,std::allocator<unsigned __int64> >]
[<Raw View>] [Type: std::vector<unsigned __int64,std::allocator<unsigned __int64> >]
[capacity] : 5
[0] : 0x6
[1] : 0x6
[2] : 0x6
[3] : 0x6
[4] : 0x6
0:029> dx -id 0,0 -r1 (*((chrome_7feee500000!net::PriorityQueue<net::PrioritizedDispatcher::Job *> *)0x7225370))
(*((chrome_7feee500000!net::PriorityQueue<net::PrioritizedDispatcher::Job *> *)0x7225370)) [Type: net::PriorityQueue<net::PrioritizedDispatcher::Job *>]
[+0x000] lists_ : { size=5 } [Type: std::vector<std::list<net::PrioritizedDispatcher::Job *,std::allocator<net::PrioritizedDispatcher::Job *> >,std::allocator<std::list<net::PrioritizedDispatcher::Job *,std::allocator<net::PrioritizedDispatcher::Job *> > > >]
[+0x018] size_ : 0x0
0:029> dx -id 0,0 -r1 (*((chrome_7feee500000!std::vector<std::list<net::PrioritizedDispatcher::Job *,std::allocator<net::PrioritizedDispatcher::Job *> >,std::allocator<std::list<net::PrioritizedDispatcher::Job *,std::allocator<net::PrioritizedDispatcher::Job *> > > > *)0x7225370))
(*((chrome_7feee500000!std::vector<std::list<net::PrioritizedDispatcher::Job *,std::allocator<net::PrioritizedDispatcher::Job *> >,std::allocator<std::list<net::PrioritizedDispatcher::Job *,std::allocator<net::PrioritizedDispatcher::Job *> > > > *)0x7225370)) : { size=5 } [Type: std::vector<std::list<net::PrioritizedDispatcher::Job *,std::allocator<net::PrioritizedDispatcher::Job *> >,std::allocator<std::list<net::PrioritizedDispatcher::Job *,std::allocator<net::PrioritizedDispatcher::Job *> > > >]
[<Raw View>] [Type: std::vector<std::list<net::PrioritizedDispatcher::Job *,std::allocator<net::PrioritizedDispatcher::Job *> >,std::allocator<std::list<net::PrioritizedDispatcher::Job *,std::allocator<net::PrioritizedDispatcher::Job *> > > >]
[capacity] : 5
[0] : { size=0x0 } [Type: std::list<net::PrioritizedDispatcher::Job *,std::allocator<net::PrioritizedDispatcher::Job *> >]
[1] : { size=0x0 } [Type: std::list<net::PrioritizedDispatcher::Job *,std::allocator<net::PrioritizedDispatcher::Job *> >]
[2] : { size=0x0 } [Type: std::list<net::PrioritizedDispatcher::Job *,std::allocator<net::PrioritizedDispatcher::Job *> >]
[3] : { size=0x0 } [Type: std::list<net::PrioritizedDispatcher::Job *,std::allocator<net::PrioritizedDispatcher::Job *> >]
[4] : { size=0x0 } [Type: std::list<net::PrioritizedDispatcher::Job *,std::allocator<net::PrioritizedDispatcher::Job *> >]

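Looking at the live data at this point, we can see that the maximum number of running jobs is 6, that there are 5 priority levels with the same limit for each, and that the queue is empty, so no requests are currently waiting.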
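The start-or-queue decision can be modelled in a few lines. The sketch below is a deliberately reduced model: SketchDispatcher is a hypothetical class, jobs are plain integer ids, and there is no handle, eviction, or job completion. It only mirrors the comparison of the running count against a per-priority limit.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

class SketchDispatcher {
 public:
  // |limits[p]| is the highest running-job count at which a job of priority
  // |p| may still start immediately.
  explicit SketchDispatcher(std::vector<std::size_t> limits)
      : limits_(std::move(limits)), queues_(limits_.size()) {}

  // Returns true if the job started right away, false if it was queued.
  bool Add(int job_id, std::size_t priority) {
    assert(priority < limits_.size());
    if (running_ < limits_[priority]) {
      ++running_;
      return true;
    }
    queues_[priority].push_back(job_id);
    return false;
  }

  std::size_t num_running() const { return running_; }

  std::size_t num_queued() const {
    std::size_t total = 0;
    for (const std::deque<int>& q : queues_)
      total += q.size();
    return total;
  }

 private:
  std::vector<std::size_t> limits_;      // Declared first: queues_ uses its size.
  std::vector<std::deque<int>> queues_;  // One FIFO per priority level.
  std::size_t running_ = 0;
};
```

With the values seen in the debugger (5 priorities, limit 6 each), the first six jobs would start immediately in this model and the seventh would be queued.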

Starting a job

// PriorityDispatch::Job:
void Start() override {
  ......
  bool system_only =
      (key_.host_resolver_flags & HOST_RESOLVER_SYSTEM_ONLY) != 0;

  // Caution: Job::Start must not complete synchronously.
  if (!system_only && had_dns_config_ &&
      !ResemblesMulticastDNSName(key_.hostname)) {
    StartDnsTask();
  } else {
    StartProcTask();
  }
}

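Starting a job mainly means starting a task, and there are two kinds of task: one uses the built-in DNS client, the other uses the system function. If the request does not require the system resolver, a DNS configuration is available, and the hostname does not end in ".local" or ".local.", the task is started over the DNS protocol. Otherwise the system function does the resolution.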
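The selection rule just described can be written as a small pure function. This is an illustrative sketch: TaskKind, ChooseTask, EndsWith, and LooksLikeMulticastDnsName are hypothetical names, and the suffix test is only the simplification stated in the text, not necessarily the exact behaviour of ResemblesMulticastDNSName. The flag value 1 << 3 is taken from the HOST_RESOLVER_SYSTEM_ONLY definition in the flags enum.

```cpp
#include <cassert>
#include <string>

// Mirrors HOST_RESOLVER_SYSTEM_ONLY (1 << 3) from the flags enum.
constexpr int kSystemOnly = 1 << 3;

enum class TaskKind { kDnsTask, kProcTask };

inline bool EndsWith(const std::string& s, const std::string& suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Simplified assumption: a name counts as multicast DNS if it ends in
// ".local" or ".local.".
inline bool LooksLikeMulticastDnsName(const std::string& hostname) {
  return EndsWith(hostname, ".local") || EndsWith(hostname, ".local.");
}

// Picks the DNS-client task only when all three conditions hold; every other
// combination falls back to the system resolver task.
inline TaskKind ChooseTask(const std::string& hostname,
                           int host_resolver_flags,
                           bool had_dns_config) {
  const bool system_only = (host_resolver_flags & kSystemOnly) != 0;
  if (!system_only && had_dns_config && !LooksLikeMulticastDnsName(hostname))
    return TaskKind::kDnsTask;
  return TaskKind::kProcTask;
}
```

In this model a request with flags 0 and no DNS configuration goes to the system resolver task, which matches the path taken in the debugging session discussed here.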

// HostResolverFlags is a bitflag enum used by host resolver procedures to
// determine the value of addrinfo.ai_flags and work around getaddrinfo
// peculiarities.
enum {
  HOST_RESOLVER_CANONNAME = 1 << 0,  // AI_CANONNAME
  // Hint to the resolver proc that only loopback addresses are configured.
  HOST_RESOLVER_LOOPBACK_ONLY = 1 << 1,
  // Indicate the address family was set because no IPv6 support was detected.
  HOST_RESOLVER_DEFAULT_FAMILY_SET_DUE_TO_NO_IPV6 = 1 << 2,
  // The resolver should only invoke getaddrinfo, not DnsClient.
  HOST_RESOLVER_SYSTEM_ONLY = 1 << 3
};

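Above are the host resolver flags. In the debugging session on the author's machine, the system-function path is the one being used, so we cover StartProcTask first.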

void StartProcTask() {
  DCHECK(!is_dns_running());
  proc_task_ = new ProcTask(
      key_,
      resolver_->proc_params_,
      base::Bind(&Job::OnProcTaskComplete, base::Unretained(this),
                 base::TimeTicks::Now()),
      net_log_);

  if (had_non_speculative_request_)
    proc_task_->set_had_non_speculative_request();
  // Start() could be called from within Resolve(), hence it must NOT directly
  // call OnProcTaskComplete, for example, on synchronous failure.
  proc_task_->Start();
}

StartProcTask creates a ProcTask, passing in the key and the completion callback, and then has the ProcTask start the work.

HostResolverImpl ProcTask

StartLookupAttempt

proc_task_->Start() calls StartLookupAttempt internally.

void StartLookupAttempt() {
  DCHECK(task_runner_->BelongsToCurrentThread());
  base::TimeTicks start_time = base::TimeTicks::Now();
  ++attempt_number_;
  // Dispatch the lookup attempt to a worker thread.
  if (!base::WorkerPool::PostTask(
          FROM_HERE,
          base::Bind(&ProcTask::DoLookup, this, start_time, attempt_number_),
          true)) {
    NOTREACHED();

    task_runner_->PostTask(FROM_HERE,
                           base::Bind(&ProcTask::OnLookupComplete,
                                      this,
                                      AddressList(),
                                      start_time,
                                      attempt_number_,
                                      ERR_UNEXPECTED,
                                      0));
    return;
  }

  net_log_.AddEvent(NetLog::TYPE_HOST_RESOLVER_IMPL_ATTEMPT_STARTED,
                    NetLog::IntCallback("attempt_number", attempt_number_));

  if (attempt_number_ <= params_.max_retry_attempts) {
    task_runner_->PostDelayedTask(
        FROM_HERE,
        base::Bind(&ProcTask::RetryIfNotComplete, this),
        params_.unresponsive_delay);
  }
}

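The function first increments the attempt counter and then posts the lookup to a WorkerPool thread. If posting fails, it posts OnLookupComplete with ERR_UNEXPECTED instead and returns. Otherwise, as long as the attempt count has not exceeded max_retry_attempts, it posts a delayed task that fires after unresponsive_delay and runs RetryIfNotComplete. If the lookup has not completed by then, RetryIfNotComplete decides, based on a few conditions, whether to start another lookup attempt.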

DoLookup

void DoLookup(const base::TimeTicks& start_time,
              const uint32_t attempt_number) {
  AddressList results;
  int os_error = 0;
  // Running on the worker thread
  int error = params_.resolver_proc->Resolve(key_.hostname,
                                             key_.address_family,
                                             key_.host_resolver_flags,
                                             &results,
                                             &os_error);

  // Fail the resolution if the result contains 127.0.53.53. See the comment
  // block of kIcanNameCollisionIp for details on why.
  for (const auto& it : results) {
    const IPAddressNumber& cur = it.address().bytes();
    if (cur.size() == arraysize(kIcanNameCollisionIp) &&
        0 == memcmp(&cur.front(), kIcanNameCollisionIp, cur.size())) {
      error = ERR_ICANN_NAME_COLLISION;
      break;
    }
  }

  task_runner_->PostTask(FROM_HERE,
                         base::Bind(&ProcTask::OnLookupComplete,
                                    this,
                                    results,
                                    start_time,
                                    attempt_number,
                                    error,
                                    os_error));
}

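Because the work crosses threads, single-stepping will not easily take us into this function. Instead, we set a breakpoint on DoLookup and continue; the next break lands inside DoLookup.
This function runs on a worker pool thread so that the IO thread is not blocked. It resolves the host through resolver_proc and then checks the results: if any of them is 127.0.53.53, the reserved address used to signal an ICANN name collision, the lookup fails with ERR_ICANN_NAME_COLLISION. Finally it posts the completion task back to task_runner_.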
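The name-collision check is easy to isolate. The sketch below assumes IPv4 results stored as fixed 4-byte arrays, whereas the original compares variable-length IPAddressNumber values with memcmp; Ipv4Bytes, kNameCollisionIp, and ContainsNameCollisionAddress are hypothetical names introduced for this example.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

using Ipv4Bytes = std::array<std::uint8_t, 4>;

// 127.0.53.53, the address the lookup code treats as a name collision signal.
constexpr Ipv4Bytes kNameCollisionIp = {127, 0, 53, 53};

// Returns true if any resolved address equals 127.0.53.53. A caller modelled
// on DoLookup would then turn the whole lookup into an error.
inline bool ContainsNameCollisionAddress(
    const std::vector<Ipv4Bytes>& results) {
  return std::find(results.begin(), results.end(), kNameCollisionIp) !=
         results.end();
}
```

Note that a single matching entry is enough to fail the lookup, even if the result list also contains ordinary addresses.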

HostResolverProc

Finally, the system resolution function is called. The HostResolverProc class wraps the system function in a thin layer.

int SystemHostResolverCall(const std::string& host,
                           AddressFamily address_family,
                           HostResolverFlags host_resolver_flags,
                           AddressList* addrlist,
                           int* os_error) {
  .....
  if (os_error)
    *os_error = 0;

  struct addrinfo* ai = NULL;
  struct addrinfo hints = {0};
  ......
#if defined(OS_POSIX) && !defined(OS_MACOSX) && !defined(OS_OPENBSD) && \
    !defined(OS_ANDROID)
  DnsReloaderMaybeReload();
#endif
  int err = getaddrinfo(host.c_str(), NULL, &hints, &ai);

  ......
  *addrlist = AddressList::CreateFromAddrinfo(ai);
  freeaddrinfo(ai);
  return OK;
}

The system path uses getaddrinfo to obtain the address information and converts it into an AddressList.
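For readers who have not used getaddrinfo directly, here is a small standalone sketch of the same call pattern: fill in hints, call getaddrinfo, walk the linked list, then free it. It assumes a POSIX system (on Windows the Winsock headers and WSAStartup would be needed instead). LookupAddresses is a hypothetical helper, the result is a list of strings rather than an AddressList, and the hints chosen here (AF_UNSPEC, SOCK_STREAM) are this example's own choice, since the original elides how hints is filled in.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

#include <cassert>
#include <string>
#include <vector>

// Resolves |host| with the system resolver and returns the addresses in text
// form. On failure the getaddrinfo error code is stored in |*os_error| and an
// empty list is returned.
inline std::vector<std::string> LookupAddresses(const std::string& host,
                                                int* os_error) {
  if (os_error)
    *os_error = 0;

  addrinfo hints = {};
  hints.ai_family = AF_UNSPEC;      // Accept both IPv4 and IPv6 results.
  hints.ai_socktype = SOCK_STREAM;  // One entry per address, not per socket type.

  addrinfo* ai = nullptr;
  const int err = getaddrinfo(host.c_str(), nullptr, &hints, &ai);
  std::vector<std::string> out;
  if (err != 0) {
    if (os_error)
      *os_error = err;
    return out;
  }

  for (const addrinfo* p = ai; p != nullptr; p = p->ai_next) {
    char text[INET6_ADDRSTRLEN] = {0};
    const void* addr = nullptr;
    if (p->ai_family == AF_INET) {
      addr = &reinterpret_cast<const sockaddr_in*>(p->ai_addr)->sin_addr;
    } else if (p->ai_family == AF_INET6) {
      addr = &reinterpret_cast<const sockaddr_in6*>(p->ai_addr)->sin6_addr;
    }
    if (addr && inet_ntop(p->ai_family, addr, text, sizeof(text)))
      out.emplace_back(text);
  }

  freeaddrinfo(ai);  // The list is owned by the caller and must be released.
  return out;
}
```

Calling LookupAddresses("www.baidu.com", &err) blocks until the system resolver answers, which is the reason the lookup in the article is pushed onto a worker thread rather than run on the IO thread.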