翻译:三种类型的缓存

时间:2022-10-13 17:37:09

原文:The Three Types of Cache | Robust Perception 

The Three Types of Cache
三种类型的缓存
Brian Brazil August 12, 2015

Caches are a common feature of distributed systems, often added to improve performance. There are three main types of cache, and knowing about them will help you design robust systems.
缓存是分布式系统的一个常见特性,通常是为了提高性能而添加的。主要有三种类型的缓存,了解它们将有助于设计健壮的系统。

A cache is a subset of your data that remembers values you’re previously fetched or calculated from somewhere further away or more expensive. The next time around, if you’re looking for the same piece of data again then you can get it from the cache which is called a “cache hit”. If the data you want isn’t in the cache that’s called a “cache miss”, and you have to fetch or calculate it from the source.
缓存是一个数据的子集,数据是从某个较远的地方获取的或者经过代价昂贵的计算得来的。如果你下一次寻找相同的数据时,可以从缓存中获取得到它,这个过程被称为缓存命中(cache hit)。

When adding a cache it’s usually done for one of three reasons: latency, capacity or availability. Caches added for one purpose also have the behaviour of the others, so it’s important to be aware of their characteristics and how cache misses can cause unexpected problems.
在添加缓存时,通常会有以下三个原因:延迟、容量或可用性。为某个目的而添加的缓存也具有其他的行为,因此了解它们的特征和缓存未命中(cache misses)而导致的意外问题是很重要的。

Latency
延迟

Latency is one of the most common reasons a cache is added. Let’s say the majority of your requests are for only a small subset of your data on disk. You could wait 10ms every time for a disk seek, but if you could put the frequently accessed data in RAM and be many orders of magnitude faster for those requests.
延迟(Latency)是添加缓存的最常见的原因之一。假设大部分请求只针对磁盘上的一小部分数据。每次磁盘检索需要等待10ms,但是如果将频繁访问的数据放到内存中,访问内存的速度相对磁盘要快很多数量级。

A cache will make your average latency lower. Cache misses still take the same amount of time, so make sure your timeouts allow for the time a cache miss takes. In case there’s lots of misses, memory buffers and connection limits should allow for more in-progress requests.
缓存会降低平均延迟时间。发生缓存未命中(cache misses)时,请求仍然需要花费相同的时间,所以确保超时等待考虑到缓存未命中(cache misses)的情况。
 
Capacity
容量

Capacity is the other common reason for a cache being added. Continuing the above example, with 10ms per disk seek you can only service 100 requests per second. As most requests now only hit RAM, you can service many more requests per second with the same disk.
容量是添加缓存的另一个常见原因。接着上面的例子,每个磁盘检索耗费10ms,那么每秒只能服务100个请求。使用缓存后,由于大多数请求在内存中命中了攻击,所以可以使用相同的时间提供更多的请求。

This advantage of a capacity cache is also its biggest downside. If the cache get flushed such as by a machine reboot, you don’t get the capacity back until the cache has warmed back up. This can cause a problem called a thundering herd, where lots of requests (some of them for the same data) are all trying to use the disk beyond the disk’s capacity to serve them.
容量缓存的优点同时也是最大的缺点。如果缓存刷新了,比如机器重新启动,那么在缓存恢复之前,是无法加速访问的。这可能导致一个称为“惊群效应http://zdyi.com/books/unp/s2/2.6.2.html”的问题,其中许多请求(其中一些请求相同的数据)都试图使用磁盘以外的磁盘(use the disk beyond the disk’s capacity)来服务它们。

Ways of handling this include ensuring only a portion of the cache is flushed at once, having other servers able to handle the additional load if one or two go down and lose their cache, ensuring identical requests only result in one request to the disk, or load shedding where you drop requests on the floor rather than getting completely overloaded.
处理这中情况的方式有:确保仅含有缓存刷新的那一部分;如果一个或两个停机,或者缓存丢失,其他服务器能够处理额外的请求负载;确保相同的请求只会对磁盘发一个请求,或丢弃请求以达到分级削减负荷而不是完全超负荷的工作

Availability
可用性

Availability caches are less common. If the disk above was at risk of being removed, having a cache would mean you can still serve some requests even though all cache misses fail. This is often a consideration with network services, as for example being unable to talk to a remote server because DNS had a brief outage would rarely be a good tradeoff.
可用性缓存不太常见。如果磁盘有被删除的风险,那么这类缓存意味着即使所有的缓存都失效,仍然可以提供一些请求。这通常是考虑到网络服务的情况,例如,由于DNS有短暂的中断而导致的不能与远程服务器通信,然而这并没有多少价值(rarely be a good tradeoff)。

Serving data without talking to the source can lead to using stale data. Approaches for dealing with this include deleting entries from the cache after a time, regular refreshes, asynchronously updating the cache at each request and having a separate process that remove or updates the caches when values change.
在不与源数据通信的情况下提供数据会导致产生陈旧的数据。处理这类问题的方法包括:一次性删除缓存中的条目、定期刷新、异步更新每个请求的缓存,以及在值发生变化时删除或更新缓存。

Summary
总结

Caches can be a net gain in your systems when the appropriate considerations are taken into account. A caches added for one purpose will have all three properties:
仔细考虑适合的场景,缓存对系统赢得更多的附加值。添加的缓存主要考虑以下三个情况:

Latency: Reduces latency, cache misses still take the same time and resources
延迟:减少延迟,而缓存未命中的情况仍然需要相同的时间和资源
Capacity: Reduces resource needs, have to allow for cache suddenly being empty
容量:减少请求的资源,必须考虑到缓存突然清空的情况
Availability: Reduces downtime, data can be stale
可用性:减少停机时间,数据可能陈旧


In practice empty caches and stale data tend to be the most complex aspects to manage.
在实践中,空缓存和陈旧数据往往是管理的最复杂的方面。