How to implement an LFU cache using the STL?

Time: 2022-09-02 10:59:23

I'm trying to implement an LFU (Least Frequently Used) cache using pure STL (I don't want to use Boost!).

Requirements are:

  • Associative access to any element using a Key, as with std::map.
  • Ability to release the lowest-priority item (based on its UsesCount attribute).
  • Ability to update the priority (UsesCount) of any item.

The problems are:

  • If I use std::vector as the container of items (Key, Value, UsesCount), std::map as a container of iterators into the vector for associative access, and std::make_heap, std::push_heap and std::pop_heap as the priority-queue implementation within the vector, the iterators stored in the map no longer refer to the right items after heap operations (see the sketch after this list).
  • If I use std::list (or std::map) instead of std::vector in the previous configuration, std::make_heap etc. can't be compiled because their iterators do not support arithmetic.
  • If I'd like to use std::priority_queue, I don't have the ability to update an item's priority.
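
To illustrate the first problem, here is a minimal sketch (the Item struct and the sample data are made up for illustration): after std::make_heap the vector's elements have been swapped around, so an iterator stored in the map may now point at a different item than the one it was created for.

    #include <algorithm>
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    // Hypothetical cache entry, not taken from any existing code.
    struct Item {
        std::string key;
        int         value;
        int         uses_count;
    };

    int main() {
        std::vector<Item> items = {{"a", 1, 5}, {"b", 2, 1}, {"c", 3, 3}};

        // Associative index: key -> iterator into the vector.
        std::map<std::string, std::vector<Item>::iterator> index;
        for (auto it = items.begin(); it != items.end(); ++it)
            index[it->key] = it;

        // Heapify on uses_count so the least frequently used item is on top.
        std::make_heap(items.begin(), items.end(),
                       [](const Item& l, const Item& r) { return l.uses_count > r.uses_count; });

        // The stored iterators still point into valid storage, but make_heap has
        // swapped the elements, so index["a"] no longer refers to item "a".
        std::cout << "index[\"a\"] now points at key: " << index["a"]->key << '\n';
    }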

The questions are:

  • Am I missing something obvious about how this problem could be solved?
  • Can you point me to a pure C++/STL implementation of an LFU cache meeting the previous requirements, as an example?

Thank you for your insights.

2 Answers

#1


2  

Your implementation using the *_heap functions and a vector seems to be a good fit, although it will result in slow updates. The iterator-invalidation problem you encounter is normal for every container that uses a vector as its underlying data structure. This is also the approach taken by boost::heap::priority_queue, but it does not provide a mutable interface for the reason mentioned above. Other boost::heap data structures do offer the ability to update the heap.

Something that seems a little odd: even if you were able to use std::priority_queue, you would still face the iterator-invalidation problem.

To answer your questions directly: you are not missing anything obvious. std::priority_queue is not as useful as it should be. The best approach is to write your own heap implementation that supports updates. Making it fully STL-compatible (especially allocator-aware) is rather tricky and not a simple task. On top of that, implement the LFU cache.
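
A minimal sketch of such an update-capable heap, assuming string keys and int values (the LfuHeap name, the Entry struct and the touch()/pop_least_used() interface are made up for illustration, and it is deliberately not allocator-aware):

    #include <cstddef>
    #include <string>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    // Binary min-heap on uses_count that supports updating an arbitrary entry.
    class LfuHeap {
    public:
        struct Entry {
            std::string key;
            int         value;
            unsigned    uses_count;
        };

        // Insert a new key with an initial use count of 1.
        void push(std::string key, int value) {
            heap_.push_back(Entry{std::move(key), value, 1});
            pos_[heap_.back().key] = heap_.size() - 1;
            sift_up(heap_.size() - 1);
        }

        // Bump the use count of an existing key and restore the heap property.
        void touch(const std::string& key) {
            std::size_t i = pos_.at(key);
            ++heap_[i].uses_count;
            sift_down(i);                  // the count only grows, so sift down
        }

        // Remove and return the least frequently used entry (heap must be non-empty).
        Entry pop_least_used() {
            swap_entries(0, heap_.size() - 1);
            Entry e = std::move(heap_.back());
            heap_.pop_back();
            pos_.erase(e.key);
            if (!heap_.empty()) sift_down(0);
            return e;
        }

        bool empty() const { return heap_.empty(); }

    private:
        // Swap two entries and keep the key -> index map consistent.
        void swap_entries(std::size_t a, std::size_t b) {
            std::swap(heap_[a], heap_[b]);
            pos_[heap_[a].key] = a;
            pos_[heap_[b].key] = b;
        }

        void sift_up(std::size_t i) {
            while (i > 0) {
                std::size_t parent = (i - 1) / 2;
                if (heap_[parent].uses_count <= heap_[i].uses_count) break;
                swap_entries(i, parent);
                i = parent;
            }
        }

        void sift_down(std::size_t i) {
            for (;;) {
                std::size_t smallest = i, l = 2 * i + 1, r = 2 * i + 2;
                if (l < heap_.size() && heap_[l].uses_count < heap_[smallest].uses_count)
                    smallest = l;
                if (r < heap_.size() && heap_[r].uses_count < heap_[smallest].uses_count)
                    smallest = r;
                if (smallest == i) break;
                swap_entries(i, smallest);
                i = smallest;
            }
        }

        std::vector<Entry>                           heap_;  // the heap itself
        std::unordered_map<std::string, std::size_t> pos_;   // key -> index into heap_
    };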

For the first step, look at the Boost implementations to get an idea of the effort. I'm not aware of any reference implementation for the second.

To work around iterator invalidation you can always choose indirection into another container, although you should try to avoid it, as it creates additional cost and can get quite messy.
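
The pos_ map in the heap sketch above is one form of this indirection: keys map to vector indices rather than to iterators, and every swap has to repair that map, which is exactly the extra cost mentioned here. A short usage example for that (hypothetical) interface:

    LfuHeap heap;
    heap.push("a", 1);
    heap.push("b", 2);
    heap.touch("a");                                // "a" now has uses_count == 2
    LfuHeap::Entry victim = heap.pop_least_used();  // evicts "b" (uses_count == 1)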

#2


1  

A somewhat simpler approach than keeping two data structures:

  • just keep a map, which maps your keys to their value/use-count pair.
  • when the cache is full:
    • create a vector of iterators to the map elements (O(n))
    • use std::nth_element to find the worst 10% (O(n))
    • remove them all from the map (O(n log n))

So, adding a new element to the cache is O(log n) in the common case, O(n log n) in the worst case, and O(log n) amortized.
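
A minimal sketch of this approach, with string keys and int values (the class and member names are made up for illustration):

    #include <algorithm>
    #include <cstddef>
    #include <map>
    #include <string>
    #include <vector>

    class SimpleLfuCache {
    public:
        explicit SimpleLfuCache(std::size_t capacity) : capacity_(capacity) {}

        // Look up a value and bump its use count; returns nullptr on a miss.
        const int* get(const std::string& key) {
            auto it = entries_.find(key);
            if (it == entries_.end()) return nullptr;
            ++it->second.uses_count;
            return &it->second.value;
        }

        // Insert or overwrite a value, evicting roughly 10% of the entries when full.
        void put(const std::string& key, int value) {
            if (entries_.size() >= capacity_ && entries_.find(key) == entries_.end())
                evict_worst_tenth();
            auto& payload = entries_[key];
            payload.value = value;
            ++payload.uses_count;
        }

    private:
        struct Payload {
            int      value = 0;
            unsigned uses_count = 0;
        };
        using Map = std::map<std::string, Payload>;

        void evict_worst_tenth() {
            if (entries_.empty()) return;

            // Collect iterators to all map elements: O(n).
            std::vector<Map::iterator> items;
            items.reserve(entries_.size());
            for (auto it = entries_.begin(); it != entries_.end(); ++it)
                items.push_back(it);

            // Move the least-used 10% (at least one entry) to the front: O(n).
            std::size_t k = std::max<std::size_t>(1, items.size() / 10);
            std::nth_element(items.begin(), items.begin() + k, items.end(),
                             [](Map::iterator a, Map::iterator b) {
                                 return a->second.uses_count < b->second.uses_count;
                             });

            // Erase them all from the map.
            for (std::size_t i = 0; i < k; ++i)
                entries_.erase(items[i]);
        }

        std::size_t capacity_;
        Map         entries_;
    };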

Removing the worst 10% might be a bit drastic in an LFU cache, because new entries have to make the top 90% or they're cut. Then again, if you only remove one element, new entries still need to get off the bottom before the next new entry arrives, or they're cut, and they have less time to do so. So depending on why LFU is the right caching strategy for you, my change to it might be the wrong strategy, or it might still be fine.
