如何实现最不常用的(LFU)缓存?

时间:2022-01-07 05:41:23

Least Frequently Used (LFU) is a type of cache algorithm used to manage memory within a computer. The standard characteristics of this method involve the system keeping track of the number of times a block is referenced in memory. When the cache is full and requires more room the system will purge the item with the lowest reference frequency.

最不常用的(LFU)是一种用于管理计算机内存的缓存算法。该方法的标准特性涉及系统跟踪一个块在内存中被引用的次数。当缓存满了并且需要更多的空间时,系统将以最低的参考频率清除条目。

What would be the best way to implement a most-recently-used cache of objects, say in Java?

实现最近使用的对象缓存(比如Java)的最佳方法是什么?

I've already implemented one using LinkedHashMap(by maintaining the no. of times objects are accessed) But I'm curious if any of the new concurrent collections would be better candidates.

我已经使用LinkedHashMap(通过保持no)实现了一个。但我很好奇,是否有新的并发集合会是更好的候选对象。

Consider this case : Suppose cache is full and we need to make space for another one. Say two objects are noted in cache which are accessed for one time only. Which one to remove if we come to know that other(which is not in cache)object is being accessed for more than once ?

考虑这种情况:假设缓存已满,我们需要为另一个缓存腾出空间。假设缓存中记录了两个对象,它们只被访问一次。如果我们知道其他(不在缓存中的)对象正在被访问不止一次,那么该删除哪一个呢?

Thanks!

谢谢!

4 个解决方案

#1


0  

  1. According to me, the best way to implement a most-recently-used cache of objects would be to include a new variable as 'latestTS' for each object. TS stands for timestamp.

    在我看来,实现最近使用的对象缓存的最佳方法是为每个对象包含一个新变量,作为“最新的”变量。TS代表时间戳。

    // A static method that returns the current date and time as milliseconds since January 1st 1970 long latestTS = System.currentTimeMillis();

    //从1970年1月1日开始返回当前日期和时间作为毫秒的静态方法,long latestTS = System.currentTimeMillis();

  2. ConcurrentLinkedHashMap is not yet implemented in Concurrent Java Collections. (Ref: Java Concurrent Collection API). However, you can try and use ConcurrentHashMap and DoublyLinkedList

    ConcurrentLinkedHashMap还没有在并发Java集合中实现。(Ref: Java Concurrent Collection API)。但是,您可以尝试使用ConcurrentHashMap和DoublyLinkedList

  3. About the case to be considered: in such case, as I have said that you can declare latestTS variable, based upon the value of latestTS variable, you can remove an entry and add the new object. (Don't forget to update frequency and latestTS of the new object added)

    关于要考虑的情况:在这种情况下,如我所说,您可以根据latestTS变量的值声明latestTS变量,您可以删除一个条目并添加新对象。(不要忘记更新新添加对象的频率和最近时间)

As you have mentioned, you can use LinkedHashMap as it gives element access in O(1) and also, you get the order traversal. Please, find the below code for LFU Cache: (PS: The below code is the answer for the question in the title i.e. "How to implement LFU cache")

正如您所提到的,您可以使用LinkedHashMap,因为它在O(1)中提供了元素访问,而且还可以进行顺序遍历。请找到以下LFU缓存的代码:(PS:下面的代码是标题中问题的答案。“如何实施LFU高速缓存”)

import java.util.LinkedHashMap;
import java.util.Map;

public class LFUCache {

    class CacheEntry
    {
        private String data;
        private int frequency;

        // default constructor
        private CacheEntry()
        {}

        public String getData() {
            return data;
        }
        public void setData(String data) {
            this.data = data;
        }

        public int getFrequency() {
            return frequency;
        }
        public void setFrequency(int frequency) {
            this.frequency = frequency;
        }       

    }

    private static int initialCapacity = 10;

    private static LinkedHashMap<Integer, CacheEntry> cacheMap = new LinkedHashMap<Integer, CacheEntry>();
    /* LinkedHashMap is used because it has features of both HashMap and LinkedList. 
     * Thus, we can get an entry in O(1) and also, we can iterate over it easily.
     * */

    public LFUCache(int initialCapacity)
    {
        this.initialCapacity = initialCapacity;
    }

    public void addCacheEntry(int key, String data)
    {
        if(!isFull())
        {
            CacheEntry temp = new CacheEntry();
            temp.setData(data);
            temp.setFrequency(0);

            cacheMap.put(key, temp);
        }
        else
        {
            int entryKeyToBeRemoved = getLFUKey();
            cacheMap.remove(entryKeyToBeRemoved);

            CacheEntry temp = new CacheEntry();
            temp.setData(data);
            temp.setFrequency(0);

            cacheMap.put(key, temp);
        }
    }

    public int getLFUKey()
    {
        int key = 0;
        int minFreq = Integer.MAX_VALUE;

        for(Map.Entry<Integer, CacheEntry> entry : cacheMap.entrySet())
        {
            if(minFreq > entry.getValue().frequency)
            {
                key = entry.getKey();
                minFreq = entry.getValue().frequency;
            }           
        }

        return key;
    }

    public String getCacheEntry(int key)
    {
        if(cacheMap.containsKey(key))  // cache hit
        {
            CacheEntry temp = cacheMap.get(key);
            temp.frequency++;
            cacheMap.put(key, temp);
            return temp.data;
        }
        return null; // cache miss
    }

    public static boolean isFull()
    {
        if(cacheMap.size() == initialCapacity)
            return true;

        return false;
    }
}

#2


5  

You might benefit from the LFU implementation of ActiveMQ: LFUCache

您可能会从ActiveMQ: LFUCache的LFU实现中获益

They have provided some good functionality.

它们提供了一些好的功能。

#3


3  

I think, the LFU data structure must combine priority queue (for maintaining fast access to lfu item) and hash map (for providing fast access to any item by its key); I would suggest the following node definition for each object stored in cache:

我认为,LFU数据结构必须将优先队列(用于保持对LFU项的快速访问)和散列映射(用于以其密钥对任何项的快速访问)组合起来;对于存储在缓存中的每个对象,我建议如下的节点定义:

class Node<T> {
   // access key
   private int key;
   // counter of accesses
   private int numAccesses;
   // current position in pq
   private int currentPos;
   // item itself
   private T item;
   //getters, setters, constructors go here
}

You need key for referring to an item. You need numAccesses as a key for priority queue. You need currentPos to be able to quickly find a pq position of item by key. Now you organize hash map (key(Integer) -> node(Node<T>)) to quickly access items and min heap-based priority queue using number of accesses as priority. Now you can very quickly perform all operations (access, add new item, update number of acceses, remove lfu). You need to write each operation carefully, so that it maintains all the nodes consistent (their number of accesses, their position in pq and there existence in hash map). All operations will work with constant average time complexity which is what you expect from cache.

你需要一个指向一个项目的键。您需要将numaccess作为优先队列的键。您需要currentPos能够快速找到一个项目的pq位置的关键。现在您可以组织散列映射(key(Integer) ->节点(node )),以使用访问数作为优先级快速访问项和基于堆的优先级队列。现在您可以非常快速地执行所有操作(访问、添加新项、更新acceses数量、删除lfu)。您需要仔细地编写每个操作,以便它维护所有的节点一致(它们的访问数量、在pq中的位置以及在散列映射中存在的位置)。所有操作都将使用恒定的平均时间复杂度,这是您希望从缓存中得到的。

#4


0  

How about a priority queue? You can keep elements sorted there with keys representing the frequency. Just update the object position in the queue after visiting it. You can update just from time to time for optimizing the performance (but reducing precision).

优先队列呢?你可以用表示频率的键来排序元素。在访问队列后,只需更新队列中的对象位置。您可以不时地更新以优化性能(但降低精度)。

#1


0  

  1. According to me, the best way to implement a most-recently-used cache of objects would be to include a new variable as 'latestTS' for each object. TS stands for timestamp.

    在我看来,实现最近使用的对象缓存的最佳方法是为每个对象包含一个新变量,作为“最新的”变量。TS代表时间戳。

    // A static method that returns the current date and time as milliseconds since January 1st 1970 long latestTS = System.currentTimeMillis();

    //从1970年1月1日开始返回当前日期和时间作为毫秒的静态方法,long latestTS = System.currentTimeMillis();

  2. ConcurrentLinkedHashMap is not yet implemented in Concurrent Java Collections. (Ref: Java Concurrent Collection API). However, you can try and use ConcurrentHashMap and DoublyLinkedList

    ConcurrentLinkedHashMap还没有在并发Java集合中实现。(Ref: Java Concurrent Collection API)。但是,您可以尝试使用ConcurrentHashMap和DoublyLinkedList

  3. About the case to be considered: in such case, as I have said that you can declare latestTS variable, based upon the value of latestTS variable, you can remove an entry and add the new object. (Don't forget to update frequency and latestTS of the new object added)

    关于要考虑的情况:在这种情况下,如我所说,您可以根据latestTS变量的值声明latestTS变量,您可以删除一个条目并添加新对象。(不要忘记更新新添加对象的频率和最近时间)

As you have mentioned, you can use LinkedHashMap as it gives element access in O(1) and also, you get the order traversal. Please, find the below code for LFU Cache: (PS: The below code is the answer for the question in the title i.e. "How to implement LFU cache")

正如您所提到的,您可以使用LinkedHashMap,因为它在O(1)中提供了元素访问,而且还可以进行顺序遍历。请找到以下LFU缓存的代码:(PS:下面的代码是标题中问题的答案。“如何实施LFU高速缓存”)

import java.util.LinkedHashMap;
import java.util.Map;

public class LFUCache {

    class CacheEntry
    {
        private String data;
        private int frequency;

        // default constructor
        private CacheEntry()
        {}

        public String getData() {
            return data;
        }
        public void setData(String data) {
            this.data = data;
        }

        public int getFrequency() {
            return frequency;
        }
        public void setFrequency(int frequency) {
            this.frequency = frequency;
        }       

    }

    private static int initialCapacity = 10;

    private static LinkedHashMap<Integer, CacheEntry> cacheMap = new LinkedHashMap<Integer, CacheEntry>();
    /* LinkedHashMap is used because it has features of both HashMap and LinkedList. 
     * Thus, we can get an entry in O(1) and also, we can iterate over it easily.
     * */

    public LFUCache(int initialCapacity)
    {
        this.initialCapacity = initialCapacity;
    }

    public void addCacheEntry(int key, String data)
    {
        if(!isFull())
        {
            CacheEntry temp = new CacheEntry();
            temp.setData(data);
            temp.setFrequency(0);

            cacheMap.put(key, temp);
        }
        else
        {
            int entryKeyToBeRemoved = getLFUKey();
            cacheMap.remove(entryKeyToBeRemoved);

            CacheEntry temp = new CacheEntry();
            temp.setData(data);
            temp.setFrequency(0);

            cacheMap.put(key, temp);
        }
    }

    public int getLFUKey()
    {
        int key = 0;
        int minFreq = Integer.MAX_VALUE;

        for(Map.Entry<Integer, CacheEntry> entry : cacheMap.entrySet())
        {
            if(minFreq > entry.getValue().frequency)
            {
                key = entry.getKey();
                minFreq = entry.getValue().frequency;
            }           
        }

        return key;
    }

    public String getCacheEntry(int key)
    {
        if(cacheMap.containsKey(key))  // cache hit
        {
            CacheEntry temp = cacheMap.get(key);
            temp.frequency++;
            cacheMap.put(key, temp);
            return temp.data;
        }
        return null; // cache miss
    }

    public static boolean isFull()
    {
        if(cacheMap.size() == initialCapacity)
            return true;

        return false;
    }
}

#2


5  

You might benefit from the LFU implementation of ActiveMQ: LFUCache

您可能会从ActiveMQ: LFUCache的LFU实现中获益

They have provided some good functionality.

它们提供了一些好的功能。

#3


3  

I think, the LFU data structure must combine priority queue (for maintaining fast access to lfu item) and hash map (for providing fast access to any item by its key); I would suggest the following node definition for each object stored in cache:

我认为,LFU数据结构必须将优先队列(用于保持对LFU项的快速访问)和散列映射(用于以其密钥对任何项的快速访问)组合起来;对于存储在缓存中的每个对象,我建议如下的节点定义:

class Node<T> {
   // access key
   private int key;
   // counter of accesses
   private int numAccesses;
   // current position in pq
   private int currentPos;
   // item itself
   private T item;
   //getters, setters, constructors go here
}

You need key for referring to an item. You need numAccesses as a key for priority queue. You need currentPos to be able to quickly find a pq position of item by key. Now you organize hash map (key(Integer) -> node(Node<T>)) to quickly access items and min heap-based priority queue using number of accesses as priority. Now you can very quickly perform all operations (access, add new item, update number of acceses, remove lfu). You need to write each operation carefully, so that it maintains all the nodes consistent (their number of accesses, their position in pq and there existence in hash map). All operations will work with constant average time complexity which is what you expect from cache.

你需要一个指向一个项目的键。您需要将numaccess作为优先队列的键。您需要currentPos能够快速找到一个项目的pq位置的关键。现在您可以组织散列映射(key(Integer) ->节点(node )),以使用访问数作为优先级快速访问项和基于堆的优先级队列。现在您可以非常快速地执行所有操作(访问、添加新项、更新acceses数量、删除lfu)。您需要仔细地编写每个操作,以便它维护所有的节点一致(它们的访问数量、在pq中的位置以及在散列映射中存在的位置)。所有操作都将使用恒定的平均时间复杂度,这是您希望从缓存中得到的。

#4


0  

How about a priority queue? You can keep elements sorted there with keys representing the frequency. Just update the object position in the queue after visiting it. You can update just from time to time for optimizing the performance (but reducing precision).

优先队列呢?你可以用表示频率的键来排序元素。在访问队列后,只需更新队列中的对象位置。您可以不时地更新以优化性能(但降低精度)。