Which Map implementation should I use if my map needs to be small more than fast?

Time: 2022-09-10 23:14:50

I habitually use HashMap in my programs, since I know it is usually the most efficient (if properly used) and can cope with large maps easily. I know about EnumMap which is very useful for enumeration keys, but often I am generating a small map which will never get very big, is likely to be discarded pretty soon, and has no concurrency issues.

Is HashMap<K,V> too complicated for these small, local, temporary uses? Is there another, simpler implementation which I can use in these cases?

I think I'm looking for a Map implementation which is analogous to ArrayList for List. Does it exist?


Added later after responses:

Here is a scenario where a slow but very simple implementation might be better: when I have many, many of these Maps. Suppose, for example, I have a million or so of these tiny little maps, each with a handful (often fewer than three) of entries. I have a low reference rate; most of the time I may not actually reference them at all before they are discarded. Is it still the case that HashMap is the best choice for them?

Resource utilisation is more than just speed -- I would like something that doesn't fragment the heap a lot and make GCs take a long time, for example.

It may be that HashMap is the right answer, but this is not a case of premature optimisation (or at least it may not be).


Added much later after some thought:

I decided to hand-code my own SmallMap. It is easy to make one with AbstractMap. I have also added a couple of constructors so that a SmallMap can be constructed from an existing Map.

Along the way I had to decide how to represent the entries (Map.Entry objects) and to implement a SmallSet for the entrySet method.
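As a rough illustration of the AbstractMap approach (not the SmallMap code from the repository), here is a minimal sketch. The class name `TinyArrayMap` and the flat-array layout are assumptions made for this example. AbstractMap only requires `entrySet()`, plus `put()` for a modifiable map; everything else (`get`, `size`, `equals`, `putAll`, and so on) is inherited.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch: keys and values live in one flat array [k0, v0, k1, v1, ...].
final class TinyArrayMap<K, V> extends AbstractMap<K, V> {
    private Object[] slots = new Object[4]; // room for two entries to start
    private int count;                      // number of entries, not array slots

    TinyArrayMap() {}

    // Copy constructor: AbstractMap.putAll calls put() for each source entry.
    TinyArrayMap(Map<? extends K, ? extends V> source) {
        putAll(source);
    }

    @Override
    public V put(K key, V value) {
        // Linear scan for an existing key; acceptable for a handful of entries.
        for (int i = 0; i < count; i++) {
            if (Objects.equals(slots[2 * i], key)) {
                @SuppressWarnings("unchecked")
                V old = (V) slots[2 * i + 1];
                slots[2 * i + 1] = value;
                return old;
            }
        }
        if (2 * count == slots.length) {
            slots = Arrays.copyOf(slots, slots.length * 2); // grow when full
        }
        slots[2 * count] = key;
        slots[2 * count + 1] = value;
        count++;
        return null;
    }

    @Override
    public Set<Map.Entry<K, V>> entrySet() {
        return new AbstractSet<Map.Entry<K, V>>() {
            @Override
            public int size() {
                return count;
            }

            @Override
            public Iterator<Map.Entry<K, V>> iterator() {
                return new Iterator<Map.Entry<K, V>>() {
                    private int cursor; // index of the next entry to return

                    @Override
                    public boolean hasNext() {
                        return cursor < count;
                    }

                    @Override
                    @SuppressWarnings("unchecked")
                    public Map.Entry<K, V> next() {
                        if (cursor >= count) {
                            throw new NoSuchElementException();
                        }
                        int i = cursor++;
                        // Entry objects are created on demand rather than stored.
                        return new AbstractMap.SimpleImmutableEntry<>(
                                (K) slots[2 * i], (V) slots[2 * i + 1]);
                    }
                };
            }
        };
    }
}
```

In this sketch the iterator does not implement `remove()`, so `remove` and `clear` on a non-empty map throw UnsupportedOperationException, and the inherited `get` walks the entry set linearly. A real implementation would probably override both.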

I learned a lot by coding (and unit-testing) this and want to share it, in case anyone else wants one. It is on GitHub here.

9 Answers

#1


17  

There is no standard small implementation of Map in Java. HashMap is one of the best and most flexible Map implementations around, and is hard to beat. However, in the very small requirement area, where heap usage and speed of construction are paramount, it is possible to do better.

I have implemented SmallCollections on GitHub to demonstrate how this might be done. I would love some comments on whether I have succeeded. It is by no means certain that I have.

Although the answers offered here were sometimes helpful, they tended, in general, to misunderstand the point. In any case, answering my own question was, in the end, much more useful to me than being given one.

The question here has served its purpose, and that is why I have 'answered it myself'.

#2


12  

I think this is premature optimization. Are you having memory problems? Performance problems from creating too many maps? If not I think HashMap is fine.

Besides, looking at the API, I'm not seeing anything simpler than a HashMap.

If you are having issues, you could roll your own Map implementation with very simple internals. But I doubt you would do better than the default Map implementations, plus you have the overhead of making sure your new class works. In this case there might be a problem with your design.

#3


4  

A HashMap is possibly the most lightweight and simple collection.

Sometimes the more efficient solution is to use a POJO, e.g. if your keys are field names and/or your values are primitives.
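To illustrate the POJO suggestion, here is a minimal sketch, assuming the keys are a small fixed set known at compile time. The class name `SizePojo` and the fields `width` and `height` are invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example: two int values that could otherwise sit in a Map<String, Integer>.
final class SizePojo {
    final int width;   // replaces map.get("width"); stored as a primitive
    final int height;  // replaces map.get("height")

    SizePojo(int width, int height) {
        this.width = width;
        this.height = height;
    }

    // The map-based equivalent, for comparison: a HashMap, its bucket table,
    // and one node per entry, with the values boxed as Integer.
    Map<String, Integer> toMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("width", width);
        map.put("height", height);
        return map;
    }
}
```

The POJO is a single object with two int fields, whereas the map version allocates several objects to hold the same two values.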

#4


2  

HashMap is a good choice because it offers average-case O(1) puts and gets. Unlike SortedMap implementations (e.g. TreeMap, with O(log n) puts and gets), it does not guarantee any ordering, but if you have no requirement for ordering then HashMap is better.
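The ordering difference can be seen in a short sketch. The class `OrderingSketch` and its sample data are made up for illustration; only `HashMap` and `TreeMap` come from the JDK.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class OrderingSketch {
    // Returns the keys in ascending order by copying the map into a TreeMap.
    static List<String> sortedKeys(Map<String, Integer> source) {
        return new ArrayList<String>(new TreeMap<String, Integer>(source).keySet());
    }

    // A HashMap makes no promise about the order in which its keys are iterated.
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("pear", 3);
        map.put("apple", 1);
        map.put("fig", 2);
        return map;
    }
}
```

Calling `OrderingSketch.sortedKeys(OrderingSketch.sample())` yields the keys in natural String order, whatever order the HashMap happens to iterate them in.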

#5


1  

I agree with @hvgotcodes that it is premature optimization but it is still good to know all tools in the toolbox.

If you iterate a lot over the contents of a map, a LinkedHashMap is usually quite a lot faster than a HashMap; if you have a lot of threads working with the Map at the same time, a ConcurrentHashMap is often a better choice. I wouldn't worry about any Map implementation being inefficient for small sets of data. It is typically the other way around: an incorrectly constructed map easily becomes inefficient with large amounts of data if you have bad hash values or if something causes it to have too few buckets for its load.

Then of course there are cases when a HashMap makes no sense at all, like if you have three values which you will always index with the keys 0, 1 and 2 but I assume you understand that :-)

#6


1  

HashMap uses more or less memory (when created) depending on how you initialize it: more buckets mean more memory usage but faster access for large numbers of items. If you need only a small number of items, you can initialize it with a small capacity, which produces fewer buckets that will still be fast (since each will receive only a few items). There is no waste of memory if you set it correctly (the tradeoff is basically memory usage vs. speed).
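A small sketch of sizing a HashMap for a known, small number of entries. The helper name `presized` is hypothetical. `HashMap(int initialCapacity)` and the default load factor of 0.75 are documented JDK behavior, and HashMap rounds the requested capacity up to a power of two internally.

```java
import java.util.HashMap;
import java.util.Map;

final class PresizedMapSketch {
    // Builds a map intended to hold up to expectedEntries entries without resizing.
    static <K, V> Map<K, V> presized(int expectedEntries) {
        // HashMap resizes once its size exceeds capacity * loadFactor (0.75 by default),
        // so request enough capacity for expectedEntries to stay under that threshold.
        int capacity = (int) Math.ceil(expectedEntries / 0.75);
        return new HashMap<>(Math.max(capacity, 1));
    }
}
```

For three expected entries this requests a capacity of 4 rather than the default 16, so the bucket table is a quarter of the size.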

As for heap fragmentation, wasted GC cycles and whatnot, there is not much that a Map implementation can do about them; it all falls back to how you set it up. Understand that this is not about Java's implementation, but about the fact that generic hashtables (generic in the sense that, unlike EnumMap, they cannot assume anything about the keys; not the HashTable class) are the best possible implementations of a map structure.

#7


1  

Android has an ArrayMap with the intent of minimizing memory. In addition to being in the core, it's in the v4 support library, which, theoretically, should be able to compile for the Oracle or OpenJDK JREs as well. Here is a link to the source of ArrayMap in a fork of the v4 support library on github.

#8


0  

There is an alternative called AirConcurrentMap that is more memory-efficient above 1K entries than any other Map I have found, is faster than ConcurrentSkipListMap for key-based operations and faster than any Map for iteration, and has an internal thread pool for parallel scans. It is an ordered Map (i.e. a NavigableMap) and a ConcurrentMap. It is free for non-commercial no-source use, and commercially licensed with or without source. See boilerbay.com for graphs. Full disclosure: I am the author.

AirConcurrentMap conforms to the standards so it is plug-compatible everywhere, even for a regular Map.

Iterators are already very fast especially over 1K Entries. The higher-speed scans use a 'visitor' model with a single visit(k, v) callback that reaches the speed of Java 8 parallel streams. The AirConcurrentMap parallel scan exceeds Java 8 parallel streams by about 4x. The threaded visitor adds split() and merge() methods to the single-thread visitor that remind one of map/reduce:

static class ThreadedSummingVisitor<K> extends ThreadedMapVisitor<K, Long> {
    private long sum;
    // This is idiomatic
    long getSum(VisitableMap<K, Long> map) {
        sum = 0;
        map.getVisitable().visit(this);
        return sum;
    }

    @Override
    public void visit(Object k, Long v) {
        sum += ((Long)v).longValue();
    }

    @Override
    public ThreadedMapVisitor<K, Long> split() {
        return new ThreadedSummingVisitor<K>();
    }

    @Override
    public void merge(ThreadedMapVisitor<K, Long> visitor) {
        sum += ((ThreadedSummingVisitor<K>)visitor).sum;
    }
}
...
// The threaded summer can be re-used in one line now.
long sum = new ThreadedSummingVisitor().getSum((VisitableMap)map);

#9


0  

I was also interested, and just as an experiment I created a map which stores keys and values directly in fields and allows up to 5 entries. It consumes 4 times less memory and works 16 times faster than HashMap: https://github.com/stokito/jsmallmap
