Optimizing insertion speed in java.util.Map/Set

Time: 2023-01-07 19:38:20

Is there a way to optimize the speed of insertions into a java.util.Collection by specifying the order of the items?

For example:

java.util.Set<String> set = new java.util.TreeSet<String>();

Will this solution:

set.add("A");
set.add("B");
set.add("C");
set.add("D");
set.add("E");

be faster than this one (random order)?

set.add("E");
set.add("D");
set.add("C");
set.add("A");
set.add("B");

(And the same question for the other collections: HashMap, Hashtable, ...)

Thanks

5 answers

#1


9  

The easy answer is "time it and see".

The other answer is "it won't matter". This looks like a micro-optimization that is hardly worth the effort. I think it falls into the category of "The Sad Tragedy of Micro-Optimization Theater".
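To make "time it and see" concrete, here is a rough sketch that inserts the same keys into a TreeSet once in sorted order and once in shuffled order. The class name, the key format, the element count and the number of rounds are all made up for illustration, and a plain System.nanoTime() loop like this is only a crude measurement (a proper benchmark harness would be more trustworthy), so treat the numbers as a hint at best.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class, not part of the original question.
class InsertionOrderTiming {

    // Inserts all items into a fresh TreeSet and returns the elapsed nanoseconds.
    static long timeInsert(List<String> items) {
        Set<String> set = new TreeSet<>();
        long start = System.nanoTime();
        for (String s : items) {
            set.add(s);
        }
        long elapsed = System.nanoTime() - start;
        if (set.size() != items.size()) { // sanity check; also keeps the set in use
            throw new IllegalStateException("unexpected duplicates");
        }
        return elapsed;
    }

    // Builds n distinct keys; zero padding makes String order match index order.
    static List<String> sortedKeys(int n) {
        List<String> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            keys.add(String.format("key%08d", i));
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> sorted = sortedKeys(200_000);
        List<String> shuffled = new ArrayList<>(sorted);
        Collections.shuffle(shuffled, new Random(42)); // fixed seed for repeatability

        // Several rounds so the JIT can warm up; the first rounds are not representative.
        for (int round = 0; round < 5; round++) {
            long tSorted = timeInsert(sorted);
            long tShuffled = timeInsert(shuffled);
            System.out.printf("round %d: sorted %d ms, shuffled %d ms%n",
                    round, tSorted / 1_000_000, tShuffled / 1_000_000);
        }
    }
}
```

Whatever the numbers turn out to be on your machine, compare them with the total running time of the program before deciding whether the difference is worth caring about.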

#2


6  

Not for java.util.Map and java.util.Set as such, because these are interfaces and there are different implementations.

For concrete implementations it is not a worthwhile optimization. If you have performance problems, choose a better-suited implementation, or rethink what you need to store and how.

Inserting 5000 random numbers into a HashSet takes about a millisecond on a run-of-the-mill laptop, so how many millions of elements do you want to insert to make this kind of optimization worthwhile?
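If you want to check that figure on your own machine, something like the following would do. The class and method names are invented for this sketch, and the timing is deliberately naive (the first rounds include JIT warm-up), so the output is only a ballpark.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical helper class for a quick, naive measurement.
class HashSetInsertTiming {

    // Inserts `count` pseudo-random ints into a new HashSet and returns the set.
    static Set<Integer> fill(int count, long seed) {
        Random random = new Random(seed);
        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < count; i++) {
            set.add(random.nextInt());
        }
        return set;
    }

    public static void main(String[] args) {
        for (int round = 0; round < 10; round++) {
            long start = System.nanoTime();
            Set<Integer> set = fill(5000, round);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("round " + round + ": " + set.size()
                    + " elements in " + micros + " us");
        }
    }
}
```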

#3


3  

Insertion time for a red-black tree (which is what Java's TreeSet/TreeMap are built on) is guaranteed to be O(log n) in the worst case. It could be faster if the items arrive in a particular order, but I'm unsure what order that would be (pre-sorted would probably be fastest?).

Insertion into a hash table is an O(1) (constant time) operation on average. The main work done for an insertion is computing the hash code.


Edit: Starblue suggests that pre-sorted input may yield the worst-case performance, so you could try a randomized order.
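One way to look at the sorted-versus-random question without fighting timer noise is to count comparator calls instead of measuring time. The sketch below wraps Integer.compare in a counting comparator (the class name and the element count are assumptions made for this example) and inserts the same keys in both orders. It does not say which order wins on the clock, only how much comparison work each order causes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.TreeSet;

// Hypothetical helper class: counts comparisons made during TreeSet insertion.
class TreeSetComparisonCount {

    // Inserts the items one by one and returns how often the comparator was called.
    static long countComparisons(List<Integer> items) {
        long[] calls = {0};
        Comparator<Integer> counting = (a, b) -> {
            calls[0]++;
            return Integer.compare(a, b);
        };
        TreeSet<Integer> set = new TreeSet<>(counting);
        for (Integer item : items) {
            set.add(item);
        }
        return calls[0];
    }

    // Returns the keys 0..n-1 in ascending order.
    static List<Integer> ascending(int n) {
        List<Integer> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            keys.add(i);
        }
        return keys;
    }

    public static void main(String[] args) {
        int n = 100_000;
        List<Integer> sorted = ascending(n);
        List<Integer> shuffled = new ArrayList<>(sorted);
        Collections.shuffle(shuffled, new Random(42)); // fixed seed for repeatability

        long sortedCalls = countComparisons(sorted);
        long shuffledCalls = countComparisons(shuffled);
        System.out.printf("sorted:   %d comparisons (%.1f per insert)%n",
                sortedCalls, (double) sortedCalls / n);
        System.out.printf("shuffled: %d comparisons (%.1f per insert)%n",
                shuffledCalls, (double) shuffledCalls / n);
    }
}
```

Because a red-black tree with n nodes is never taller than 2 * log2(n + 1), both counts stay logarithmic per insert, so the order can only change the constant factor.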

#4


2  

There is naturally a huge difference between hash-based collections and tree-based ones.

Tree-based collections rely on an ordering of the elements for insertion (e.g., comparisons between strings), so they are a good fit when you have comparable objects (like String). The TreeSet/TreeMap/etc. in the standard collections are balanced (red-black trees), so insertion order doesn't matter that much. If the tree were not balanced, insertion order would matter, since you could end up with a chain instead of a tree.

In hash tables, the load factor and the hash function decide everything, but if you're dealing with strings, you may be better off not even bothering with hashing.
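For the load factor part, the knob that java.util.HashMap actually exposes is its two-argument constructor (initial capacity, load factor). A minimal sketch, assuming you know roughly how many entries to expect; the helper name withExpectedSize is invented here and is not a library method:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class showing how to presize a HashMap.
class PresizedHashMap {

    // Creates a HashMap large enough that `expected` entries fit without a resize.
    // HashMap grows its table once the size exceeds capacity * loadFactor.
    static <K, V> Map<K, V> withExpectedSize(int expected, float loadFactor) {
        int initialCapacity = (int) Math.ceil(expected / (double) loadFactor);
        return new HashMap<>(initialCapacity, loadFactor);
    }

    public static void main(String[] args) {
        Map<String, Integer> lengths = withExpectedSize(5, 0.75f);
        for (String s : new String[] {"E", "D", "C", "A", "B"}) {
            lengths.put(s, s.length()); // insertion order is irrelevant for a hash table
        }
        System.out.println(lengths.size()); // prints 5
    }
}
```

Presizing avoids rehashing while the map fills up, which has nothing to do with the order of the keys and is usually the cheaper thing to try first.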

If you need a set holding many strings that overlap (for example, share prefixes), a trie may be more memory-efficient, but I don't think there is one in the standard library.
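Since there is no trie in java.util, here is a bare-bones sketch of what one could look like as a set of strings. Everything in it (class name, the nodeCount bookkeeping, the sample words) is made up for illustration. It uses a HashMap per node for simplicity, which has a fair amount of per-node overhead of its own, so it shows the prefix sharing rather than proving any memory saving.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal trie-backed set of strings.
class TrieSet {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a stored string ends at this node
    }

    private final Node root = new Node();
    private int nodeCount = 1; // the root counts as one node

    // Adds the string; returns true if it was not already present.
    boolean add(String s) {
        Node node = root;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            Node next = node.children.get(c);
            if (next == null) {
                next = new Node();
                node.children.put(c, next);
                nodeCount++;
            }
            node = next;
        }
        boolean added = !node.terminal;
        node.terminal = true;
        return added;
    }

    // Returns true only if the exact string was added before (prefixes do not count).
    boolean contains(String s) {
        Node node = root;
        for (int i = 0; i < s.length() && node != null; i++) {
            node = node.children.get(s.charAt(i));
        }
        return node != null && node.terminal;
    }

    int nodeCount() {
        return nodeCount;
    }

    public static void main(String[] args) {
        TrieSet trie = new TrieSet();
        int totalChars = 0;
        for (String word : new String[] {"insert", "insertion", "insertions", "inset"}) {
            trie.add(word);
            totalChars += word.length();
        }
        // Shared prefixes are stored once, so there are fewer nodes than characters.
        System.out.println(totalChars + " characters stored in " + trie.nodeCount() + " nodes");
        System.out.println(trie.contains("insertion") + " " + trie.contains("inser"));
    }
}
```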

#5


1  

Be careful to consider the characteristics of your data structure when taking optimization measures. As one extreme example, inserting elements in sorted order into a plain (unbalanced) binary search tree would result in what is effectively a linked list.
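The degenerate case is easy to reproduce with a hand-written, deliberately unbalanced binary search tree. This is not how TreeSet works (it rebalances); the class below is a toy written only for this illustration. It inserts the same keys in sorted and in shuffled order and reports the height of the resulting tree.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical toy: a binary search tree with no rebalancing at all.
class NaiveBst {

    private static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    // Plain BST insert; iterative so a degenerate tree cannot overflow the stack.
    void insert(int key) {
        if (root == null) { root = new Node(key); return; }
        Node node = root;
        while (true) {
            if (key < node.key) {
                if (node.left == null) { node.left = new Node(key); return; }
                node = node.left;
            } else if (key > node.key) {
                if (node.right == null) { node.right = new Node(key); return; }
                node = node.right;
            } else {
                return; // duplicate key, nothing to do
            }
        }
    }

    // Height counted in nodes, computed level by level.
    int height() {
        int height = 0;
        List<Node> level = new ArrayList<>();
        if (root != null) level.add(root);
        while (!level.isEmpty()) {
            height++;
            List<Node> next = new ArrayList<>();
            for (Node n : level) {
                if (n.left != null) next.add(n.left);
                if (n.right != null) next.add(n.right);
            }
            level = next;
        }
        return height;
    }

    static int heightAfterInserting(List<Integer> keys) {
        NaiveBst tree = new NaiveBst();
        for (int key : keys) tree.insert(key);
        return tree.height();
    }

    public static void main(String[] args) {
        List<Integer> sorted = new ArrayList<>();
        for (int i = 0; i < 1000; i++) sorted.add(i);
        List<Integer> shuffled = new ArrayList<>(sorted);
        Collections.shuffle(shuffled, new Random(42));

        // Sorted input: every node has only a right child, so height equals the key count.
        System.out.println("sorted:   height " + heightAfterInserting(sorted));
        System.out.println("shuffled: height " + heightAfterInserting(shuffled));
    }
}
```

With sorted input each insert has to walk the whole chain, so building the tree costs O(n^2) overall, which is exactly the situation a self-balancing tree avoids.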
