Date: 2021-01-04 05:03:49


Author: Mikhail Vorontsov

IMPORTANT: Java 7 update 17 still has no changes in the subject of this article.
Sharing an underlying char[]
The original String implementation had 4 non-static fields: 
char[] value with the string characters, 
int offset and int count, holding the index of the first character to use from value and the number of characters to use, and 
int hash with a cached value of the String hash code. 
As you can see, in the vast majority of cases a String had offset = 0 and count = value.length. 
The only exceptions to this rule were strings created via String.substring calls and all API calls using this method internally (like Pattern.split).
String.substring created a String that shared the internal char[] value with the original String, which allowed you:
To save some memory by sharing character data
To run String.substring in constant time ( O(1) )
At the same time, this feature was a source of a possible memory leak: 
if you extracted a tiny substring from a huge original string and discarded that original string, 
you still had a live reference to the underlying huge char[] value taken from the original String. 
The only way to avoid it was to call the new String( String ) constructor on such a string: 
it made a copy of the required section of the underlying char[], thus unlinking your shorter string from its longer “parent”.
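The workaround above can be sketched as follows, for JVMs older than Java 1.7.0_06. This is a minimal illustration, not code from the original article: the class name SubstringCopy and the helper detachedPrefix are made up for the example.

```java
public class SubstringCopy {
    // Hypothetical helper: returns the first `length` characters of `source`
    // as a String that does not share a backing array with `source`.
    public static String detachedPrefix(String source, int length) {
        // On pre-7u6 JVMs substring() shares the char[] of `source`;
        // wrapping it in new String(...) copies only the required section.
        return new String(source.substring(0, length));
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1_000_000; i++) {
            sb.append('x');
        }
        String huge = sb.toString();
        // `tiny` no longer keeps the million-character array reachable.
        String tiny = detachedPrefix(huge, 5);
        System.out.println(tiny);
    }
}
```

On Java 1.7.0_06 and later the extra copy is redundant, because substring already copies the characters.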
From Java 1.7.0_06 (as well as in early versions of Java 8) the offset and count fields were removed from String. 
This means that you can’t share a part of an underlying char[] value anymore. 
Now you can forget about the memory leak described above and never use the new String(String) constructor for this purpose again. 
As a drawback, you have to remember that String.substring now has linear complexity instead of constant.
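Because every substring call now copies characters, a loop that repeatedly cuts off the "rest" of a string can become quadratic. One possible way around this, sketched under the assumption that you only need positions and not the pieces themselves, is to scan with an index. The class IndexScan and the helper countFields are hypothetical names used only for this illustration.

```java
public class IndexScan {
    // Counts comma-separated fields without creating intermediate substrings.
    public static int countFields(String line) {
        int fields = 1;
        int from = 0;
        int comma;
        // indexOf(char, fromIndex) scans the existing characters in place,
        // so no char[] copies are made while walking through the line.
        while ((comma = line.indexOf(',', from)) >= 0) {
            fields++;
            from = comma + 1;
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(countFields("a,b,c"));
    }
}
```

The same idea applies to parsers in general: keep start and end offsets, and call substring only for the pieces you actually keep.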

Changes to hashing logic
There is another change introduced to String class in the same update: a new hashing algorithm. 
Oracle suggests that a new algorithm gives a better distribution of hash codes, 
which should improve performance of several hash-based collections: HashMap, Hashtable, HashSet, LinkedHashMap, LinkedHashSet, WeakHashMap and ConcurrentHashMap.
Unlike changes from the first part of this article, these changes are experimental and turned off by default.
As you may guess, these changes apply only to String keys. If you want to turn them on, 
you’ll have to set a system property to a non-negative value (it is equal to -1 by default). 
This value is a collection size threshold, after which the new hashing method will be used. 
A small remark here: the hashing method will be changed on rehashing only (when there is no more free space). 
So, if a collection was last rehashed at size = 160 and the threshold = 200, 
the method will only be changed when your collection grows to a size of 320 (approximately).
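The arithmetic behind the 160 / 200 / 320 example can be sketched with a deliberately simplified model. It assumes that the collection rehashes each time its size doubles and that the switch happens at the first rehash at or above the threshold; the real JDK resize logic has more detail. The class AltHashingSwitch and the method sizeAtSwitch are hypothetical names.

```java
public class AltHashingSwitch {
    // Simplified model, not JDK code: returns the collection size at which
    // the new hashing method would first be picked up.
    public static long sizeAtSwitch(long lastRehashSize, long threshold) {
        if (lastRehashSize <= 0) {
            throw new IllegalArgumentException("lastRehashSize must be positive");
        }
        long size = lastRehashSize;
        // Crossing the threshold between two rehashes changes nothing;
        // the switch waits for the next rehash point.
        while (size < threshold) {
            size *= 2;
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(sizeAtSwitch(160, 200)); // prints 320
    }
}
```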

String now has a hash32() method, whose result is cached in the int hash32 field. 
The biggest difference of this method is that the result of hash32() on the same string may differ between JVM runs 
(actually, it will differ in most cases, because it uses a single System.currentTimeMillis() and two System.nanoTime() calls for seed initialization). 
As a result, the iteration order of some of your collections will be different each time you run your program.
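If your code depends on a stable iteration order, one common option is a collection that defines its order independently of hash codes. The sketch below uses java.util.LinkedHashMap, which iterates in insertion order; the class StableOrder and the helper keysInInsertionOrder are names invented for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StableOrder {
    // Returns the keys in the order they were inserted, regardless of
    // which hashing method the JVM uses for String keys.
    public static List<String> keysInInsertionOrder(String... keys) {
        Map<String, Integer> map = new LinkedHashMap<String, Integer>();
        for (String key : keys) {
            map.put(key, key.length());
        }
        return new ArrayList<String>(map.keySet());
    }

    public static void main(String[] args) {
        // prints [delta, alpha, charlie]
        System.out.println(keysInInsertionOrder("delta", "alpha", "charlie"));
    }
}
```

A TreeMap is an alternative when sorted order is what you need.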

Actually, I was a little surprised by this method. Why do we need it if the original hashCode method works very well? 
I decided to try the test program from the hashCode method performance tuning article in order to find out 
how many duplicate hash codes we would get with the hash32 method.

String.hash32() method is not public, so I had to take a look at a HashMap implementation in order to find out how to call this method. 
The answer is sun.misc.Hashing.stringHash32(String).

The same test on 1 million distinct keys showed 304 duplicate hash values, compared to no duplicates when using String.hashCode. 
I think we need to wait for further improvements or use case descriptions from Oracle.
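The author's test program and key set are not reproduced here. The sketch below only illustrates the counting step for String.hashCode, with a hypothetical class HashDuplicates and helper countDuplicateHashes. It assumes the keys passed in are distinct.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashDuplicates {
    // Returns how many keys produced a hash code that an earlier key
    // in the list had already produced.
    public static int countDuplicateHashes(List<String> keys) {
        Set<Integer> seen = new HashSet<Integer>();
        int duplicates = 0;
        for (String key : keys) {
            // add() returns false when the hash code is already present.
            if (!seen.add(key.hashCode())) {
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // "Aa" and "BB" both hash to 2112, so this prints 1.
        System.out.println(countDuplicateHashes(Arrays.asList("Aa", "BB", "C")));
    }
}
```

Measuring hash32 the same way would mean replacing key.hashCode() with a call to sun.misc.Hashing.stringHash32(key), which exists only in the JDK 7 builds discussed in this article.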


New hashing may severely affect highly multithreaded code
Oracle has introduced a bug in the implementation of hashing in the following classes: HashMap, Hashtable, HashSet, LinkedHashMap, LinkedHashSet and WeakHashMap. 
Only ConcurrentHashMap is not affected. The problem is that all non-concurrent classes now have the following field:
/**
 * A randomizing value associated with this instance that is applied to
 * hash code of keys to make hash collisions harder to find.
 */
transient final int hashSeed = sun.misc.Hashing.randomHashSeed(this);

This means that the sun.misc.Hashing.randomHashSeed method is called for every created map/set instance. randomHashSeed, 
in turn, calls the java.util.Random.nextInt method. The Random class is well known for its multithreaded unfriendliness: 
it contains a private final AtomicLong seed field. Atomics behave well under low to medium contention, but perform extremely badly under heavy contention.

As a result, many highly loaded web applications processing HTTP/JSON/XML requests may be affected by this bug, 
because nearly all known parsers use one of the affected classes for “name-value” representation. 
All these format parsers may create nested maps, which further increases the number of maps created per second.

How to solve this problem?
1. ConcurrentHashMap way: call randomHashSeed method only when system property was defined. 
Unfortunately, it is available only for JDK core developers.
/**
 * A randomizing value associated with this instance that is applied to
 * hash code of keys to make hash collisions harder to find.
 */
private transient final int hashSeed = randomHashSeed(this);

private static int randomHashSeed(ConcurrentHashMap instance) {
    if (sun.misc.VM.isBooted() && Holder.ALTERNATIVE_HASHING) {
        return sun.misc.Hashing.randomHashSeed(instance);
    }
    return 0;
}
2. Hacker way: fix sun.misc.Hashing class. Highly not recommended. If you still wish to go ahead, here is an idea: java.util.Random class is not final. 
You can extend it and override its nextInt method, returning something thread-local (a constant, for example). 
Then you will have to update sun.misc.Hashing.Holder.SEED_MAKER field – set it to your extended Random class instance. 
Don’t worry that this field is private, static and final – reflection can help you:
public class Hashing {
    private static class Holder {
        static final java.util.Random SEED_MAKER;
        // remaining members omitted
    }
}
3. Buddhist way – do not upgrade to Java 7u6 and higher. Check new Java 7 update sources for this bug fix. 
Unfortunately, nothing has changed even in Java 7u17…
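For the "hacker way" (option 2), the Random subclass could look roughly like the sketch below. It covers only the subclass: installing it into sun.misc.Hashing.Holder.SEED_MAKER via reflection depends on JDK internals and is left out. The class name ThreadLocalSeedMaker is invented for this example, and delegating to java.util.concurrent.ThreadLocalRandom (available since Java 7) is one possible choice of "something thread-local", not the author's code.

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

public class ThreadLocalSeedMaker extends Random {
    private static final long serialVersionUID = 1L;

    @Override
    public int nextInt() {
        // Delegates to the calling thread's own generator, so threads do
        // not compete for the single AtomicLong seed inside Random.
        return ThreadLocalRandom.current().nextInt();
    }

    public static void main(String[] args) {
        Random seedMaker = new ThreadLocalSeedMaker();
        System.out.println(seedMaker.nextInt());
    }
}
```

Only nextInt() is overridden here, since that is the method the article says randomHashSeed calls; other Random methods still use the inherited shared seed.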
Summary
From Java 1.7.0_06, String.substring always creates a new underlying char[] value for every String it creates. 
This means that this method now has linear complexity, compared to the previous constant complexity. 
The advantage of this change is a slightly smaller memory footprint of a String (4 bytes less than before) 
and a guarantee to avoid memory leaks caused by String.substring.
Starting from the same Java update, the String class got a second hashing method called hash32. 
This method is currently not public and can be accessed without reflection only via a sun.misc.Hashing.stringHash32(String) call. 
This method is used by 7 JDK hash-based collections if their size exceeds the threshold set by the system property.
This is an experimental feature, and currently I don’t recommend using it in your code.
Starting from Java 1.7.0_06, all standard JDK non-concurrent maps and sets are affected by a performance bug caused by the new hashing implementation. 
This bug affects only multithreaded applications creating heaps of maps per second. See this article for more details.
// String.hashCode() from Java 1.7.0_06: no offset/count fields
private final char value[];
private int hash;

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

// String.hashCode() before Java 1.7.0_06: reads value through offset and count
public int hashCode() {
    int h = hash;
    int len = count;
    if (h == 0 && len > 0) {
        int off = offset;
        char val[] = value;

        for (int i = 0; i < len; i++) {
            h = 31 * h + val[off++];
        }
        hash = h;
    }
    return h;
}