Παρασκευή 27 Μαΐου 2011

String.hashCode()'s bug

Do you know that it is possible to get the same hashCode for two different strings, although they might not be that long?

Try the following piece of code:

int hashCodeER4="ER4".hashCode();
int hashCodeEQS="EQS".hashCode();
System.out.println("ER4's hashCode()="+hashCodeER4);
System.out.println("EQS's hashCode()="+hashCodeEQS);

What do you get?

ER4's hashCode()=68903
EQS's hashCode()=68903

What does it mean?

There are several pieces of code that "compare" Object's using their hashCode.

For instance,
HashMap uses the hashCode of the key in order to retrieve the corresponding value[1]

Why does it happen?

According to the javadoc:
"The hash code for a String object is computed as
 s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] 
using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)"

Which means that each character's value should be between 0 and 30 in order to avoid conflicts, but in fact even in case of ASCII we have a range from 0 up to 255, while in case of UTF-8 or even worse UT-16 the range is much larger.


TODO: Calculate manually the hashCode to check whether it is a problem of the formula or of the implementation of String.hashCode();

Notes:
[1] The following code is used in the HashMap in order to retrieve the value that is set for the requested key. As you can see we search using the hashCode, not the actual object itself.
public V get(Object key) {
if (key == null)
return getForNullKey();
int hash = hash(key.hashCode());
for (Entry e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
return e.value;
}
return null;
}


LinkWithin

Blog Widget by LinkWithin

Mobile edition