Wednesday 31 October 2012

Breaking String encapsulation using Unsafe

As we all know, String is immutable, which is great and it is also defensive about it's internals to maintain that immutability which is also great, but... sometimes you want to be able to get/set that damn internal char[] without copying it. While arguably this is not very important to most people, it is quite desirable on other occasions when you are trying to get the best performance from a given piece of code. Here's how to break the encapsulation using Unsafe:
1. Acquire Unsafe:
2. Get the field offsets for String fields:
3. Use the offsets to get/set the field values:
Now this seems a bit excessive doesn't it? Isn't encapsulation important? Why would you do that? The bottom line is that this is to scrape some extra performance juice out of your system, and if you are willing to get your hands dirty the above can give you a nice boost.
Getting the data out of String is far less questionable then altering it's internal final fields, just so we are clear. So it is not really recommended that you use the set functionality as illustrated above, unless you are sure it's going to work. Here is why it's generally a bad idea to write into final fields.
Using other techniques it should be possible to hack your way into the package private String constructor that would spare us that bit of hackiness to the same effect.
Measurements and a real world use case to follow...

Update(05/12/2012):
As per the source code used for the next post, the above source code would break for JDK7 as String no longer has the fields offset and count. The sentiment however stays the same and the final result is on GitHub.

Thursday 25 October 2012

Java Intrinsics are not JNI calls

That is a fact well known to those who know it well... but sadly some confusion still abounds, here's another go of explaining the differences.
An intrinsic is well defined on Wikipedia, to summarize: it is a function 'macro' to be handled by the compiler. The JIT compiler supports a large list of such intrinsic function macros . The beauty of the 2 concepts joined together(intrinsic functions and JIT compilation optimizations) is that the JIT compiler can optimize whole functions into single processor instructions, for the particular processor detected at runtime, and get a great performance boost.
Where it all gets a bit confusing is that the intrinsic functions show up as normal methods or native methods when browsing the source code. Regardless of what they look like (native/Java) they will magically be transformed to far more performant alternative when picked up by the JIT (Caution must be taken with this piece of advice as the JIT compiler behavior can change from JVM to JVM implementation and turn your crafty choice of functions from a speedy chariot to a soggy pumpkin... one such important JVM is the Android Dalvik)
To get an idea of the range of functions which benefit from this nifty trick you can check out this list on this Java gaming Wiki, or have a look in this header file where you can find a more definitive list(look for do_intrinsic) and also get a view on how these things hang together. Some classes/methods on the list:
  • The wonderful Unsafe. Almost all intrinsics.
  • Math: abs(double); sin(double); cos(double); tan(double); atan2(double, double); sqrt(double); log(double); log10(double); pow(double, double); exp(double); min(int, int); max(int, int); 
  • System: identityHashCode(Object); currentTimeMillis(); nanoTime(); arraycopy(....);
  • And many many more... 
See Mike Barker's detailed explanation here which is really much more in depth then mine. He did everything I meant to do when I set out to write this little post, so really you must check it out.
One take away from this is that looking through the code is not enough to reason about the performance. In some cases where the performance is surprisingly good you will find an intrinsic standing behind that little boost you didn't see coming.