Unsigned in Java

Tuesday, October 26th, 2010

Hello everyone,

As you may know, we (the ReviveR team) chose to use Java as the main language for the framework and maybe the UI too (they are totally separated for now).
We started by converting diStorm to Java using JNI, and by converting diSlib64, my robust PE file parser written in Python.

While we were doing the conversion we found out that there is no unsigned keyword in Java. Yes, I gotta admit, we are noobs in Java, but we are professional coders in other languages, as a matter of fact. On the other hand, everyone knows that a new language is all about its syntax and its benefits, and once you’re used to those, you rock with the language. Syntax is easy, and it’s gonna take a short while until we learn the benefits of Java. After all, we chose Java because it is widely cross-platform, even though C# is ten times better; I’m just going to claim it, not prove it. In this post, however, I’m going to talk about the disadvantages of Java, to name one: unsigned numbers.

When you parse a PE+ file (that’s for AMD64), you need to read some 64-bit integers. So we needed a way to hold an address in 64 bits, and addresses are usually unsigned, in contrast to RVAs. The problem was that there is no way to define an unsigned long in Java. This is a really unpleasant welcome to Java, seriously. Wtf did the designer think? And I looked for his stupid comment, it read something like: “ahh, most coders don’t need ‘unsigned’, it only complicates stuff”. What a douche. Now, this is a denial of reality. Looking for alternatives on the net, I found that most people use a bigger integer type: if they need an unsigned 8-bit integer, they use the next bigger type that can hold its whole range, which is short… This is so lame. You can even get unsigned 32-bit integers by using longs, right? But what about unsigned 64-bit integers? No bigger size, no way.
Others say you can use BigInteger. The moment I heard about that I wanted to cry out loud. My guess is that the implementation is a bit vector. So using BigInteger only to represent unsigned longs is useless, if you ask me. Oh, and I almost forgot to mention that it accepts the byte[] in big endian only, blee.
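
To make the two workarounds concrete, here is a minimal sketch of the widening trick and of the BigInteger route. The class and method names are made up for illustration, they don't come from any library or from our code.

```java
import java.math.BigInteger;

// Hypothetical helper class; the names are illustrative only.
final class UnsignedWidening {
    private UnsignedWidening() {}

    // Reinterpret a signed byte as an unsigned 8-bit value, held in a short.
    static short toUnsigned8(byte b) {
        return (short) (b & 0xFF);
    }

    // Reinterpret a signed int as an unsigned 32-bit value, held in a long.
    static long toUnsigned32(int i) {
        return i & 0xFFFFFFFFL;
    }

    // There is no primitive wider than long, so 64 bits ends up in a BigInteger.
    // BigInteger(int signum, byte[] magnitude) wants the magnitude in big-endian order.
    static BigInteger toUnsigned64(long v) {
        byte[] magnitude = new byte[8];
        for (int i = 7; i >= 0; i--) {
            magnitude[i] = (byte) v; // lowest byte goes last (big endian)
            v >>>= 8;
        }
        return new BigInteger(1, magnitude);
    }
}
```

So `UnsignedWidening.toUnsigned8((byte) -1)` gives 255 in a short, and the 64-bit case pays for a heap object plus a byte array just to hold eight bytes, which is exactly the part that annoyed me.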

I really got pissed off, there were moments I wanted to go back to C++. I knew I was going to waste time on auto-pointers, data structures, and shit like that, but C++ has unsigned. How cool is that.

I consulted with a friend and he referred me to this link: Unsigned arithmetic in Java.
That seemed a bit helpful, and I liked the general idea. I think there are errors in the code snippets (didn’t check them though). Anyway, the guy suggests using an “isLessThanUnsigned” comparison, and I didn’t want to limit my unsigned long’s interface in such a way.

Therefore I took a look at the interface of BigInteger, saw that it uses a compareTo method, and did the same in a new class I wrote, named ULong. The class can accept a byte array, a ByteBuffer, or a long, and can read the bytes as big endian if necessary.
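
Roughly, the construction side of such a wrapper could look like the sketch below. This is not the real ULong: the name ULongSketch, the rawValue accessor, and the choice of little endian as the default (because PE files are little endian) are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// An illustrative guess at the shape of such a class, not the actual ULong.
final class ULongSketch {
    // Raw 64 bits; the sign bit is simply treated as the top magnitude bit.
    private final long mValue;

    ULongSketch(long value) {
        mValue = value;
    }

    // Assumption: raw bytes default to little endian, as found in PE files.
    ULongSketch(byte[] bytes) {
        this(bytes, ByteOrder.LITTLE_ENDIAN);
    }

    // Reads the first 8 bytes of the array in the requested byte order.
    ULongSketch(byte[] bytes, ByteOrder order) {
        this(ByteBuffer.wrap(bytes).order(order).getLong());
    }

    // Reads 8 bytes at the buffer's current position, using the buffer's own byte order.
    ULongSketch(ByteBuffer buffer) {
        this(buffer.getLong());
    }

    long rawValue() {
        return mValue;
    }
}
```

Note that ByteBuffer itself defaults to big endian, so a caller handing in a buffer has to set its order first if the data is little endian.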

The compareTo was written from scratch:

public long compareTo(ULong rh)
{
	// Same MSB on both sides: signed and unsigned order agree, and the subtraction can't overflow.
	if (((mValue ^ rh.mValue) >> 63) == 0) return mValue - rh.mValue;
	// Different MSBs: the one with the MSB set is negative _in Java_, but as unsigned it's the bigger one.
	if (mValue < 0) return 1;
	// Otherwise rh.mValue has the MSB set, so it's the bigger one.
	return -1;
}

Very basic arithmetic operations, and it's pretty quick relative to BigInteger's: mine is 8.5 times faster on my machine...
The point is that I couldn't accept all the extra stuff BigInteger needed to do in order to represent an unsigned long. It bugged me. I'm not going to stop and take my time again (hopefully) on issues like this, but since I don't know Java this well, I was curious to see how things work.
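
For anyone who wants to sanity-check this kind of comparison, here is a small sketch that does the unsigned compare a different way (flipping the sign bit and then comparing as signed) next to a slow BigInteger reference. It's an alternative technique, not the compareTo above, and the class and method names are invented for the example.

```java
import java.math.BigInteger;

// Hypothetical helper; shows a sign-flip unsigned compare and a BigInteger cross-check.
final class UnsignedCompare {
    private UnsignedCompare() {}

    // XOR with Long.MIN_VALUE flips the top bit, which maps unsigned order onto signed order.
    static int compareUnsigned(long a, long b) {
        long x = a ^ Long.MIN_VALUE;
        long y = b ^ Long.MIN_VALUE;
        return (x < y) ? -1 : ((x == y) ? 0 : 1);
    }

    // Slow reference: do the same comparison on the true unsigned values.
    static int compareViaBigInteger(long a, long b) {
        return toBig(a).compareTo(toBig(b));
    }

    // A negative long stands for v + 2^64 when its bits are read as unsigned.
    private static BigInteger toBig(long v) {
        BigInteger big = BigInteger.valueOf(v);
        return (v >= 0) ? big : big.add(BigInteger.ONE.shiftLeft(64));
    }
}
```

Running both over a handful of edge values (0, 1, -1, Long.MIN_VALUE, Long.MAX_VALUE) is a cheap way to catch a flipped sign. Newer JDKs (8 and later) also ship Long.compareUnsigned, which does this job out of the box.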

Another issue that I didn't like is that you cannot define global functions (or am I wrong here?). Everything has to be in classes, which is annoying sometimes, but I guess the rationale was to force a kind of 'namespaces', so it's fine eventually - but let me decide what to do, I know what I'm doing.
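
As far as I can tell, the closest thing to a global function is a static method on a class that is never instantiated. A minimal sketch, where PeUtil and alignUp are hypothetical names picked for the example:

```java
// Hypothetical utility class acting as a 'namespace' for free functions.
final class PeUtil {
    // Private constructor: the class only exists to hold static methods.
    private PeUtil() {}

    // Rounds value up to the next multiple of alignment.
    // Assumes alignment is a power of two (as section and file alignments usually are).
    static long alignUp(long value, long alignment) {
        return (value + alignment - 1) & ~(alignment - 1);
    }
}
```

Callers write `PeUtil.alignUp(size, 0x1000)`, and when the class lives in a named package, `import static` lets you drop the `PeUtil.` prefix, which gets pretty close to a plain function call.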

Last one: the separation into files based on public classes really forces you to divide all your classes into lots of files, or dump them one after the other as inner classes. And then if you have a third-level inner enum, for instance, the compiler shouts at you that the outer class has to be static, etc. Consequently, it forces you to move the enum out, and then you find yourself dividing your code again, and now it's out of the context of the class you wanted to put it in...
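
One way to keep the enum where it belongs, assuming the middle class doesn't need a reference to an instance of the outer class, is to declare that middle class static. A sketch with made-up names (PeFile, Section, Kind):

```java
// Hypothetical names, loosely modelled on a PE parser.
final class PeFile {
    // 'static' makes Section a nested class rather than an inner class,
    // so it is allowed to declare the enum inside it.
    static final class Section {
        enum Kind { CODE, DATA, RESOURCE }

        final String name;
        final Kind kind;

        Section(String name, Kind kind) {
            this.name = name;
            this.kind = kind;
        }
    }

    // Example of using the nested types from the outer class.
    Section firstSection() {
        return new Section(".text", Section.Kind.CODE);
    }
}
```

The price is that Section can no longer touch PeFile's instance fields implicitly; if it needs them, it has to be handed a PeFile explicitly.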

Oh dear Java, a love begins :(

P.S. - I think that the beauty is that I know how to use high level languages when I have to, with all due respect to me and low level.