diStorm3 is ready for the masses! ![]()
- if you want to maximize the information you get from a single instruction; Structure output rather than text, flow control analysis support and more!
Check it out now at its new google page.
Good luck!
diStorm3 is ready for the masses! ![]()
- if you want to maximize the information you get from a single instruction; Structure output rather than text, flow control analysis support and more!
Check it out now at its new google page.
Good luck!
Hey people
I just uploaded the source code of diStorm3 to Google Code.
Now you can enjoy SVN too! And of course, better source code and features from the disassembler itself.
It is officially released under the GPLv3.
Special thanks to Michael Rolle, for tons of suggestions, ideas and fixes. Also thanks to many others who reported buggy instructions.
In the next few weeks I will update the source code some more, for more compilers, etc. Going to re-do the webpage of diStorm. Hopefully will have some nice logo too. And the most important thing, I will make a somehwhat tutorial of how-to use the new disassembler with the newest features. Stay tuned!
Gil
Hello there,
if you wanna know me better and you use diStorm, take 10 minutes from your precious time and please read my post.
If you don’t give a damn about diStorm or open source community, you will waste your time reading this post, so visit here.
Believe me or not, I would have given this code (new version) under BSD. I really wanted to. A proof is that diStorm64 was BSD so far, and still is, however it is deprecated. But I don’t see a good end for BSD. Let me explain. It’s very permissive, everybody can use it and I like it more than GPL for that reason, truely. Why after all I decided on releasing diStorm in a GPL license, and not even LGPL? That’s a good question and I will answer this by telling you my story and more.
I began this project for pure fun and challenge of decoding x86, back in mid 2003 already. Soon enough I got some basic framework that could accept a stream, look in some data structure and fetch an instruction, then decode the operands. Sounds easy, right? Because it is easy. But that was only for integer instructions. After a few months I added FPU support. This is where I started to hate the x86 machine code, really. It’s a pain in the arse. Only at that time I started to look around and to see what other (disassembler) libraries we got on the Internet, it was already 2004, and I still enjoyed coding diStorm for the fun and sport of it, 100% from scratch, always. I knew from day one that I am going to make it an open source project. And yet, it wasn’t so useful, there were better disassemblers out there. I’m talking about binary stream diassemblers, mind you, not the GUI wrapped ones, with high level features. Anyway, diStorm wasn’t any special yet. Therefore I had to work hard, in my free time, no complaints ever. And added all those SSE instructions set. Like 5 sets eventually were added to diStorm, in addition to new sets every now and then in other computing fields. Honestly, I doubt people use diStorm for SSE, but you never know. Besides the goal of diStorm was to be a complete product, top quality and optimized, and I achieved them all within time. diStorm was opened source in the beginning of 2006. A few months later I added AMD64 support, and then diStorm was the first open source disassembler library to support it.
All the while I got lots of emails about diStorm. Some were about asking help of how to use it, some were about defects in specific isntructions, etc. And even two critical bugs, one is code regression that I put a bug in the code accidentally, and the other was some memory leak in the Python module, which I happened to fix before already.
The most appreciated work from the community was about the sample projects. People helped me with the code to make it more useful, and better code for each platform (there are Win32 and Linux separated projects). But never anything about diStorm’s code itself. Maybe the project is too complex. Maybe there was no need, overall it was stable and mature. I don’t know.
Since 2005 I got more than 50,000 downloads of the variants of diStorm (the sample projects, the full source code, the library, the Python modules, etc). It is a lot, like 10K a year. Don’t forget that after all it’s a mere disassembler, not some crazy application, and it’s even a library, so only developers can use it. Though later I added the flat disassembler project compiled and which can be downloaded in a binary form. And what we learn from this? Nothing, nada! You can never trust statistics as this, and it doesn’t mean much. Cause there are new releases and the same person can download the project a few times within some time, to get the latest version, etc. So I can’t have any info about how many people/companies use it around the world.
My own goal was to make diStorm the best disassembler out there. Only you can judge. I know what it’s worth.
Sometimes I had the doubts about this issue. And it gave me the inspiration to go on and bring the next generation disassembler library to the world, one that afterwards I can retire, in a way, and finally start to enjoy the fruits of my hard work with coding code analysis tools myself in the future. About time, yey, no more excuses, no more stupid string parsing, totally efficient.
I see many people complain about the general status of tools in the Reverse Engineering field. Not many people open their code. And the rest make money out of it. And I want to make a better community in this field. I really tried to do so, but without success. People that use diStorm either only use it as is, and they won’t open their code, because they want to keep their precious knowledge to themselves, or sell it for money. That’s legitimate. You work, you get money. And it also gives the opportunity for tools to exist, otherwise they won’t be there at all. But I think we can do better. I am doing better myself this way, I belive so and therefore I open the code in GPL. GPL is ugly and the only reason I am for it is because I want the community to help me with this project and the coming one. Ohhhh, you will love the next project. It’s not a library anymore but a whole studio…dreams come true. I make them coming true on my end, but I need help. For crying out loud.
I, hereby, am asking from the community, the people out there, that do this stuff for fun or profit to help, to contribute back to the community. I can’t do it all myself, it will take years. Though I am willing to do it myself, and that’s what I do. But then WHAT FOR?! So some companies can enjoy my work and get money on my expense? NO THANKS.
Community, Community, Community, this is key, what the whole issue is about. You are saying you need open source tools, no problem, but share. Let’s do it together. Parsing strings is not the way. Finally we got a good weapon and let’s build a framework on top of it. We got Olly, we got WinDbg, and we got IDA. None of them is opened source. Each has a clear win in its niche. I think there is a room, a NEED, for something new, free and open source. If I am not going to get help this time, I am not going to open source the next project, because it’s pointless and by now you should know exactly why.
You know what, maybe it’s all my fault. Maybe diStorm’s code is TOO complex. (What do you expect though?). Maybe the stupid and ugly diStorm’s page is not easy to track. Otherwise why I see someone who spent a few hours taking a disassembler out of a bigger project and make it work for Windows kernel, while you can do that in diStorm in two clicks?
Maybe nobody gives a f@ck about it, probably, just a stupid disassembler. But no more. This is the time to make a change, a big one. It might be a disassembler, or anything else, but it just shows the attitude of the community.And don’t kid yourself, everybody looks for code analysis stuff, or eventually write them up on their own. C’est tout. Mailling list or not, if we are not going to help each other and only complain that there are not enough open source projects in the RE field, nothing good is gonna happen.
GOOD LUCK!
Gil Dabah, Arkon
P.S – You can start by forwarding a link to this post.
Yo yo yo… forgot to say happy xmas last time, never too late, ah?
This time I wanted to update you about diStorm3 once again. Yesterday I had a good coding session and I added some of the new features regarding flow control. The decode function gets a new parameter called ‘features’. Which is a bit field flag that lets you ask the disassembler to do some new stuff such as:
I wasn’t sure about SYSCALL and the like and UD2, for now I left them out. So what we got now is the ability to instruct the disassembler to stop decoding after it encounters one of the above conditions. This makes the higer disassembler layer more efficient, because now you can disassemble code by basic blocks. Also building a call-graph or branches-graph faster.
Note that now you will be able to ask the disassembler to return a single instruction. I know it sounds stupid, but I talked about it already, and I had some reasons to avoid this behavior. Anyway, now you’re free to ask how many instructions you want, as long as the disassembler can read them from the stream you supply.
Another feature added is the ability to filter non-flow-control instructions. Suppose you are interested in building a call-graph only, there’s no reason that you will get all the data-control instructions, because they are probably useless for the case. Mixing this flag with ‘Stop on RET’ and ‘Stop on CALL’, you can do nice stuff.
Another thing is that I separated the memory-indirection description of an operand into two forms. First of all, memory indirection operand is when an instruction reads/writes from/to memory. Usually in Assembly text, you will see the brackets characters surrounding some expressions. Something like: MOV [EDX], EAX. Means we write a DWORD to EDX pointer. If you followed me ’till here, you should know exactly what I’m talking about anyway.
When you get the result of such instruction from diStorm3, the type of the operand will be SMEM (stands for simple-memory), which hints there’s only one register in the memory-indirection operand. Although it doesn’t hint anything about the displacement, that’s that offset you usually see in the brackets. Like MOV [EDX+0x12345678], EAX. So you will have to test if the displacement exists in both forms. The other form is MEM (Normal memory indirection, or probably should be called ‘complex’) since it supports the full memory indirection operand, like: MOV [EAX*4 + ESI + 0x12345678], EAX. Then you will have to read another register that supplies the base register, in addition to the index register and scale. Note that this applies for 16 bits mode addressing as well, when you can have a mix of ['BX+SI]‘ or only ‘[BX]‘. Also note that sometimes in 32/64 bits mode, you can have a SIB byte, that sets only the base register and the index register is unused, but diStorm3 will return it as an SMEM, to simplify matters. This way it’s really easy to read the instruction’s parameters.
Another feature for text formatting is the ability to tell the disassembler to limit the address to 16 or 32 bits. This is good since the offsets are 64 bits today. And if you encounter an instruction that jumps backwards, you will get a huge negative value, which won’t make much sense if you disassemble 16 bits code…
diStorm3 still supplies the bad old interface. And now it supports two new additional functions. The decompose function, which returns the structures for each instruction read. And another function that formats a given structure into text, which is pretty straight forward. The text format is not an accurate behavior of diStorm64, it’s more simplified, but good enough. Besides I have never heard any special comments about the formatting of diStorm64, so I guess it doesn’t matter much to you guys. And maybe maybe I will add AT&T syntax later on.
Another field that is returned now, unlike diStorm64, is the instruction-set-class type of the instruction, with very broad categories, like Integer instructions, FPU instructions, SSE instructions, and so on. Still might be handy. And the hint about the flow-control type of the instruction.
Also I changed tons of code, and I really mean it, the skeleton is still the same, but the prefixes engine works totally different now. Trying to imitate a real processor this time. By including the last prefix found of that prefix-type. You can read more about this, here. I made the code way more optimized and eliminated double code and it’s still readable, if not for the better. Also I changed the way instruction are fetched, so the locate-instruction function is much smaller and better.
I’m pertty satisfied with the new version of diStorm and hopefully I will be able to share it with you guys soon. Still I got tons of tests to do, maybe I will add that unit-test module in Python to the proejct so you can enjoy it too, not sure yet.
Also I got a word from Mario Vilas, that he is going to help with compiling diStorm for different platforms, and I’m going to integrate his new Python wrappers that use ctypes, so you don’t need the Python extension anymore. Thanks Mario!
However, diStorm3 has its own Python module for the new structure output.
If you have more ideas, comments, complaints or you just hate me, this is the time to say so.
Cheers, happy new year soon!
Gil
I’ve slowed down with posting here, been working hard recently both on diStorm and real life job. Hopefully I will be able to release diStorm3 soon (matter of weeks for a start). I’m almost finished with coding everything I wanted, though I’m still left with the features you guys asked for and tons of tests since I also changed lots of code and added new AVX instruction set.
In the last few week I’ve been working on diStorm to add the new instruction sets: AVX and FMA. You can find lots of information about them on the inet. In a brief, the big advantages, are support of 256 bit registers, called now YMM’s (their low halves are XMM’s) and also support for AES built in, you have a few instruction to do small block encryption and decryption, really sweet. I guess it will help some security companies out there to boost stuff. Also the main feature behind these instruction sets is the 3 registers operands. So now you are not stuck with 2 registers per instruction, you can have up to four sometimes. This is good because you have two source operands and a destination operand, which means it saves you other instructions (to move or backup registers) and you don’t have to ruin your dest-src operand like in the old sets. Almost forgot to mention FMA itself, which is fused multiply-add instructions, so you can do two operations at once, like A*B+C, etc.
I wanted to talk about the VEX prefix itself. It’s really a new design and approach to prefixes that was never seen before. And the title of this post says it all, it’s really annoying.
The VEX (Vector Extension) prefix, is a multi-byte prefix for a change. It can be either 2 or 3 bytes. If you take a look at the one byte opcodes map, all of them are taken. Intel was in the need of a new unused byte, which didn’t really exist. What they did instead, was to share two existing opcodes, for each prefix. The sharing works in a special way, that let them know if you meant to use the original instruction or the new prefix. The chosen instructions are LDS (0xc4) and LES (0xc5). When you examine the second byte of these instructions, the byte upon which the new information is extracted from, you can learn that the most significant two bits can’t be set together (I.E: the value of 0xc0 or higher). However, if they are set, the processor will raise an illegal instruction exception. This is where the VEX prefix enters into the game. Instead of raising an exception they will be decoded as this special multi-byte prefix. Note that in 64 bits mode, all those Load-Segment instruction are invalid, so there is no need for sharing the opcode. When you encounter 0xc4 or 0xc5, you know it’s a VEX prefix, as simple as that. Unfortunately this is not the case in 32 bits mode, and since the second byte has to be with a value higher than 0xc0 (because the two most significant bits have to be set in 32 bits), the field in these corresponding bits is inverted actually, which means you will have to extract a few bits that represent some fields and bitwise-not them. This is seriously gross, but it seems Intel didn’t have much of a choice here. If it were up to me, I would do the same eventually, for the sake of backward compatibility, but it doesn’t make it any prettier to be honest. And for your information, AMD pulled the same trick but with the POP instruction (0×8f) for their new instruction sets (XOP, etc), without full backward compatibility.
To make some order in the bits let’s have a look at the following figure:

Cited Intel
Well, I am not going to talk about all fields, (which somewhat are similar to REX for 64 bits) but just about one interesting feature. Since now the prefix is 2/3 bytes and usually an SSE instruction is at least 3 bytes, this will explode the code segment with huge instruction, and certainly gonna make the processor cry a lot to fetch instructions. The trick that was used by Intel (and AMD too) was to have a field that will imply which prefix byte to put virtually before the VEX prefix itself, so this way we saved one byte. And the same idea was used again to spare the 0×0f escape byte or even two bytes of 0×0f, 0×38 or 0×0f, 0×3a which are very common basic opcodes for SSE instructions. So if we had to use an SSE instruction, for instance:
66 0f 38 17 c0; PTEST XMM0, XMM0 – has first 3 bytes that can be implied in the VEX prefix, thus it stays the same size! Kawabanga
I talked in an earlier post that I am not going to support SSE5 as for now, in the hope it’s gonna die. I believe CPU (or instruction set architectures, to be accurate) wars are bad for the coders and even for end users who can’t really enjoy those great technologies eventually.
The BSWAP instruction is very handy when you want to convert a big endian value to a little endian value and vice versa. Instead of reversing the bytes yourself with a few moves (or shifts, depends how you implement it), a single instruction will do it as simple as that. It personally reminds me something like the HTONS (and friends) in socket programming. The instruction supports 32 bits and 64 bits registers. It will swap (reverse) all bytes inside correspondingly. But it won’t work for 16 bits registers. The documentation says the result is undefined. Now WTF, what was the issue to support 16 bits registers, seriously? That’s a children’s game for Intel, but instead it’s documented to be undefined result. I really love those undefined results, NOT.
When decoding a stream in diStorm in 16 bits decoding mode, BSWAP still shows the registers as 32 bits. Which, I agree can be misleading and I should change it (next ver). Intel decided that this instruction won’t support 16 bits registers, and yet it will stay a legal instruction, rather than, say, raising an exception of undefined instruction, there are known cases already when a specific destination register can cause to an undefined instruction exception, like MOV CR5, EAX, etc. It’s true that it’s a bit different (because it’s not the register index but the decoding mode), but I guess it was easier for them to keep it defined and behave weird. When I think of it again, there are some instruction that don’t work in 64 bits, like LDS. Maybe it was before they wanted to break backward compatibility… So now I keep on getting emails to fix diStorm to support BSWAP for 16 bits registers, which is really dumb. And I have to admit that I don’t like this whole instruction in 16 bits, because it’s really confusing, and doesn’t give a true result. So what’s the point?
I was wondering whether those people who sent me emails regarding this issue were writing the code themselves or they were feeding diStorm with some existing code. The question is how come nobody saw it doesn’t work well for 16 bits? And what’s the deal with XCHG AH, AL, that’s an equal for BSWAP AX, which should work 100%. Actually I can’t remember whether compilers generate code that uses BSWAP, but I am sure I have seen the instruction being used with normal 32 bits registers, of course.
[Update - diStorm3 News]
I have been working more and more on diStorm3 recently. The core code is already written, and it works so great. I am still not going to talk about the structure itself that diStorm uses to format the instructions. There are two API’s now, the old one, which takes a stream and formats it to text and a newer one, which takes a stream and formats it into structures. This one is much faster. Unlike diStorm64, where the text-formatting was coupled in the decoding code, it’s totally separated. For example, if you want to support AT&T syntax, you can do it in a couple of hours or less, really. I don’t like AT&T syntax, hence I am not going to implement it. I bet still many people don’t know how to read it without confusing…
Hereby, I am asking you guys to come up with ideas for diStorm3. So far I got some new ideas from people, which I am going to implement. Such as:
1) You will be able to tell the decoder to stop on any flow control instruction.
2) Instructions are going to be categorized, such as, flow-control, data-control, string instructions, io, etc. (To be honest, I am still not totally sure about this one).
3) Helper macros to extract data references. Since diStorm3 outputs structures, it’s really easy to know if there’s a data reference and its address. Therefore some macros will aid to do this work.
4) Code reference, – continues to next instruction, continues to a target according to a condition, or jump-always and call-always.
I am looking to hear more suggestions from you guys. Please be sure you are talking about disassembler features, and not other layers which use the disassembler.
Just wanted to let you know that diStorm3 is going to be dual licensed with GPL and commercial. diStorm64 is deprecated and I am not going to touch it anymore, though it’s still licensed as BSD, of course.
Yeah, I am not alright, I am even an ass, leaving you all (my dear readers, are there any left?) without saying a word for the second time. At the end of last year I was in South East Asia for 3 months, and now I am in South America for 3 months and counting… It is just that I really wish to keep this blog totally technological, but I guess we are all human after all. So yeah, I have been trekking alot in Chile and Argentina (mostly in Patagonia) and having a great time here, now in Buenos Aires. Good steaks and wines, ohhh and the girls. Say no more.
It is really cool that almost every shitty hostel you go, you will find a WiFi available for free use. So carrying an Ipod touch with me I can actually be online, but apparently not many web developers think about Mobile web pages and thus I couldnt write blog posts with Safari, because there is some problem with the text area object. For some reason, I guess some JS code, doesnt run well on the Ipod and I dont get that keyboard thingy up and cannot type in anything, wordpress…
I am always surprised again to see how many computers here, in coffee-shops or just Internet shops, are not really secured. You run as admin some of the times. And there are not anti virus, which I think are good for the average users. And if you plug in your camera to upload some pictures, the next time you will see some stupid new files on it, named: desktop.ini and autorun.inf, sounds familiar? And then I read some MS blog post about disabling AutoRun for removable storage devices..yipi, about time. What I am also trying to say, that one can easily create a zombies army so easily with all those computers… the ease of access and no protection drives me mad.
Anyhow, I had some free time, of course, I am on a vacation, sort of, after all. And I accidentally reached some amazing blog that I couldnt stopped reading for a few days. Meet NO EXECUTE! If you are low level freaks like me, you will really like it too, although Darek begins with hardware stuff, which will fill some gaps for most people I believe, he talks about virtualizations and emulators (his obsession), and I just read it like some fantasies book, eager to get already to the next chapter everytime. I learnt tons of stuff there, and I really like to see that even today some few people still measure and check optimizations in cycles per instructions rather than seconds or MS. One of the stuff I really liked there was a trick he pulled when the guest OS runs on little endian, for instance, and the host OS runs on big endian. Thus every access to memory has to be swapped when the size of the access is more than 2 bytes, of course. Therefore, in order to eliminate the byte swaps, which is expensive, he kinda turned all the memory of the guest OS upside down, and therefore the endianity changed as well. Now it might sound as a simple matter, but this is awesome, and the way he describes it, you can really feel the excitment behind the invention… He also talks about how lame Intel and AMD are to come up with new instruction sets every Monday, which I already mentioned also in the past.
Regarding diStorm now, I decided that I will discontinue the development of the current diStorm64 version. But hey, dont worry. I am going to open source diStorm3 and I still consider making it dual licensed. The benefits of diStorm3 are structure output, and believe me, the speed is amazing and like the good old days, the structure per instruction is unbelieable tiny in size (relative to other disassemblers I saw out there), and you guys are gonna like it.
Thing is, I have no idea when I am getting home…Now with this Swine Flu spreading like hell, I dont know where I will end up. The only great thing about this Swine Flu, so to speak, is that you can see the Evolution in Progress.
Salud
Since the first day diStorm was out people didn’t know how to deal with the fact that I drop(ignore) some prefixes. It seems that dropping unused prefixes isn’t such a great feature for many people and it only complicates the scanning of streams. Therefore I am thinking about removing the whole mechanism, or maybe change it in a way that still preserves the same interface but behaves differently.
For the following stream: “67 50″, the result by diStorm will be: “db 0×67″ – “push eax”. The 0×67 prefix supposes to change the address size, which none is used in our case, thus it’s dropped. However, if we look at the hex code of the “push eax” part we will see “67 50″. And this is where most of the people become dumbfounded. Getting twice the same prefix-byte of the stream in two results is in a way confusing. Taking a look at other disassemblers will tell you that diStorm is not the only one to do such games with prefixes. Sometimes I get emails regarding this “impossible” prefix – since it gets to be output twice, which is wrong, right? Well, don’t know, it depends how you choose to decode it. The way I chose to decode prefixes was really advanced, each prefix could have been ignored, unless it has really affected (one of) the operand itself. I had to really keep tracking on each prefix and know whether it affected any operands in the instructions and only then I examined which prefixes I drop or not. This all sounds right in a way. Hey, at least for me.
However, we didn’t even talk about what you will do if you have multiple prefixes of the same family (segment-overide: DS, ES, SS, etc). Now this one is really up to interpretations of the designer. Probably the way I did it in diStorm is wrong, I admit it, that’s why I want to rewrite the whole prefixes thing from the beginning. There are 4 or 5 types of prefixes and according to the specs (Intel/AMD) I quote: “A single instruction should include a maximum of one prefix from each of the five groups.” …. “The result of using multiple prefixes from a single group is unpredictable.”. This pretty much sums all the problems in the world related to prefixes. I guess you can see for yourself from these 2 lines you can actually treat them in many different ways. We know now that it can lead to “unpredictable” results if you have many prefixes – in reality it won’t shut down your CPU, it won’t even throw an exception. So screw it you say, and you’re right. Now let’s see some CPU (16 bits) logic for decoding the prefixes:
while (prefix byte is read) {
switch (prefix): {
case seg_cs: use_seg = cs; break;
case seg_ds: use_seg = ds; break;
case seg_ss: use_seg = ss; break;
….
….
case op_size: op_size = 32; break;
case op_addr: op_addr = 32; break;
case rep_z: rep = z; break;
…
}
- skip byte in stream -
}
The processor will use those flags in order to know which prefix was presented or not. The thing about using a loop (in any form) is that now that you have to show text out of some streams with many prefixes, you don’t know whether the processor really uses the first occurrance of the prefix or its last, or maybe both? And maybe Intel and AMD implement it differently?
You know what? Why the heck do I bother so much with some minor end cases that never really happen in real code sections. I ask myself too, maybe I shouldn’t. Although I happened to see for myself some malware code that tries to screw up the disassembler with many extra prefixes, etc.. and I thought diStorm could help malware analyzers as well with advanced prefixes decoding.
Anyways, according to the above logic code I’m supposed to use the last prefix of each type. Given a stream such as: 66 66 67 67 40. I will get:
0: 66 (dropped)
2: 67 (dropped)
1: 66 67 40
Now you can see that the prefixes used are the second and the fourth and that the instruction starts at the second byte on the stream. Now I officially can commit a suicide, even I can’t follow these addresses, it’s hell. So any better solution?