diStorm3 – Call for Features

[Update diStorm3 News]

I have been working more and more on diStorm3 recently. The core code is already written, and it works great. I am still not going to talk about the structure itself that diStorm uses to format the instructions. There are two APIs now: the old one, which takes a stream and formats it to text, and a newer one, which takes a stream and formats it into structures. The newer one is much faster. Unlike diStorm64, where the text formatting was coupled to the decoding code, in diStorm3 it is totally separated. For example, if you want to support AT&T syntax, you can do it in a couple of hours or less, really. I don’t like AT&T syntax, hence I am not going to implement it. I bet many people still don’t know how to read it without getting confused…
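To illustrate what separating decoding from text formatting can look like, here is a minimal sketch in C. It is not diStorm3's code or API: the names `ToyInst`, `toy_decode` and `toy_format_intel` are hypothetical, and the toy decoder only understands four opcodes (NOP, RET, INT3 and a short JMP). The point is only that the formatter reads a structure and never touches the byte stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical instruction record; NOT diStorm3's real structure. */
typedef enum { OP_NOP, OP_RET, OP_INT3, OP_JMP_SHORT, OP_UNKNOWN } ToyOpcode;

typedef struct {
    size_t    offset;  /* offset of the instruction in the stream */
    size_t    size;    /* encoded length in bytes */
    ToyOpcode opcode;
    long      target;  /* branch target offset for OP_JMP_SHORT, else 0 */
} ToyInst;

/* Decode stage: bytes in, structures out. Returns the number of
 * instructions written to out (at most max_out). */
size_t toy_decode(const unsigned char *code, size_t len,
                  ToyInst *out, size_t max_out)
{
    size_t pos = 0, count = 0;
    assert(code != NULL && out != NULL);
    while (pos < len && count < max_out) {
        ToyInst *inst = &out[count];
        inst->offset = pos;
        inst->size = 1;
        inst->target = 0;
        switch (code[pos]) {
        case 0x90: inst->opcode = OP_NOP;  break;
        case 0xC3: inst->opcode = OP_RET;  break;
        case 0xCC: inst->opcode = OP_INT3; break;
        case 0xEB: /* JMP rel8: target = next instruction + signed offset */
            if (pos + 1 < len) {
                inst->opcode = OP_JMP_SHORT;
                inst->size = 2;
                inst->target = (long)pos + 2 + (signed char)code[pos + 1];
            } else {
                inst->opcode = OP_UNKNOWN; /* truncated operand */
            }
            break;
        default:
            inst->opcode = OP_UNKNOWN;
            break;
        }
        pos += inst->size;
        count++;
    }
    return count;
}

/* Format stage: structures in, Intel-style text out. It never reads the
 * byte stream. Returns what snprintf returns. */
int toy_format_intel(const ToyInst *inst, char *buf, size_t buf_len)
{
    assert(inst != NULL && buf != NULL);
    switch (inst->opcode) {
    case OP_NOP:  return snprintf(buf, buf_len, "nop");
    case OP_RET:  return snprintf(buf, buf_len, "ret");
    case OP_INT3: return snprintf(buf, buf_len, "int3");
    case OP_JMP_SHORT:
        return snprintf(buf, buf_len, "jmp 0x%lx",
                        (unsigned long)inst->target);
    default:      return snprintf(buf, buf_len, "(unknown)");
    }
}
```

With this split, supporting another syntax means writing a second function over the same `ToyInst`, without changing `toy_decode`. A caller that only needs the structures can skip formatting entirely.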

Hereby, I am asking you guys to come up with ideas for diStorm3. So far I have got some new ideas from people, which I am going to implement, such as:
1) You will be able to tell the decoder to stop on any flow control instruction.
2) Instructions are going to be categorized, e.g. flow-control, data-control, string instructions, I/O, etc. (To be honest, I am still not totally sure about this one.)
3) Helper macros to extract data references. Since diStorm3 outputs structures, it’s really easy to know if there’s a data reference and what its address is. Therefore some macros will help with this work.
4) Code references: whether an instruction continues to the next instruction, continues to a target according to a condition, or is a jump-always or call-always.
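Items 1 and 4 can be sketched together in a few lines of C. This is only an illustration under stated assumptions, not diStorm3's implementation: `ToyFlow`, `toy_classify`, `toy_length` and `toy_scan_block` are hypothetical names, the length table assumes 32-bit mode, and only a handful of opcodes are known (NOP, RET, Jcc rel8, JMP rel8, JMP rel32, CALL rel32).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical code-reference kinds, loosely following item 4. */
typedef enum {
    FLOW_NONE,        /* execution continues to the next instruction */
    FLOW_COND_BRANCH, /* continues or goes to a target, per a condition */
    FLOW_JUMP,        /* jump-always */
    FLOW_CALL,        /* call-always */
    FLOW_RET          /* return */
} ToyFlow;

/* Classify a one-byte opcode. */
ToyFlow toy_classify(unsigned char op)
{
    if (op >= 0x70 && op <= 0x7F) return FLOW_COND_BRANCH; /* Jcc rel8 */
    switch (op) {
    case 0xEB: case 0xE9: return FLOW_JUMP; /* JMP rel8 / JMP rel32 */
    case 0xE8:            return FLOW_CALL; /* CALL rel32 */
    case 0xC3:            return FLOW_RET;  /* RET (near) */
    default:              return FLOW_NONE;
    }
}

/* Encoded length of the few opcodes this toy knows (32-bit mode),
 * or 0 if the opcode is unknown to it. */
size_t toy_length(unsigned char op)
{
    if (op >= 0x70 && op <= 0x7F) return 2;
    switch (op) {
    case 0x90: case 0xC3: return 1;
    case 0xEB:            return 2;
    case 0xE8: case 0xE9: return 5;
    default:              return 0;
    }
}

/* Walk the stream and stop right after the first flow-control
 * instruction. Returns the number of bytes consumed and stores the kind
 * of the last decoded instruction in *last. Also stops, without
 * consuming, on an unknown opcode or a truncated instruction. */
size_t toy_scan_block(const unsigned char *code, size_t len, ToyFlow *last)
{
    size_t pos = 0;
    assert(code != NULL && last != NULL);
    *last = FLOW_NONE;
    while (pos < len) {
        size_t n = toy_length(code[pos]);
        if (n == 0 || n > len - pos)
            break;
        *last = toy_classify(code[pos]);
        pos += n;
        if (*last != FLOW_NONE)
            break;
    }
    return pos;
}
```

Calling `toy_scan_block` repeatedly, each time starting where the previous call stopped, splits straight-line code at every flow-control instruction, which is the kind of use the "stop on flow control" option is meant for.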

I am looking forward to hearing more suggestions from you guys. Please be sure you are talking about disassembler features, and not about other layers which use the disassembler.

Just wanted to let you know that diStorm3 is going to be dual licensed, under the GPL and a commercial license. diStorm64 is deprecated and I am not going to touch it anymore, though it’s still licensed under BSD, of course.

16 Responses to “diStorm3 – Call for Features”

  1. Ange says:

    not LGPL at least? pity.

    Anyway distorm64 is good so best luck for distorm3.

    XOP support planned ?

    will be nice to turn Call $+5 into other stuff.

  2. arkon says:

    Maybe LGPL, I will have to check it out. I still have lots of coding/vacation in front of me, so it will take some time.

    Not exactly XOP, but AVX will be.

    Turning Call $+5 into something else is higher-level work. One day I will release my other projects…

  3. developer says:

    Oh, great news!

    It would be fine to see the “stream to structure” version asap (alpha, beta, other?).
    Expected features:
    + decode single instruction, like
    /* ContextPtr – stream decoding options
    * Stream – byte stream
    * InstructionStruct – see below
    * return: 0 on error, decoded length otherwise.
    */
    int DecodeSingleInstruction (ContextPtr, Stream, InstructionStruct);

    + at least static lib and dll version (to use with managed code, for example)

  4. arkon says:

    I don’t like the single instruction thingy, because then everybody will use it instead of decoding the stream in bulk. Since you know it’s a stream of legitimate code (most of the time, at least), you will waste time on single instruction decoding. This is my way to enforce things, although pretty lame, I know.

    You can already compile diStorm64 for .lib and .dll files (or .so/.ar)…

  5. Tsuda Kageyu says:

    What fields do the structures that the decoder returns have?
    I hope they can tell us their addressing mode (like direct, register indirect, register based indirect, etc.) or the helper macros can do it.

  6. jamie fenton says:

    I have been working on a program that dumps RAM to video called HayWire and want to add a machine code rendering mode that represents instructions as pictographs and emphasizes the higher-order structure of programs. For example, zoomed-back you might see a symbol for a “call site”, and getting closer, see the setup instructions, closer yet, see the opcodes and operand details, and (in my dreams), get closer yet to see micro-architectural details, arrays of transistors, etc.

    This sort of thing requires the type of instruction parser you are working on. The more I can find by traversing the structures you fill, the better.

    I attempt to regenerate everything 30 times a second – so speed is a core value. Recursive update – or perhaps incremental update, helps with this – although code changes relatively rarely, I would find a form of progressive refinement of results useful, so if the viewer lingers somewhere, more detail would fill in.

    I vote for LGPL too – while I am an open-source fan, I can’t give the whole show away – for example, as a “recreational debugger”, I need to automatically redact information that the end-user may not realize is a security issue from dump files they generate and share – and I don’t want to share that list with the world’s script kiddies.

    The other reason I am keen to check out your stuff is that the x86 architecture, despite years of working with it, baffles me. Your code, as a rewrite based on knowledge, might “reify an ontology” – or said more simply, present a useful scheme of organization.

  7. Cypherjb says:

    How about IA64 support? :P

    I know this is a huge ask and probably never gonna happen, but it’s worth a try because I don’t know of any free disassemblers that actually have IA64 support. (I’m way too lazy to write my own)

    Great job on DiStorm64 btw, it’s a huge help for some of the projects I work on.

  8. developer says:

    >I don’t like the single instruction thingy
    Yeah, but sometimes it is better to use it instead of bulk decoding. For example, over obfuscated code.

    So, single-instruction decoding is a required feature.

  9. arkon says:

    Ok, I will support single instruction decoding. Btw, the real reason it wasn’t allowed is that you might lose sync when decoding excess prefixes for the same instruction. But now they are all attached to the next instruction, so this problem disappears. It might still be a problem when you have 15 prefixes in a row. Oh well..

    Cypher – you were talking about itanium or what?

    Jamie – It’s not going to be lgpl since then it means it can be fully used by commercial products, which is exactly what I want to limit.
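The prefix concern in reply 9 can be sketched in C as follows. This is a rough illustration, not how diStorm3 handles it (the thread does not say): the function names are hypothetical, only legacy prefixes are counted (REX prefixes in 64-bit mode are ignored), and the one hard fact relied on is that an x86 instruction cannot be longer than 15 bytes, so 15 prefix bytes in a row leave no room for an opcode.

```c
#include <assert.h>
#include <stddef.h>

#define X86_MAX_INST_LEN 15 /* architectural limit on instruction length */

/* Returns 1 if b is a legacy x86 prefix byte. */
int is_legacy_prefix(unsigned char b)
{
    switch (b) {
    case 0xF0:                          /* LOCK */
    case 0xF2: case 0xF3:               /* REPNE / REP */
    case 0x2E: case 0x36: case 0x3E:    /* CS, SS, DS overrides */
    case 0x26: case 0x64: case 0x65:    /* ES, FS, GS overrides */
    case 0x66:                          /* operand-size override */
    case 0x67:                          /* address-size override */
        return 1;
    default:
        return 0;
    }
}

/* Counts the prefix bytes at the start of the stream. */
size_t count_prefixes(const unsigned char *code, size_t len)
{
    size_t n = 0;
    assert(code != NULL);
    while (n < len && is_legacy_prefix(code[n]))
        n++;
    return n;
}

/* Returns 1 if at least one opcode byte can follow the leading prefixes
 * without running past the stream or the 15-byte limit. */
int prefixes_leave_room(const unsigned char *code, size_t len)
{
    size_t n = count_prefixes(code, len);
    return n < len && n < X86_MAX_INST_LEN;
}
```

A single-instruction decoder could run a check like `prefixes_leave_room` first and report an error, rather than a decoded length, when it fails.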

  10. Qages says:

    I’m looking for a feature that can assemble instructions into their byte code. my compiler lacks 64bit inline assembly, so i was thinking you could make distorm assemble, and then i could just put the result in memory and call it when i needed to from my c++ code.

  11. arkon says:

    Why not use a real assembler library like YASM ?
    diStorm is never going to assemble instructions.

  12. Qages says:

    I’m not asking for a full fledged assembler, i just need a small inline & *runtime* assembler to assemble small bits of code.

  13. jamie fenton says:

    I presume your objection to LGPL has to do with its ability to be used as a loophole to avoid negotiating a commercial license with you rather than coming from a general objection to commercial use of your creations?

    If so, then the next question is: What are your terms? (As an interdependent software developer who doesn’t have a publication contract yet, they need to be reasonable. As the customer here, I want to nail down the lowest price possible, immediately, whereas the supplier, you, wants to determine how much your part contributes to the overall value of what I make and capture a fair measure of that.)

    Deferred too long, the situation can become a due-diligence problem where the customer needs clearance on an IP issue and the supplier can withhold permission by demanding an exorbitant payment, and my rewrite around your API becomes dubious since much evidence of “derivation” remains.

    I apologize for raising “suit and tie” issues on a technical blog, but how we handle (or avoid) “Lawyer Exceptions” will be critical to ecosystem viability.

  14. anonymous says:

    How about basing the input for it on mazegen’s x86asm xml reference? :D

  15. arkon says:

    Last time I tried to talk to that guy, he didn’t answer me, so I don’t know what’s going on with that project.
    Anyway, you are welcome to contribute code that takes the current output and reformats it to XML.
    That’s truly a handy feature!

  16. sfinktah says:

    RSP / SP / stack delta tracking… because Ida refuses to do it outside of a function.
