{"id":73,"date":"2008-12-21T06:27:32","date_gmt":"2008-12-21T08:27:32","guid":{"rendered":"http:\/\/www.ragestorm.net\/blogs\/?p=73"},"modified":"2008-12-21T06:33:01","modified_gmt":"2008-12-21T08:33:01","slug":"instructions-prefixes-hell","status":"publish","type":"post","link":"https:\/\/www.ragestorm.net\/blogs\/?p=73","title":{"rendered":"Instructions&#8217; Prefixes Hell"},"content":{"rendered":"<p>Since the first day diStorm was out people didn&#8217;t know how to deal with the fact that I drop (ignore) some prefixes. It seems that dropping <em>unused<\/em> prefixes isn&#8217;t such a great feature for many people and it only complicates the scanning of streams. Therefore I am thinking about removing the whole mechanism, or maybe changing it in a way that still preserves the same interface but behaves differently.<\/p>\n<p>For the following stream: &#8220;67 50&#8221;,\u00a0the result from diStorm will be: &#8220;db 0x67&#8221; &#8211; &#8220;push eax&#8221;. The 0x67 prefix is supposed to change the address size, but nothing in this instruction uses it, so it&#8217;s dropped. However, if we look at the hex code of the &#8220;push eax&#8221; part we will see &#8220;67 50&#8221;. And this is where most people become dumbfounded. Getting the same prefix byte of the stream in two separate results is, in a way, confusing. Taking a look at other disassemblers will tell you that diStorm is not the only one to play such games with prefixes. Sometimes I get emails regarding this &#8220;impossible&#8221; prefix &#8211; since it gets to be output twice, which is wrong, right? Well, I don&#8217;t know, it depends on how you choose to decode it. The way I chose to decode prefixes was really advanced: each prefix could be ignored unless it really <strong>affected<\/strong> (one of) the operands. I had to keep track of each prefix and know whether it affected any operand in the instruction, and only then decide which prefixes to drop. 
This all sounds right in a way. Hey, at least to me.<\/p>\n<p>However, we haven&#8217;t even talked about what to do if you have multiple prefixes of the same family (segment-override: DS, ES, SS, etc.). Now this one is really up to the designer&#8217;s interpretation. Probably the way I did it in diStorm is wrong, I admit it, and that&#8217;s why I want to rewrite the whole prefixes thing from the beginning. There are 4 or 5 types of prefixes and, according to the specs (Intel\/AMD), I quote: &#8220;A single instruction should include a maximum of one prefix from each of the five groups.&#8221; &#8230;. &#8220;The result of using multiple prefixes from a single group is unpredictable.&#8221;.\u00a0This pretty much sums up all the problems in the world related to prefixes. I guess you can see for yourself from these 2 lines that you can actually treat them in many different ways. We know now that it can lead to &#8220;unpredictable&#8221; results if you have many prefixes &#8211; in reality it won&#8217;t shut down your CPU, it won&#8217;t even throw an exception. So screw it, you say, and you&#8217;re right. Now let&#8217;s see some CPU (16 bits)\u00a0logic for decoding the prefixes:<\/p>\n<p>while (prefix byte is read) {<br \/>\n\u00a0switch (prefix) {<br \/>\n\u00a0 case seg_cs:\u00a0use_seg = cs; break;<br \/>\n\u00a0 case seg_ds: use_seg = ds; break;<br \/>\n\u00a0 case seg_ss: use_seg = ss; break;<br \/>\n\u00a0\u00a0&#8230;.<br \/>\n\u00a0 &#8230;.<br \/>\n \u00a0case op_size: op_size = 32; break;<br \/>\n\u00a0 case op_addr: op_addr = 32; break;<br \/>\n \u00a0case rep_z: rep = z; break;<br \/>\n \u00a0&#8230;<br \/>\n\u00a0}<br \/>\n\u00a0&#8211; skip byte in stream &#8211;<br \/>\n}<\/p>\n<p>The processor will use those flags in order to know which prefixes were present. 
The thing about using a loop (in any form) is that when you have to produce text for a stream with many prefixes, you don&#8217;t know whether the processor <strong>really<\/strong> uses the first occurrence of the prefix or the last, or maybe both. And maybe Intel and AMD implement it differently?<\/p>\n<p>You know what? Why the heck do I bother so much with some minor edge cases that never really happen in real code sections? I ask myself that too; maybe I shouldn&#8217;t. Although I have seen for myself some malware code that tries to screw up the disassembler with many extra prefixes, etc., and I thought diStorm could help malware analysts as well with advanced prefix decoding.<\/p>\n<p>Anyways, according to the logic above I&#8217;m supposed to use the last prefix of each type. Given a stream such as: 66 66 67 67 40. I will get:<br \/>\n0: 66 (dropped)<br \/>\n2: 67 (dropped)<br \/>\n1: 66 67 40<br \/>\nNow you can see that the prefixes used are the second and the fourth and that the instruction starts at the second byte of the stream. Now I can officially commit suicide; even I can&#8217;t follow these addresses, it&#8217;s hell. So, any better solution?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Since the first day diStorm was out people didn&#8217;t know how to deal with the fact that I drop (ignore) some prefixes. It seems that dropping unused prefixes isn&#8217;t such a great feature for many people and it only complicates the scanning of streams. 
Therefore I am thinking about removing the whole mechanism, or maybe change [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":""},"categories":[21,5,3],"tags":[],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pbWKd-1b","_links":{"self":[{"href":"https:\/\/www.ragestorm.net\/blogs\/index.php?rest_route=\/wp\/v2\/posts\/73"}],"collection":[{"href":"https:\/\/www.ragestorm.net\/blogs\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ragestorm.net\/blogs\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ragestorm.net\/blogs\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ragestorm.net\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=73"}],"version-history":[{"count":0,"href":"https:\/\/www.ragestorm.net\/blogs\/index.php?rest_route=\/wp\/v2\/posts\/73\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.ragestorm.net\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=73"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ragestorm.net\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=73"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ragestorm.net\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=73"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}