Archive for the ‘Python’ Category

diStorm3 is Ready

Monday, August 16th, 2010

diStorm3 is ready for the masses! :)
- if you want to maximize the information you get from a single instruction; Structure output rather than text, flow control analysis support and more!

Check it out now at its new google page.

Good luck!

Context of Execution (Meet Pyda)

Tuesday, August 5th, 2008

While I was writing an interactive console Python plugin for IDA pro I stumbled in a very buggy problem. The way a plugin works in IDA is that you supply 3 callbacks, init, run, term. You can guess which is called when. I just might put in that there are a few types of plugins and each type can be loaded in different times. I used, let’s say, a UI plugin which gets to run when you hit a hotkey. At that moment my run() function executes in the context of IDA’s thread. And only then I’m allowed to use all other data/code from IDA’s SDK. This is all good so far, and as it should be from such a plugin interface. So what’s wrong? The key idea behind my plugin is that it is an interactive Python console. Means that you have a console window which there you enter commands which eventually will be executed by the Python’s interpreter. Indirectly, this means that I create a thread with a tight loop that waits for user’s input (it doesn’t really matter how it’s being done at the moment). Once the user supplies some input and presses the enter key I get the buffer and run it. Now, it should be clear that the code which Python runs is in that same thread of the console, which I created. Can you see the problem? Ok, maybe not yet.

Some of the commands you can run are wrappers for IDA commands, just like the somewhat embedded IDC scripting in IDA, you have all those functions, but this time in Python. Suppose that you try to access an instruction to get its mnemonic from Python, but this time you do it from a different thread while the unspoken contract with IDA plugin is that whatever you do, you do it in run() function time, and unforetunately this is not the case, and cannot be the case ever, since it is an interactive console that waits for your input and should be ready anytime. If you just try to run SDK commands from the other (console) thread then you experience mysterious crashes and unexplained weird bugs, which is a big no no, and cannot be forgiveable for a product like this, of course.

Now that we know the problem we need to hack a solution. I could have done a very nasty hack, that each time the user runs something in Python, I simulate the hotkey for the plugin and then my plugin’s run() will be called from IDA’s context and there I will look at some global info which will instruct me what to run, and later on I need to marshal back the result to the Python’s (console) thread somehow. This is all possible, but com’on this is fugly.

I read the SDK upside down to find some way to get another callback, which I can somehow trigger whenever I want, in order to run from IDA’s context, but hardly I found something good and at the end nothing worked properly and it was too shakey as well.

Just to note that if you call a function from the created thread, most of the times it works, but when you have some algorithm running heavily, you get crashes after a short while. Even something simple like reading some info of an instruction in a busy loop and at the same time scrolling the view might cause problems.

Then I decided it’s time to do it correctly without any hacks at all. So what I did was reading in MSDN and looking for ways to run my own code at another thread’s context. Alternatively, maybe I could block IDA’s thread while running the functions from my own thread, but it doesn’t sound too clever, and there’s no really easy way to block a thread. And mind you that I wanted a simple and clean solution as much as possible, without crafting some Assembly code, etc. ;)

So one of the things that jumped into my mind was APC, using the function QueueUserAPC, which gets among other params an HTHREAD, which is exactly what I needed. However, the APC will get to run on the target thread only when it is in an alertable state. Which is really bad, because most UI threads are never in that state. So off I went to find another idea. Though this time I felt I got closer to the solution I wanted.

Later on a friend tipped me to check window messages. Apparently if you send a message to another window, which was created by the target thread (if the thread is a UI one and has windows, of course we assume that in IDA’s case and it’s totally ok), the target thread will dispatch that window’s window-procedure in its own context. But this is nothing new for me or you here, I guess. The thing was that I could create a window as a child of IDA’s main window and supply my own window-procedure to that window. And whenever a message is received at this window, its window-procedure gets to run in IDA’s thread’s context! (and rather not the one which created it). BAM mission accomplished. [Oh and by the way, timers (SetTimer) work the same way, since they are implemented as messages after all.]

After implementing this mechanism for getting to run in original’s IDA’s context, just like the contract wants, everything runs perfectly. To be honest, it could easily have not worked because it’s really dependent on the way IDA works and its design, but oh well. :)

About this tool, I am going to release it (as a source code) in the upcoming month. It is called Pyda, as a cool abbreviation for Python and IDA. More information will be available when it is out.

Signed Division In Python (and Vial)

Friday, April 25th, 2008

My stupid hosting company decided to move the site to a different server blah blah, the point is that I lost some of the recent DB changes and my email hasn’t been working for a week now :(

Anyways I repost it. The sad truth was that I had to find the post in Google’s cache in order to restore it, way to go.

Friday, April 18th, 2008:

As I was working on Vial to implement the IDIV instruction, I needed to have a signed division operator in Python. And since the x86 is a 2’s complement based, I first have to convert the number into Python’s negative (from unsigned) and only then make the operation, in my case a simple division. It was supposed to be a matter of a few minutes to code this function which gets the two operands of IDIV and return the result, but in practice it took a few bad hours.

The conversion is really easy, say we mess with 8 bits integers, then 0xff is -1, and 0×80 is -128 etc. The equation to convert it to a Python’s negative is: val – (1 << sizeof(val)*8). Of course, you do that only if the most significant bit, sign bit, is set. Eventually you return the result of val1 / val2. So far so good, but no, as I was trying to feed my IDIV with random input numbers, I saw that the result my Python’s code returns is not the same as the processor’s. This was when I started to freak out. Trying to figure out what’s the problem with my very simple snippet of code. And alas, later on I realized nothing was wrong with my code, it’s all Python’s fault.

What’s wrong with Python’s divide operator? Well, to be strict, it does not round the negative result toward 0, but towards negative infinity. Now, to be honest, I’m not really into math stuff, but all x86 processors rounds negative numbers (and positive also to be accurate) toward 0. So one would really assume Python does the same, as would C, for instance. The simple case to show what I mean is: 5/-3, in Python results in -2. Rather than -1, as the x86 IDIV instruction is expected and should return. And besides -(5/3) is not 5/-3 in Python, now it’s the time you say WTF. Which is another annoying point. But again, as I’m not a math guy, though I was speaking with many friends about this behavior, that equality (or to be accurate, inequality) is ok in real world. Seriously, what we, coders, care about real world math now? I just want to simulate a simple instruction. I really wanted to go and shout “hey there’s a bug in Python divide operator” and how come nobody saw it before? But after some digging, this behavior is really documented in Python. As much as I would hate it and many other people I know, that’s that. I even took a look at the source code of the integer division algorithm, and saw a ‘patch’ to fix the numbers to be floored if the result is negative because of C89 doesn’t define the rounding well enough.

While you’re coding something and you have a bug, you usually just start debugging your code and track it down and then fix it easily while keeping on working on the code. Because you’re in the middle of the coding phase. There are those rare times that you really get crazy when you’re absolutely sure your code is supposed to work (which it does not) and then you realize that the layer you should trust is broken (in a way). Really you want kill someone  … being a good guy I won’t do that.

Did I hear anyone say modulo?? Oh don’t even bother, but this time I think that Python returns the (math) expected result rather than the CPU. But what does it matter now? I really want only to imitate the processor’s behavior. So I had to hack that one too.

The solution after all, was to make the Python’s negative number to be absolute and remember its original sign, that we do for both operands. And then we make an unsigned division and if the signs of the input are not the same we change the sign of the result. This is because we know that the unsigned division works as the processor does and we can then use it safely.

res = x/y; if (sign_of_x != sign_of_y) res = -res;

The bottom line is that I really hate this behavior in Python and it’s not a bug, after all. I’m not sure how many people like me encountered this issue. But it’s really annoying. I don’t believe they are going to fix it in Python 3, never know though.

Anyway, I got my IDIV working now, and that was the last instruction I had to cover in my unit tests. Now It’s analysis time :)

Embedded Python And raw_input

Monday, February 25th, 2008

If you ever messed with embedded Python you would know that you can’t use the Win32 Console so easily. For that you will have to hack around a few bits in order to bind the stdin and stdout with the Win32 ones. Another familiar technique is to override sys.stdout with your own class that implements the ‘write’  method. So every time something goes to output this method is really called with an argument which is then being printed on the screen with whatever Win32 API you want, to name WriteFile

This same technique works also for stderr which is used when one writes some invalid text which doesn’t compile and an exception is thrown.

‘Hooking’ both stdout and stderr we still got stdin out of the loop. Now all these years that embedded Python was around I have never encountered the problem that the stdin doesn’t work with raw_input(). Don’t be confused, the way things work in embedded Python, it doesn’t mean that the interactive interpreter is fed from stdin, it’s a bit more messy. In all examples of embedded Python you have to implement your own interactive interpreter on your own, therefore you bypass stdin from the first moment. Hence if you never used raw_input you will not know if it’s broken or not. So I want to thank a special friend for reporting this bug to me… ;) J

The fix… To fix the problem I tried to replace sys.stdin with my own implementation of a class with a ‘read’ method (by guessing-because ‘write’ is stdout’s). Doing so I found out that the method name should be ‘readline’ according to raw_input’s shouts at me. Then I renamed the method and the implementation was to call a function that I wrote in C and which is wrapped in Python that reads input from the console (Using ReadFile(GetStdHandle(STD_INPUT_HANDLE),…) and returns that input string rawly. That did the trick and now raw_input works again.

I am pretty sure that all embedded Python users out there will need this hack if they already hack stdout…

Let Me Import

Wednesday, December 19th, 2007

The problem is that you cannot import modules from different (relative) paths in Python. Now sometimes it’s really a must. Like when you have a generic CPU module and then a subdirectory called ‘x86′ that wants to derive that CPU class and create a ‘x86CPU’.

So I tried to mess up with __import__ but without any luck. Then I said I just want to solve this problem, don’t care how far (ugly) I go. I started with execfile, which AFAIK isn’t exactly like import, but it’s good enough for me. execfile(“../cpu.py”) didn’t work, unforetunately. I then realized that the cpu.py file imports another file from its own directory and since the execfile doesn’t do some magic with the directories the import simply failed. Bammer. Next thing I did was to add the path of the CPU module to the sys.path and retried my attempt with a success. Although it works, I don’t like the directory separator which isn’t cross-platform. And yes I said it’s a temporary solution, but I’m still sensitive to it. Usually temporary hacks that work tend to stay there, as much as I don’t like this reproach…this is how things work usually, no?

So that one was solved, quite easily, but another one arised. In the original code (when all files used to be in the same directory) when import worked I always import the module name, and not the lot of it (import * from cpu – bad habit). So all my code is accessing the CPU module with ‘CPU.’ prefix, which is good, you know the source of everything you touch. The new problem is that since I moved to use execfile this prefix is ill now and I must get rid off it.

I thought about changing the globals dictionary to the name of the module I want to execfile() and then switch it back. But it becomes too nasty, and I’m not sure whether it would work anyway. My first attempt might be ok for some of you, alas, it’s still not good enough for me.

And I think my design with the files is ok, after all it makes sense. And yes, I know I (maybe?) should have put the files in site-packages directory and the import then would work well. However, I want the files to be in a specific directory out of Python reach (in a way).

Oh by the way, the code of my attempt is:

import os, sys
sys.path.append(os.getcwd()[:os.getcwd().rfind(os.sep)])
execfile(“..” + os.sep + “cpu.py”)

Ugly something, heh?

Ok I was just going to publish this post but I had to try another thing which really solved the issue. I managed to use __import__ well.

So now it looks like:

import os, sys
sys.path.append(os.getcwd()[:os.getcwd().rfind(os.sep)])
cpu = __import__(“cpu”)

This time it solves the second problem as well – the namespace (of the module)  issue. And yet we need the new path. Can we get rid off it?

Python: Converting Signedness

Friday, November 30th, 2007

Many times I encounter Python modules that were written in C/C++ which when you use them you get to a really annoying problem. Integers that that module returns are all unsigned. That’s really a problem, because Python longs are (potentially) inifinite (they are digit-strings) AFAIK, unlike C/C++ integers that have a limit of, usually, 32 or 64 bits.

If you pass back -1 to Python and format it as unsigned long, you will get 2**32-1. Now suppose you really want to treat the number inside your Python code as signed integer, you will have to convert 2**32-1 to -1 somehow. Most solutions I saw to this problem was to use struct in the following manner:

import struct
struct.unpack(“l”, struct.pack(“>L”, 2**32-1))[0]

 Packs an unsigned long value of 0xffffffff to (signed) long -1 (using little endian, but I don’t care now about it).

You might want to use unsigned long long – that’s 64 bits integers in order to convert a bigger number. So you can do one of two things, convert your 32 bits integer to 64 bits integer by sign extending (and that’s another whole challenge) it and stuff it into unpack/pack of 64 bits, or test the size of the integer (by how many bits it takes) and call the correct unpack/pack pair.

It was then that I realized why I shouldn’t use this ugly trick. It simply doesn’t support Python’s longs. As I said earlier they are infinite and using this trick you are capped to 64 bits. So I thought of a better way, and not using any stdlib module at all, leaving it pure Python code…

The way we know to change the signedness of an integer is by negating it, which is NOTing all bits of that number and incrementing it by 1, right? (2’s complement) Well true and that should work:

(Say we work with 8 bits integers)

0xfb = -5

>>> ~0xfb
-252
>>> ~0xfb+1
-251
>>> 256-251
5
>>> 0-5
-5

The way it works is: -(256 + (~x+1)) where x is a byte. For every integer we need to scan for its most significant bit… We can do it the other way around with the following formula (x – NEXT_MSB(x)):

>>> 0xfb – 0×100
-5

This way it’s like we changed the sign bit of the integer and fixed the number as well. Both formulas can work for all integers’ sizes. But the key here is to find the MSB. I prefered to stick to the latter formula rather than the former, since it seems to be shorter. But it doesn’t really matter, both work.

So now we have to code it, and as you should know me already – in one-liner device! The first challenge is to find the MSB, something like this should suffice in C(!):

for (int i = 7; i >= 0; i–)
  if (x & (1 << i)) break;

This will work for a byte integer, and note that we must start from the end towards the start. Otherwise we won’t find the MSB but the LSB. The problem in Python is that we don’t know the size of the longs we mess with and we need to come up with a trick to find its MSB.

The lame trick I used for converting a number into decimal ASCII string, will be used here too and it goes like this:

for i in xrange(2**32):
 if (i / (1 << i)):
  break

We try to divide the input number by 2, 4, 8, 16, 32, … and when the result is 0, we know that we are out of bits. I said it’s lame because we use division, which is slow. If you got any other idea write to me please.

Another drawback is the limit of the numbers we scan, we are limited to 2**32, this is huge enough and I guess you will never reach that, or I will be dead first prolly :o . Using Erez’s trick (see here), we can make it a bit more elegant and stop as soon as the MSB was found.

I am not sure whether you noticed, but supplying an input of a negative number isn’t a smart move, we will have to check for it specifically. Eventually this is the code I came up with:

(Note that the input “bits-stream” can be any size)

def signed(n):
 return n if n < 0 else n – [i for i in (2**j if n/(2**(j-1)) else iter(()).next() for j in xrange(2**31-1))][-1]

>>> signed(0xb)
-5
>>> signed(0xfb)
-5
>>> signed(0xffffb)
-5
>>> signed(0xffffffffffffffffffffffffb)
-5L 

GZ-LN

Challenge: One-Liner For Converting a Decimal

Thursday, October 25th, 2007

Or - a one-liner device to convert a decimal number to a string, using any base, which is lower than 10. If you want to use bases which are above 10, you will have to construct a table somehow that goes from ‘0′ to ‘9′ and then continues from ‘a’ to the required base, (or you can use a static table). So suppose we are dealing with a base <= 10, we only need to convert it to ascii, so it’s pretty simple.

If you didn’t figure it out until now (and how could you?) I’m talking about Python here. There is this int() function (actually it’s a class type to be more accurate, its constructor), which converts any string to a decimal number. Say, int(‘101′, 2) will result in 5. But the opposite operation is no where to be seen.

The straight forward way is easy:

while(n > 0):
     l.append(n%BASE)
     n /= BASE
“”.join(map(str, l[::-1]))

Though, it’s an ugly way, just to show the principle. We can do it with recursion, and then we don’t need to reverse the result, by a side effect of recursion.

When I decided to write the conversion function just for the fun of it, I wanted it to not use recursion…because with recursion it’s really easy. :) So why to make our life simple when we do things for learning and sport? Besides, for some people recursion is less intuitive, althought we might argue abou it.

So here’s my first version:

“”.join([str((n/i)%b) for i in [b**j for j in xrange(31, -1, -1)]]).lstrip(‘0′)

At the beginning I use chr((n/i)%b + 0×30), because I’m used to deal with char arrays and thinking old school C code. So Kasperle came up with the str thingy, which is much better for code readability.

Anyway, I really got pissed with the idea that I have to drop all leading zero, otherwise for n=5, I will get an input of ‘00000000000000000000000000001110′, which is quite cumbersome.

One drawback is the size of integer we want to convert, as you probably guessed, this code supports 32 bit numbers, it might support any number in a jiffy… But then you will probably have to strip more zeros most of the times. ;( Enough fooling around.

What I’m really trying to achieve is to use the code to convert any sized number, without the need of any constant magic value in my one-liner.

So trying to come up with the accurate number of digits to convert in the first place is the really the bugging trick. What we really need is something like math.log. Using the log we can know the number of digits at once. But then we need to import math. Do we count ‘import’s when we say one-liner or not? Well, I will take it as No. Hardening my life without math.

“”.join([str((n/i)%b) for i in [b**j for j in xrange(math.log(n, b), -1, -1)]])

I could have used the input number for the xrange, but it won’t return ‘0′ for an input zero number. And even so, it’s kidna cheating and lame.

Technically, the solution is to generate a list with [1, 10, 100, 1000, ....]. The condition to stop is when n/entry == 0. The problem to make this list is how to generate it on the fly? :) or how to stop generating it.

Well, AFAIK in Python it’s not possible. So I’m trying to simulate log. Imri just suggested to use a rate number for a log approximation which will be base dependent. But I didn’t like that idea – magic numbers, remember? And maybe even losing precision.

By now, Kasperle, who was the recursion guy, lost his patience with my stupid challenge. Imri is trying to calculate crazy numbers for log approximations, which I stopped following long ago. :)

FYI: Kasperle’s code, which is pretty cool, goes like this:

foo = lambda n,base: (n or “”) and (str(foo( n / base, base)) + str( n % base))

Notice the way the recursion stops…However, in one-liner code, I prefer assigning the result to a value, rather than assign the lambda and call it. But it’s also possible to do, for instance: x = (lambda y: y+1)(0). But if you ask me, I don’t really like this notation.

Then Imri suggested another idea using sqrt, but I objected since we need math. The truth is that you can do x**0.5 in Python. But eventually his solution wasn’t good enough.

ARRRG, As for now I am giving up :( . If you have another idea, let me know.

Lambdas Forever

Saturday, October 20th, 2007

Ahhh Python, what a splendid scripting language. One of the most likeable features is the anonymous functions, aka Lambda. The lambda is actually a way to write/implement a simple one liner function in-place. The official docs says:

“Lambda forms (lambda expressions) have the same syntactic position as expressions. They are a shorthand to create anonymous functions; the expression lambda arguments: expression yields a function object.”

Instead of implemented the damned comparison function for sorting, you probably all know what I’m talking about:
def ComparisonProc(x, y):
    return  y – x
list.sort(ComparisonProc)

We can simply do:
list.sort(lambda x, y: y – x) # Descending
and voila.

This is a very simple example. Lambdas are really handy when you want to do one liner devices. Some of them which you manage to stuff in one line and some which you just can’t. However, without lambda it wouldn’t have been possible in the first place.

There are many samples in the Internet. I came up with something, hopefully even useful. Let’s say you want to print all .JPG files on your c:\, including subdirectories. So we have to scan the h.d for all files, then filter those with .JPG extension and afterwards print the result. :) Yes this is all possible in one-liner, let’s see the original code first:

for root, dirs, files in os.walk(‘c:’):
    for i in files:
            if i[-4:] == “.jpg”:
                    print i

The one-liner version:

print filter(lambda name: name[-4:] == ".jpg", reduce(lambda x,y:x+y, [i[2] for i in os.walk('c:')]))

See? Easy :)

Actually now, I have to explain a few more things.
1) We are only interested in the Files list from os.walk, therefore we take the third entry in the result, that’s i[2].

2) The i[2] itself, is a list, and we cannot filter a list of lists with the file names, therefore we have to flatten the lists to a single list containing the file names. This is where the reduce comes in, it will return the accumulated result of all lambdas - each time calling the lambda with the accumulated x and supplying the next item, y. Thus, adding the lists extends the resulting list and flatten them…

3) Now that we the a single list with all file names in the system, we need to filter out the files which are not .JPG. So yet again we use a lambda that checks the last 4 characters in the file name and assures whether it is a .JPG, all other files will be removed from the resulting list.

4) Lastly, print the result. Actually you can use pretty print (module pprint) to print it prettier :)

 Yes, Python’s crazy!

So what’s the annoying things with lambdas? They are slow relatively to list comprehensions (which we used to get all lists of file names above). But again, if we are using scripting – are we after speed? I am not sure. Another irritating thing about lambdas is that you cannot assign inside the expression, but then you have reduce.. :)

The worst thing about lambdas is when you use global variables, and let me explain. Since lambdas are evaluated at runtime (I hope I am right here) if you access some variables outside of the lambda, they will get re-evaluated everytime with the lambda itself. Now think that you wanted the lambda to have a specific value when you created the lambda, and then when you really call the lambda, that value was already changed and your result is screwed.

Enough words, let’s see the problem with some code:

>>> x, y = [lambda z: 5+z+i  for i in xrange(2)]
>>> x(0)
6
>>> y(0)
6

Oh uh, we’re in trouble!

Do you notice we get the same result for both functions? This is incorrect because they are not supposed to return the same value. Note this:

>>> i
1

So now when both lambdas, x and y, are evaluated they use i as 1. Sucks huh? 

>>> i = 3
>>> x(0)
8
>>> y(0)
8
“Use Nested Lambdas, Luke”

x, y = [(lambda v: lambda z: 5 + v)(i) for i in xrange(2)]
>>> x, y = [(lambda v: lambda z: 5 + v)(i) for i in xrange(2)]
>>> x(0)
5
>>> y(0)
6
>>>

The outter lambda is get evaluated immediately and thus leaves the value, and not the pointer to the value, in the code. Next time when the inner lambda is evaluated it uses the value-of(i) and not the value-from-pointer-of(i).

This surely will help someone out there :) And then they say Lambdas are going to be deprecated in Python 3000…

[Updated]

Thanks to Kasperle, here’s another solution to the nested lambdas:

x, y = [lambda z, dummy = i: 5 + dummy for i in xrange(2)]

The drawback is that x and y can now get a second parameter which you potentially let the caller override…Or if we’re talking about Python 2.5 you can use functools.partial.

Plugins’ Headache

Sunday, June 10th, 2007

As I was writing a plugin, or to be more accurate, an extension module for Python in C/Win32; I had mysterious crashings here and there. Using a JIT I saw that the crash is in malloc/free so I looked at the stack and saw where’s the problem at (function scope). So far so good, piece of cake. Isolating the problematic code, yet I still didn’t fully understand why it happens, I ran it on a new clean project in Debug mode and got the cool message that there might be a heap corruption. Not surprising, of course. Now, the thing that made me really angry was that my extension used to work with an older version of MSVS. So thinking of this fact, I immediately took a look at python25.dll and saw that it uses MSVCRT version 7. Also, looking at my recently compiled extension dll, I noticed that my CRT is version 8. Knowing all this: heap does problems and different CRTs, I understood that my code uses one heap and Python’s uses another heap. Which is a big no-no, of course, mixing between different heaps…ouch.

Well, it was time to find a solution to a simple problem – I have to use the same heap. But how? Well, to make a short story shorter, I took a look at the Python’s header files and noticed they have two malloc’s. One was a macro which substitutes with malloc(size), which means use the current compiler’s CRT (that what I had used actually). And the other malloc was their own implementation of memory-allocation (above their own CRT’s malloc). Reading the comments surrounding that function declaration it says something like “platform independent allocations”. So voila, I knew that all I had to do is using these memory functions. At the moment I saw these functions, I blessed Python’s developers for thinking, whether directly or not, about this problem :)

The moral story is that if you integrate your code with another application in the code level make sure you talk the same protocol. It might be a memory heap, in my case, and it might be different implementation to FILE*… Just remember to supply the necessary functions for different compilers. Or if you’re on the SDK users side, use the SDK wisely, before you start hacking your way to a working solution.