Archive for December, 2007

Let Me Import

Wednesday, December 19th, 2007

The problem is that Python won’t let you simply import a module from a relative path, such as the parent directory. Sometimes you really need to, like when you have a generic CPU module and a subdirectory called ‘x86’ that wants to derive from that CPU class and create an ‘x86CPU’.

So I tried to mess around with __import__, but without any luck. Then I said I just want to solve this problem and I don’t care how far (ugly) I go. I started with execfile, which AFAIK isn’t exactly like import, but it’s good enough for me. Unfortunately, execfile("../cpu.py") didn’t work. I then realized that cpu.py imports another file from its own directory, and since execfile doesn’t do any magic with directories, that import simply failed. Bummer. Next I added the path of the CPU module to sys.path and retried, this time with success. Although it works, I don’t like the directory separator, which isn’t cross-platform. And yes, I said it’s a temporary solution, but I’m still sensitive to it. Temporary hacks that work tend to stay there; as much as I don’t like this approach… this is how things usually work, no?

So that one was solved, quite easily, but another one arose. In the original code (when all the files were in the same directory and import worked) I always imported the module by name rather than pulling everything in (from cpu import *, a bad habit). So all my code accesses the CPU module with the ‘CPU.’ prefix, which is good: you know the source of everything you touch. The new problem is that since I moved to execfile, this prefix no longer resolves and I must get rid of it.

I thought about changing the globals dictionary to the name of the module I want to execfile() and then switch it back. But it becomes too nasty, and I’m not sure whether it would work anyway. My first attempt might be ok for some of you, alas, it’s still not good enough for me.

And I think my design with the files is ok; after all, it makes sense. And yes, I know I (maybe?) should have put the files in the site-packages directory, and then the import would work well. However, I want the files to be in a specific directory out of Python’s reach (in a way).

Oh by the way, the code of my attempt is:

import os, sys
sys.path.append(os.getcwd()[:os.getcwd().rfind(os.sep)])
execfile(".." + os.sep + "cpu.py")

Ugly something, heh?

Ok, I was just going to publish this post but I had to try another thing, which really solved the issue. I managed to get __import__ to work.

So now it looks like:

import os, sys
sys.path.append(os.getcwd()[:os.getcwd().rfind(os.sep)])
cpu = __import__("cpu")

This time it solves the second problem as well: the namespace (of the module) issue. And yet we still need the new path. Can we get rid of it?
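One way to at least tame the path part is to build it from the script’s own location instead of the current working directory, and to let os.path deal with the separator. The following is only a sketch under a few assumptions: the helper name import_from_parent is made up for illustration, importlib.import_module stands in for the bare __import__ call, and the sys.path entry is still needed, so this doesn’t answer the question above, it just makes the hack less fragile.

```python
import importlib
import os
import sys


def import_from_parent(module_name, script_path):
    """Import module_name from the directory above the one holding script_path.

    Hypothetical helper: it still extends sys.path, but derives the entry
    from the script's location rather than from os.getcwd().
    """
    script_dir = os.path.dirname(os.path.abspath(script_path))
    parent_dir = os.path.dirname(script_dir)  # no manual separator handling
    if parent_dir not in sys.path:
        sys.path.append(parent_dir)
    # Returns the module object, so the usual 'cpu.' prefix keeps working.
    return importlib.import_module(module_name)
```

From a file inside the ‘x86’ subdirectory this would be used as cpu = import_from_parent("cpu", __file__), and it behaves the same no matter which directory the script is started from.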

A C Tidbit

Wednesday, December 12th, 2007

Yeah well, the title is a bit weird… So I’ve been reading the C standard for fun (get a girl?) and I was interested in two things actually. First, while I was reading it a friend asked me whether the local (auto) variables in the scope of a function are laid out in any special order. The short answer is no, and I won’t expand on this one much. In fact, nothing is mentioned about the “stack” or how such variables are supposed to sit in memory. This is a bit confusing, because in structures (and unions) what you define is how they lie in memory. But that only sounds normal, because if you need to read something from hardware, you really care about the order of the members of the structure…

While reversing I have also seen lots of variable recycling: the compiler uses a variable (which is on the stack) and afterwards, when it’s out of scope, re-uses the same memory for another variable in the same function. Or it might be that the coder sucked so much that he used the same variable a few times… :P which I doubt, since in one place the dword was used as a byte. So the only thing you know about the stack of a function is its layout according to that function’s code, for that specific build, and only after compilation (an assembly listing is good too).

The other thing I was curious about is the pointer arithmetic that is mixed with integer expressions:

char *p = 0;
p + 5

We all know that it will evaluate to 5. But what about:

long *p = 0;
p + 5 ?

This one will evaluate to 20, that is 5 * sizeof(long), assuming long is 4 bytes…

Actually the compiler translates the array subscript operator [] such that p[i] is *(p + i). Nothing special here either, unless you haven’t messed with C for a long while… It becomes really powerful when you cast from one pointer type to another and use the indirection operator. For example, say you want to scan for a dword in a memory block:

long *p = <some input>;
for (int i = 0; i < 1000; i++)
   if (p[i] == value) found…

But in this case you can miss that dword, since we only look at dword-aligned offsets and not at every byte offset. So we have two options: either change p to be a pointer to char, or make a cast… Either way we need to fix the limit of the scan, of course.

So now it becomes:

if (*(dword*) ( ((char*)p) +i) == value) found…

This way p is scanned in byte units, because i is multiplied by sizeof(char), which is 1. We take the pointer to char, which advances by 1 every iteration, cast it to a pointer to dword and then dereference that… I think you cannot avoid a cast (or maybe an automatic conversion) to get this task done.
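To see the difference between the two scans without touching real memory, here is a small sketch in Python rather than C. The function name scan_dword and the sample buffer are made up for illustration, and the dword is assumed to be a little-endian 32-bit value. The step argument plays the role of the pointer type: 4 mimics walking a long pointer (with a 4-byte long), 1 mimics the char pointer with the cast.

```python
import struct


def scan_dword(buf, value, step):
    """Return the offsets at which the 32-bit little-endian value is found,
    probing the buffer every `step` bytes."""
    hits = []
    # Stop 3 bytes before the end so a 4-byte read never runs past the buffer;
    # this is the "fix the limit of the scan" part.
    for off in range(0, len(buf) - 3, step):
        if struct.unpack_from("<I", buf, off)[0] == value:
            hits.append(off)
    return hits


# One padding byte first, so the dword sits at offset 1 (not dword-aligned).
buf = b"\x00" + struct.pack("<I", 0xDEADBEEF) + b"\x00\x00\x00"

aligned_hits = scan_dword(buf, 0xDEADBEEF, 4)  # dword-sized steps
byte_hits = scan_dword(buf, 0xDEADBEEF, 1)     # byte-sized steps
```

With this buffer the dword-stepped scan comes back empty, while the byte-stepped scan reports offset 1.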

Now it might be obvious to most of you, but I doubt it, since I fell in this trap as well:

long a[4] = {0};
printf("%td", &a[1] - &a[0]);

What will this code print when executed? I thought 4, as the size of an element of ‘a’ is 4 (from the ‘long’ type), but to my surprise ‘1’ was printed. Subtracting two pointers works as if the subscripts were the operands, thus 1 - 0, and the result is a signed integer (of type ptrdiff_t).

The bottom line is that p[i] is *(T*)((char*)p + i*sizeof(p[0])), where T is the type p points to, and &p[i] - &p[j] is ((char*)&p[i] - (char*)&p[j])/sizeof(p[0]). Well, the latter is less useful, but if you thought the printf above would print the size of the element, then you are wrong :)

Never Trust Your Input – A WordPress Case

Friday, December 7th, 2007

Although this post isn’t about security, the title holds true for everything you do with input: whatever you do, never trust it. :)

While I was manually editing the last post about SQL I encountered a bug in WordPress. It’s not a big deal, probably because it’s not a security bug, but it’s really annoying when the browser freezes while you’re trying to edit your post. So I switched to Firefox and the same bug happened again. In the beginning I thought something was wrong in the internals of the browser, but both browsers having the same internal bug? That doesn’t make much sense. Looking at the Task Manager I saw that both IE and FF were chewing up memory and sitting at 50% CPU (this is because I have two cores; it would be 100% on a single processor…). So I immediately understood that there’s an infinite loop running, hence the 50% CPU usage, and that it allocates memory along the way (presumably for strings).

The next thing I did was to isolate the WordPress code somehow. I wasn’t even sure what caused the bug to surface in the first place; all I knew was that it had something to do with my post. So I took a look at my post’s raw text, since it contained some HTML tags, and eventually saw that I had an unbalanced PRE tag. Unbalanced means that I added an opening tag but didn’t close it. So I left only a “<pre>” in my post and saw that the browsers still freeze.

Now what? Digging into the WordPress code I understood there is a special class for a textarea input. I realized I had to search for a “pre” string in the JS files (which originate from .php files). Eventually, after some trial and error of commenting code out, I found this chunk:

var startPos = -1;
while ((startPos = content.indexOf('<pre', startPos+1)) != -1) {
    var endPos = content.indexOf('</pre>', startPos+1);
    var innerPos = content.indexOf('>', startPos+1);
    var chunkBefore = content.substring(0, innerPos);
    var chunkAfter = content.substring(endPos);
    var innards = content.substring(innerPos, endPos);

    innards = innards.replace(/\n/g, '<br />');
    content = chunkBefore + innards + chunkAfter;
}

This is ripped from the tiny_mce_gzip.php file. The ‘content’ variable holds the text of my post, which is “<pre>”. Now notice the first line in the block:

var endPos = content.indexOf('</pre>', startPos+1);

This sets endPos to -1, since there’s no closing tag. Moreover, there is no check of the return value; the programmer assumed it will always find a match. :(
Now let’s analyze the block to see why it becomes an infinite loop that chews up memory:

endPos = -1, innerPos = 4, chunkBefore = “<pre”, chunkAfter = “<pre>”, innards = “<pre”;

Notice that ‘chunkAfter’ contains the whole content, since substring treats a negative index as 0, so substring(-1) returns the input untouched… For the same reason ‘innards’ is computed as substring(4, 0), and since substring swaps its arguments when the start is larger than the end, it holds “<pre”. We can ignore the replace, which doesn’t really affect the loop in any way here. And behold, ‘content’ is reassigned to hold the whole new string, which now looks like “<pre<pre<pre>”.

The next search from startPos+1 finds another ‘<pre’, so the loop condition still holds, and again endPos gets -1, while ‘content’ grows on every pass… And bam, your browser is frozen.

The fix is pretty straightforward. It’s a shame they have a stupid bug like this. What I had to do on my end was to use phpMyAdmin to edit the SQL tables and change the post so it won’t lock my browser yet again. Although I could have fixed this bug on the server, for some reason I didn’t do it…
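For what it’s worth, here is a sketch of the missing check. It is a Python port of the loop and not the actual TinyMCE JavaScript: str.find stands in for indexOf, slicing stands in for substring, and the function name pre_newlines_to_br is made up for illustration. The only real change is bailing out when one of the searches returns -1.

```python
def pre_newlines_to_br(content):
    """Replace newlines inside <pre>...</pre> with <br />, as the original
    loop does, but stop as soon as a <pre> has no matching close tag."""
    start = -1
    while True:
        start = content.find("<pre", start + 1)
        if start == -1:
            break  # no more <pre tags
        end = content.find("</pre>", start + 1)
        inner = content.find(">", start + 1)
        # The check the original code lacks: never trust the input.
        if end == -1 or inner == -1 or inner > end:
            break  # unbalanced tag, leave the rest untouched
        before = content[:inner]
        innards = content[inner:end].replace("\n", "<br />")
        after = content[end:]
        content = before + innards + after
    return content
```

Given just “<pre>”, this returns the input unchanged instead of looping forever; given a balanced block, it rewrites only the newlines between the tags.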

Anyway I sent an email to WordPress and hopefully they will fix it immediately, though it’s nothing urgent.

SQL: Setting NOCOUNT Programmatically

Friday, December 7th, 2007

As I mess with SQL at work, I learnt about an option called NOCOUNT. By default the server returns the number of affected rows after each statement a client runs, which most of the time wastes bandwidth on unused/unwanted data; turning NOCOUNT on stops that. I was interested in setting this option server-wide. Some people will say it’s a bad move, because if you later have more DBs on the server which need the row counts, bla bla… But I know what I want and that’s what I will do.

The problem was how to set it programmatically. I searched the net for a bit but didn’t come up with anything useful, so I decided to write the snippet on my own. It’s a bit tricky, but once you see it, it looks ok.

USE master;
DECLARE @CurrentValue AS INT;
CREATE TABLE #tmp (v VARCHAR(50), w INT, x INT, y INT, z INT);
INSERT INTO #tmp
      EXEC sp_configure 'User Options';
SELECT @CurrentValue = y FROM #tmp;
DROP TABLE #tmp;
SET @CurrentValue = @CurrentValue | 512; -- 512=NOCOUNT
EXEC sp_configure 'User Options', @CurrentValue;
RECONFIGURE WITH OVERRIDE;

The trick was how to get the results of EXEC sp_configure so you can read them yourself. The solution was to create a temporary table and instruct SQL to insert the results of sp_configure into it. Later on, you can simply read from this table and do whatever you’re up to. This way we can preserve the original value and in addition set the NOCOUNT flag (512, as a bit field…).
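The reason for OR-ing the flag in, instead of just writing 512, is that the option value packs several settings into one integer. A tiny Python sketch of the arithmetic, where 512 is the NOCOUNT bit from the snippet above and the starting value 264 is a made-up example of a server that already has other bits set:

```python
NOCOUNT = 512  # bit value used in the SQL snippet above


def with_nocount(current_value):
    """Return current_value with the NOCOUNT bit set and all other bits kept."""
    return current_value | NOCOUNT


def has_nocount(value):
    """True if the NOCOUNT bit is set in value."""
    return (value & NOCOUNT) != 0


before = 264                  # hypothetical existing configuration value
after = with_nocount(before)  # the other bits survive, NOCOUNT is added
```

Here after is 776 (264 + 512), and applying with_nocount again leaves it at 776, so running the snippet twice does no harm.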