I was telling the story to a friend of mine about me using POP ESP in some code I wrote, and then he noted how special it is to use such an instruction and probably I’m the first one whom he’s heard of that used it. So I decided to share. I’m sorry to be mystical about my recent posts, it’s just that they are connected to the place I work at, and I can’t talk really elaborate about everything.
Here we go.
I had to call a C++ function from my Assembly code and keep the return value untouched so the caller will get it. Usually return values are passed on EAX, in x86 that is. But that’s not the whole truth, they might be passed on EDX:EAX, if you want to return 64 bits integer, for instance.
My Assembly code was a wrapper to the C++ function, so once the C++ function returned, it got back to me, and so I couldn’t touch both EDX and EAX. The problem was that I had to clean the stack, as my wrapper function acted as STDCALL calling convention. Cleaning the stack is pretty easy, after you popped EBP and the stack pointer points to the return address, you still have to do POPs as the number of arguments your function receives. The calling convention also specifies which registers are to be preserved between calls, and which registers are scratch. Therefore I decided to use ECX for my part, because it’s a scratch register, and I didn’t want to dirty any other register. Note that by the time you need to return to the caller and both clean the arguments on the stack, it’s pretty hard to use push and pop instructions to back up a register so you can freely use it. Again, because you’re in the middle of cleaning the stack, so by the time you POP that register, the ESP moved already. Therefore I got stuck with ECX only, but that’s fine with me. After the C++ function returned to me, I read from some structure the number of arguments to clean. Suppose I had the pointer to that structure in my frame and it was easily accessible as a local variable. Then I cleaned my own stack frame, mov esp, ebp and pop ebp. Then ESP pointed the return address.
This is where it gets tricky:
Assume ECX holds the number of arguments to clean:
lea ecx, [esp + ecx*4 + 4]
That calculation gets the fixed stack address, like the ESP that a RET N instruction would get it to. So it needs to skip the number of arguments multiplied by 4, 4 bytes per argument, and add to that the return value itself.
Going on with:
xchg [esp], ecx
Which puts on the stack the fixed stack address, and getting ECX with the return address. This is where usually people get confused, take your time. I’m waiting
And then the almighty:
We actually popped the fixed stack pointer from the stack itself into the stack pointer. LOL
And since we got ECX loaded with the return address, we just have to branch to it.
What I was actually doing is to simulate the RET N instruction, using only ECX. And ESP should be used anyway. Now the function I was returning to, could access both the optional EDX and EAX as return values from the C++ function.
It seems that the solution begged a SMC (self modifying code) so I could just patch the N, in the RET N instruction, which is a 16 bits immediate value. But SMC is bad for performance, and obviously for multi threading…
Also note that I could just clean the stack, and then branched to something like: jmp [esp - argsCount*4 - 4],
but I don’t like reading off my stack pointer, that’s a bad practice (mostly from the days of interrupts…).
POP ESP FTW