Archive for the ‘Win32’ Category

Don’t Wait, Shoot. (KeSetEvent)

Tuesday, November 3rd, 2009

Apparently when you call down a driver with IoCallDriver you can either wait ’till the operation is finished or not. If you wait, you will need somebody to tell you “hey dude, you can stop waiting”. But that’s trivial, you set up a completion routine that will be called once the lower driver is finished with your IRP. The problem is if in some cases you don’t check the return code from that driver, and you assume you should always wait. So you Wait. Now what? Now the lower driver suddenly returns immediately, but did you know that? Probably not, cause you’re not blocked forever, otherwise you would have noticed it immediately. However, there’s seemingly no problem, because the lower driver will call your completion routine anyway indirectly and there you will signal “hey dude, you can stop waiting”, right? Therefore, it turns out you waited for nothing and just consumed some resources (locks).
That’s why usually you will see a simple test to see if the return code from the IoCallDriver is STATUS_PENDING, and only then you will wait ’till the operation is finished, in order to make it synchronized, that’s all the talk about. The thing is that you still need to do that same check in the completion routine you supplied. It seems that if you simply call SetEvent (and remebmer, now we know nobody is waiting on the event anymore, so why signaling it from the beginning), you still cause some performance penalty. And when you’re in a filter driver, for instance, you shouldn’t. And it’s a bad practice programming anyway.

I think it’s quite clear why WaitForSingleObject is “slow”, though in our case it will be immediately satisfied and yet… But I didn’t realize SetEvent is also problematic at a first thought. I thought it was a matter of flagging a boolean. In some sense, it’s true, there’s more to it. You see, since somebody might be waiting on the event, you will have to wake up the waiting thread and for that you need to lock the dispatcher-lock, and to yield execution, etc. Now suddenly it becomes a pain, huh?

Actually it’s quite interesting the way the KeSetEvent works. They knew they had to satisfy waiters, so they acquire the dispatcher-lock in the first place, and then they can also safely touch the event-state.

Moral of the story, don’t wait if you don’t have to, just shoot!

Cleaning Resources Automatically

Tuesday, September 15th, 2009

HelllllloW everybody!!!

Finally I am back, after 7 months, from a crazy trip in South America! Where I got robbed at gun-point, and denied access to USA (wanted to consult there), saw amazing views and creatures, among other stories, but it’s not the place to talk about them :) Maybe if you insist I will post once about it.

<geek warning>
Honestly, I freaked out in the trip without a computer (bits/assembly/compiler),  so all I could do was reading and following many blogs and stuff.
</geek warning>

I even learned that my post about the kernel DoS in XPSP3 about the desktop wallpaper weakness became a CVE. It seems MS has fixed it already, yey.

And now I need to warm up a bit, and I decided to dive into C++ with an interesting and very useful example. Cleaning resources automatically when they go out of scope. I think that not many people are aware to the simplicity of writing such a feature in C++ (or any other language that supports templates).

The whole issue is about how you destroy resources, I will use Win32 resources for the sake of example. I already talked once about this issue in C.

Suppose we have the following snippet:

HBITMAP h = LoadBitmap(...);
if (h == NULL) return 0;
HBITMAP h2 = Loadbitmap(...);
if (h2 ==  NULL) {
   return 0;
char* p = (char*)malloc(1000);
if (p == NULL) {
   return 0;

And so on, every failure handling in the if statements we have to clean more and more resources. And you know what, even today people forget to clean all resources, and it might even lead to security problems. But that was C times, and now we are all cool and know C++, so why not use it? One book which talks about these issues is Effective C++, recommended.

Also, another problem with the above code is while an exception is being thrown in the middle or afterward, you still have to clean those resources and copy/paste some lines, which makes it prone to errors.

Basically, all we need is a nice small class which will be called AutoResource that holds the object itself and will manage it. Personally, it reminds me auto_ptr class, but it’s way less permissive. You will be only able to initialize and use the object. Of course, it will be destroyed automatically when it goes out of scope.
How about this code now:

AutoResource<HBITMAP> h(LoadBitmap(...));
AutoResource<HBITMAP> h2(LoadBitmap(...));
char* p = new char[1000]; // If an exception is thrown, all objects initialized prior to this line are automatically cleaned.

Now you can move on and be totally free of ugly failure testing code and not worry about leaking objects, etc. Let’s get into details. What we really need is a special class that we can change the behavior of the CleanUp()  method according to its object’s type, that’s easily possible in C++ by using a method specialization technique. We will not let to copy or assign to the class. We want a total control over the object, therefore we will let the user to get() it too. And as a type of defense programming, we will force the coder to implement a specialized CleanUp() in a case he uses the class for new types and forgets to implement the new CleanUp code; By using compile time assertion (I used this trick from Boost). Also, there might be a case where the constructor input is NULL, and therefore the constructor will have to inform the caller by throwing an exception, download and check out the complete code later.

template <class T> class AutoResource {
   AutoResource(T t) : m_obj(t) { }

   void CleanUp()
      // WARNING:
      // If the assertion occurred you will have to specialize the CleanUp with the new type.

      m_obj = NULL;
   T get() const
      return m_obj;
   T m_obj;

//Here we specialize the CleanUp() for the HICON resource.
template<> void AutoResource<HICON>::CleanUp()

You can easily add new types and enjoy the class.  If you have any suggestions/fixes please leave a comment!

Download complete code: AutoResource.cpp.

Thanks to Yan Michalevsky for helping with the code!

P.S – While compiling this stuff, I found a crash in the Optimization Compiler of VS2008 :)  lol.

VML + ANI ZERT Patches

Tuesday, February 3rd, 2009

It is time to release an old presentation about the VML and ANI vulnerabilities that were patched by ZERT. It explains the vulnerabilities and how they were closed. It is somewhat very technical, Assembly is required if you wanna really enjoy it. I also gave a talk using this presentation in CCC 2007. It so happened that I wrote the patches, with the extensive help of the team, of course.

ZERT Patches.ppt

Oh No, My XPSP3

Monday, February 2nd, 2009
#include <windows.h>
int main()
 WCHAR c[1000] = {0};
 memset(c, 'c', 1000);
 SystemParametersInfo(SPI_SETDESKWALLPAPER, 0, (PVOID)c, 0);

 WCHAR b[1000] = {0};
 SystemParametersInfo(SPI_GETDESKWALLPAPER, 1000, (PVOID)b, 0);
 return 0;

Two posts ago I talked about vulnerabilities. So here’s some Zero Day. This will crash your system, unless you’re on Vista (which is already immune to it). And why the heck on SP3 we are still having this thing not closed yet?

It might be exploitable, I didn’t research it any further than the BSOD of the security cookie…Maybe on some compilations without /GS it can be easily exploited. Or maybe overriding enough of the stack to trigger an exception could be it.

“Remember to let her into your heart,
Then you can start to make it better” – The Beatles.

Import Symbols by Ordinals in Link-Time (using VS)

Thursday, August 7th, 2008

I bet some of you have always wanted to use a system dll file which has a few exported functions, though by ordinal numbers only. Yesterday it happenned to me. I had a dll which exported many a function by ordinal. For instance, you can take a look yourself at ws2_32.dll to learn quickly using dependency walker or another tool the exports of this file. Sometimes you have a 3rd party dll file which you lack both the header file (.h) and a .lib file. And let’s say that you know which functions (their ordinals) you want to use and their prototypes (which is a must). There are several approaches to use the dll in that case.

1) Use LoadLibrary and GetProcAddress programmatically to control the load and pointers retrieval for the functions you’re interested in. This is like the number one guaranteed way to work. I don’t like this idea, because sometimes you are bound to use the library as it is already linked to the file at link-time. Another disadvantage I find important is that you have to change your code in order to support that 3rd party dll functinoality. And for me this is out of question. The code should stay as it is nevertheless the way I use outsider dlls. And besides, why to bother add extra code for imported functions when the linker should do it for you?

2) Use delay-load dll’s, but you will still have to use some helper functions in order to import a function by its ordinal. If you ask me, this is just lame, and probably you don’t need a lazy load anyway.

3) Patching your PE file manually to use ordinals instead of function names. But there’s a catch here, how can you compile your file, if you can’t resolve the symbols? Well, that depends on the support your 3rd party dll has. For example, you might have a .lib file for one version of the 3rd party dll and then everything is splendid. But with a newer version of that dll, all symbols are exported by ordinals. So you can still compile your code using the older .lib, but then you will have to patch the PE. Well, this is the way I did it the first time yesterday, only to witness that everything works well. But even if you change your code slightly and recompile it you will have to fix the PE again and again. This is out of question. And besides you don’t solve a solution with atomic bomb when you can use a small missle, even if you have the skills to do that, something it’s possibly wrong and you should look for a different way, I believe this is true for most things in life anyway.

4) Using IMPORTS statement in your .def file. Which there you tell the linker how to resolve the symbols and everything. Well, this sounds like the ultimate solution, but unforetunately, you will get a linker warning: “warning LNK4017: IMPORTS statement not supported for the target platform; ignored”. Now you ask yourself what target platform are you on? Of course, x86, although I bet the feature is just not supported, so don’t go that way. The thing is that even Watcom C supports this statement, which is like the best thing ever. It seems that Visual C has supported it for sometime, but no more. And you know what’s worse? Delphi: procedure ImportByOrdinal; external MyLib index Ordinal; When I saw that line I freaked out that VS doesn’t have support for such a thing as well, so cruel you say. ;)

5) And finally the real way to do it. Eventually I couldn’t manage to avoid the compiler tools of VS anymore and I decided to give them a try. Which I gotta say was a pleasant move, because the tools are all easy to use and in 10 mins I finished hacking the best solution out there, apparently…

It all comes down to the way you export and import functions. We all know how to export functions, we can use both dllspec(dllexport) or use a .def file which within we state which symbols to export, as simple as that. Of course, you could have used dllspec(dllimport) but that is only when you got a symbolic name for your import.

So what you have to do is this, write a new .def file which will export (yes, export) all the functions you want to import! but this time it’s a bit more tricky since you gotta handle specially the symbolic names of the imports, I will get to that soon. An example of a .def file that will import from ‘ws2_32.dll’ is this:

LIBRARY ws2_32.dll


    MyAnonymousWSFunction @ 555

That’s it. We have a function name to use in our code to access some anonymous function which is exported by the ordinal 555. Well but that ain’t enough, you still need to do one more thing with this newly created .def file and that is to create a .lib file out of it using a tool named  ‘lib’: lib /DEF:my.def /OUT:my.lib

Now you got a new .lib file which describes the imports you want to use in your original PE file. You go back to that code and add this new ‘my.lib’ file in the linker options and you’re done.

Just as a tip, you can use ‘dumpbin /exports ida.lib’ in order to verify your required symbols.

We can now understand that in order to import symbols, VS requires a .lib file (and in order to export you need a .def file). The nice thing about it is that we can create one on our own. However, as much as we wish to use a .def file to import symbols too, it’s impossible.

One or two more things you should know. The linker might shout at you that a symbol __imp__blabla is not found. That does NOT mean you have to define your symbol with a prefix of __imp__. Let’s stick to demangled names, since I haven’t tried it on C++ names and it’s easier sticking to a simple case.

For cdecl calling convention you add a prefix of ‘_’ to your symbol name. For stdcall calling convention you don’t add any prefix nor suffix. But again, for stdcall you will have to instruct the linker how many bytes on stack the function requires (according to the prototype), so it’s as simple as the number of params multiplied by 4. Thus if you got a prototype of ‘void bla(int a, int* b);’ You will define your symbol as ‘bla@8 @ 555′. Note that the first number after the @ is the number of bytes, and the second one is the ordinal number, which is the most important thing around here.

And one last thing you should know, if you want to import data (no cdecl nor stdcall that is) you have to end the symbol’s line with ‘DATA’. Just like this: ‘MyFunc @ 666 DATA’.

I am positive this one gonna save some of you guys, good luck.

Context of Execution (Meet Pyda)

Tuesday, August 5th, 2008

While I was writing an interactive console Python plugin for IDA pro I stumbled in a very buggy problem. The way a plugin works in IDA is that you supply 3 callbacks, init, run, term. You can guess which is called when. I just might put in that there are a few types of plugins and each type can be loaded in different times. I used, let’s say, a UI plugin which gets to run when you hit a hotkey. At that moment my run() function executes in the context of IDA’s thread. And only then I’m allowed to use all other data/code from IDA’s SDK. This is all good so far, and as it should be from such a plugin interface. So what’s wrong? The key idea behind my plugin is that it is an interactive Python console. Means that you have a console window which there you enter commands which eventually will be executed by the Python’s interpreter. Indirectly, this means that I create a thread with a tight loop that waits for user’s input (it doesn’t really matter how it’s being done at the moment). Once the user supplies some input and presses the enter key I get the buffer and run it. Now, it should be clear that the code which Python runs is in that same thread of the console, which I created. Can you see the problem? Ok, maybe not yet.

Some of the commands you can run are wrappers for IDA commands, just like the somewhat embedded IDC scripting in IDA, you have all those functions, but this time in Python. Suppose that you try to access an instruction to get its mnemonic from Python, but this time you do it from a different thread while the unspoken contract with IDA plugin is that whatever you do, you do it in run() function time, and unforetunately this is not the case, and cannot be the case ever, since it is an interactive console that waits for your input and should be ready anytime. If you just try to run SDK commands from the other (console) thread then you experience mysterious crashes and unexplained weird bugs, which is a big no no, and cannot be forgiveable for a product like this, of course.

Now that we know the problem we need to hack a solution. I could have done a very nasty hack, that each time the user runs something in Python, I simulate the hotkey for the plugin and then my plugin’s run() will be called from IDA’s context and there I will look at some global info which will instruct me what to run, and later on I need to marshal back the result to the Python’s (console) thread somehow. This is all possible, but com’on this is fugly.

I read the SDK upside down to find some way to get another callback, which I can somehow trigger whenever I want, in order to run from IDA’s context, but hardly I found something good and at the end nothing worked properly and it was too shakey as well.

Just to note that if you call a function from the created thread, most of the times it works, but when you have some algorithm running heavily, you get crashes after a short while. Even something simple like reading some info of an instruction in a busy loop and at the same time scrolling the view might cause problems.

Then I decided it’s time to do it correctly without any hacks at all. So what I did was reading in MSDN and looking for ways to run my own code at another thread’s context. Alternatively, maybe I could block IDA’s thread while running the functions from my own thread, but it doesn’t sound too clever, and there’s no really easy way to block a thread. And mind you that I wanted a simple and clean solution as much as possible, without crafting some Assembly code, etc. ;)

So one of the things that jumped into my mind was APC, using the function QueueUserAPC, which gets among other params an HTHREAD, which is exactly what I needed. However, the APC will get to run on the target thread only when it is in an alertable state. Which is really bad, because most UI threads are never in that state. So off I went to find another idea. Though this time I felt I got closer to the solution I wanted.

Later on a friend tipped me to check window messages. Apparently if you send a message to another window, which was created by the target thread (if the thread is a UI one and has windows, of course we assume that in IDA’s case and it’s totally ok), the target thread will dispatch that window’s window-procedure in its own context. But this is nothing new for me or you here, I guess. The thing was that I could create a window as a child of IDA’s main window and supply my own window-procedure to that window. And whenever a message is received at this window, its window-procedure gets to run in IDA’s thread’s context! (and rather not the one which created it). BAM mission accomplished. [Oh and by the way, timers (SetTimer) work the same way, since they are implemented as messages after all.]

After implementing this mechanism for getting to run in original’s IDA’s context, just like the contract wants, everything runs perfectly. To be honest, it could easily have not worked because it’s really dependent on the way IDA works and its design, but oh well. :)

About this tool, I am going to release it (as a source code) in the upcoming month. It is called Pyda, as a cool abbreviation for Python and IDA. More information will be available when it is out.

TinyPE NG is Out

Thursday, January 17th, 2008

Here you go guys:

Source will be released withing a couple of weeks.
Have fun :)
Meanwhile I will be in Turkey for the weekend to relax and leave the bits behind.


Delegators #3 – The ATL Way

Saturday, November 24th, 2007

The Active-Template Library (or ATL) is very useful. I think that if you code in C++ under Windows it’s even a must. It will solve your great many hours of work. Although I have personally found a few bugs in this library, the code is very tight and does the work well. In this post I’m going to focus on the CWindow class, although there are many other classes which do the delegations seamlessly for the user, such as: CAxDialogImpl, CAxWindow, etc. CWindow is the main one and so we will examine it.

I said in an earlier post that ATL uses thunks to call the instance’s window-procedure. A thunk is a mechanism to convert a function invocation between callee and caller. Look it up in Wiki for more info… To be honest, I was somewhat surprised to see that the mighty ATL uses Assembly to implement the thunks. As I was suggesting a few ways myself to avoid Assembly, I don’t see a really good reason to use Assembly here. You can say that the the ways I suggested are less safe, but if a window is choosing to be malicious you can make everything screwed anyway, so I don’t think it’s for this reason they used Assembly. Another reason I can think of is because their way, they don’t have to look-up for the instance’s ‘this’ pointer, they just have it, wheareas you will have to call GetProp or GetWindowLong. But come on… so if you got any idea let me know. I seriously have no problem with Assembly, but as most people thought that delegations must be implemented in Assembly, I showed you that’s not true. The reason it’s really surprised me is that the Assembly code is not portable among processors as you know; and ATL is very popular and used library. So if you take a look at ATL’s thunks code, you will see that they support x86 (obviously), AMD64, MIPS, ARM and more. And I ask, why the heck?when you can avoid it all? Again, for speed? Not sure it’s really worth it. The ATL guys know what they do, I doubt they didn’t know they could have done it without Assembly.

Anyhow, let’s get dirty, it’s all about their _stdcallthunk struct in the file atlstdthunk.h. The struct has a few members, that their layout in memory will be the same as it was in the file, that’s the key here. There is an Init function which constructs the members. These members are the byte code of the thunk itself, that’s why their layout in memory is important, because they are get to ran later by the processor. The Init function gets the ‘this’ pointer and the window procedure pointer. And then it will initialize the members to form the following code:

mov dword ptr [esp+4], this
jmp WndProc

Note that ‘this’ and ‘WndProc’ are member values that their values will be determined in construction-time. They must be known in advance, prior to creation of the thunk. Seeing [esp+4] we know they override the first argument of the instance’s window-procedure which is hWnd. They could have pushed another argument for the ‘this’ pointer, but why should they do it if they can recover the hWnd from the ‘this’ pointer anyway…? And save a stack-access? :)

Since the jmp instruction is relative in its behaviour, that is, the offset to the target address is not absolute but rather relative to the address of the jmp instruction itself, upon initialization the offset is calculated as well, like this:

DWORD((INT_PTR)proc – ((INT_PTR)this+sizeof(_stdcallthunk)));

Note that the ‘this’ here is the address of the thunk itself in memory (already allocated).
Now that we know how the thunk really looks and what it does, let’s see how it’s all get connected, from the global window procedure to the instance’s one. This is some pseudo code I came up with (and does not really reflect the ATL code, I only wanna give you the idea of it):


WndProcThunk = AllocThunk((DWORD)this.WndProc, (DWORD)this);
m_hWnd = CreateWindow (…);
SetWindowLong(m_hWnd, GWL_WNDPROC, WndProcThunk);

This is not all yet, although in reality it’s a bit more complicated in the way they bind the HWND with its thunk…Now when the window will be sent a message, the stack will contain the HWND, MESSAGE, WPARAM and LPARAM arguments for the original window procedure, but then the thunk will change the HWND to THIS and immediately transfer control to the global window procedure, but this time they got the context of the instance!

CWindow::WindowProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)
 CWindowImplBaseT< TBase, TWinTraits >* pThis = (CWindowImplBaseT< TBase, TWinTraits >*)hWnd;
 pThis->ProcessWindowMessage(pThis->m_hWnd, uMsg, wParam, lParam, lRes, 0);

And there we go. Notice the cast from hWnd to the ‘this’ pointer and the call with the context of the instance, at last, mission accomplished ;)

More information can be found here.

 A new problem now arises, I will mention it in the next post in this chain, but in a couple of words, here’s a hint: NX bit.

Delegators #2 – C++ & Win32

Saturday, November 17th, 2007

Last post I was talking about using SetProp/GetProp to link between an HWND and an instance to a class that encapsulates the window widget. There’s another way to do it.

The first time you are taught how to create a window in Win32, they tell you that you have to fill in all fields in the WNDCLASS structure, but they always keep on saying to ignore some of the advanced fields. One of the fields inside that structure is cbWndExtra, this one says to the windows manager how many bytes to allocate after the original windows manager’s structure. So suppose all we need is to save the ‘this’ pointer, it means we will need 4 bytes, in 32bits environment system, of course. So we can set cbWndExtra to 4, and then we have a DWORD we can access to as much as we like with the SetWindowLong/GetWindowLong API.

Something like this would suffice:

wc. fields = …
wc.lpszClassName = “myclass”;
wc.cbWndExtra = 4;
HWND hWnd = CreateWindow(“myclass”, …)
SetWindowLong(hWnd, 0, this);


And later on in the window-procedure, you can do this:

LRESULT WINAPI MyWindow::WndProc(HWND hWnd, UINT message, WPARAM wparam, LPARAM lparam)
 MyWindow* pThis = (MyWindow*)GetWindowLong(hWnd, 0);
 if (pThis != 0) return pThis->WndProc(message, wparam, lparam);
 return DefWindowProc(hWnd, message, wparam, lparam);

 And there you go. There are a few problems with this technique though. You can’t handle the WM_CREATE message inside the instance’s WndProc, that’s because before CreateWindow returns, the global WndProc will get called with this message, and you didn’t have the chance to set the ‘this’ pointer…

The second problem is that this technique is good only for windows you create yourself, because you control the WNDCLASS… You can use pre-created window-classes of the system by changing their fields in your application only, I think this will create a copy of the pre-created classes only for your process.

Anyhow, to solve the first problem, I saw implementation that create the instance of the object when they receive the WM_CREATE inside the global WndProc. This is really a matter of design, but this way you can call an initializer of the instance once WM_CREATE was received. I can’t come up with another way at the moment if the instance is already created. Maybe just call the initializer on the first message the window gets? Well it’s a bit cracky.

Delegators #1 – C++ & Win32

Wednesday, November 14th, 2007

In the technical point of view a Delagator is a call-forwarder, as simple as that. But in the designing aspect it is a technique where an object outwardly expresses certain behaviour but in reality delegates responsibility for implementing that behavior… And thanks for Wiki for sparing me the ugly explanation. :)

The reason I came up to want to implement delegators as seamlessly as possible in C++ was because I got to a situation where I wanted to write a pure OOP wrapper for windows objects (CreateWindow). Now let me get into the juicy details before you raise your eyebrows. And even before that – yes, something like ATL but only for windows. So say you have a class that is responsible for creating a window and controling it by handling its messages and stuff. Sounds legitimate, right? It would look something like:

class Window {
Window(const cstring& name) { m_hWnd = CreateWindow(…) }
void Show () { ShowWindow(m_hWnd, SW_SHOW); }
 HWND m_hWnd;

And you get the notion. It looks ok and it is right, but let’s get further and hit the problem. When we create a window we need to pass the name of the window-class, that is some structure that contains more general info about the class itself, like the background color, mouse cursor and other stuff, but the most important one of them is wndproc pointer. The wndproc pointer is a pointer to a callback function that handles the messages of the windows that belong to this window-class. I assume you know how Win32 Windows system works. Now, can you spot the problem already?

Well, since it asks for a pointer to a function, and we want to have an instance of our Window object per window, there is no way to bind between the two. (OK, I lie, but continue reading please). In our case we would like to have the method of our message-handler being called and not a global function. If you’re not sure why, then think that each window has some private members in its instance that tells special things about that window. So we gotta find a way to link between our instance and our “window-procedure” method.

This is a start:

class Window
LRESULT WINAPI WndProc(HWND hWnd, UINT message, WPARAM wparam, LPARAM lparam)
 switch (message)
 return DefWindowProc(hWnd, message, wparam, lparam);
static LRESULT WINAPI WndProc(HWND hWnd, UINT message, WPARAM wparam, LPARAM lparam)
 // Delegate to the instance’s window procedure.
 return g_window.WndProc(hWnd, message, wparam, lparam);

So now the window-class structure will point to the static function WndProc which when called will delegate its parameters to the internal (private) WndProc that can access the members. It’s almost a good solution. But now we are allowed to have only one window and a global instance that contains it. The good thing is that a static public function can call a private method of the same class, so we can hide the core of it. The bad thing is we still expose an internal function we don’t want to in the first place.

The problem is now to find a way to link between a real window object and our instance. The ID of a window is its HWND (or Handle to Window). So we could hold a list with all HWND’s we created and look it up before delegation and then make the right call to the correct instance. This is too much hassle for nothing. There ought to be a way to store some private data on the window object itself, right? At least I would suspect so. Eventually, after reading some MSDN and searching the net, I found a savior function which is called SetProp (and GetProp ofc). Exmaining their prototypes:
BOOL SetProp(HWND hWnd, LPCTSTR lpString, HANDLE hData);
HANDLE GetProp(HWND hWnd, LPCTSTR lpString);
We actually have a kind of dictionary, we give a string and store a pointer (to anything we’re upto). Afterwards, we can retrieve that pointer by using GetProp. Let’s work it out again:

m_hWnd = CreateWindow(…);
SetProp(m_hWnd, “THIS_POINTER”, (HANDLE)this); // Ah ha!

What we did was to link the HWND with the this pointer. The window procedure will look like this now:

static LRESULT WINAPI WndProc(HWND hWnd, UINT message, WPARAM wparam, LPARAM lparam)
 Window* pThis = (Window*)GetProp(m_hWnd, “THIS_POINTER”); // Some magic?
 if (pThis != NULL) return pThis->WndProc(…); // Forward the call to the correct instance.
 return DefWindowProc(…);

 Well, as for the code here, I don’t handle errors, and I use C casts, so sue me :). I merely wanna show you the mechanism. And if you really wanna get dirty, you will have to RemoveProp when you get WM_NCDESTROY, etc…

After I got this code really working, I still was wondering how ATL does this binding. So I took a look at their code… It seems to have a global linked list with all instances of the windows that ATL created. And then when the global window procedure is get called, it looks it up on that list. In reality it is much more complex then my explanation, since they need to synchronize accesses among threads, make sure the window belongs to the same thread, etc… All that for only the first time call of the window-procedure. Then it sets the REAL ‘window-procedure’ method of the instance itself, and there it uses Assembly, muwahaha. That will be covered next time.

BTW – SetWindowLong cannot work since all you can do is changing the window-class fields. Although, maybe there is some undocumented field you can play with to store anything you like. Never know? :)