The tale of one deadlock in SBCL on Windows

I keep discovering new bugs.

I recently finished a debugging session that took several weeks of my time. The SBCL process was hanging while loading cl-gtk2 from SLIME. The hang magically resolved itself as soon as I typed anything into the *inferior-lisp* Emacs buffer, after which loading continued without any trouble. If I loaded cl-gtk2 from the console, everything went well.

At first, I blamed some kind of deadlock deep inside SBCL. I tried various options for printing debug logs, but none of them helped.

Ordinary native debuggers cannot debug Lisp code in SBCL: SBCL itself uses interrupt instructions (int 3) to process interrupts, so the debugger keeps stopping all the time. (Recent SBCL versions have gained support for using the UD2 instruction instead, which is not supposed to make the debugger break.) The only thing that helped was a stack trace of all threads at the moment of the hang. Of course, native stack traces are of little use for Lisp code, so I did not even know where in the high-level code the hang happened.

By bisecting the cl-gtk2 code I found the call that hung: it was a call to the gtk-init-check function. When the hang happened, the stack trace contained a call to GdiPlus.dll!__DllMainCRTStartup, which had been called from kernel32.dll!_LoadLibraryExW. This suggested that the hang was caused by loading the GdiPlus.dll dynamic library.

To check this hypothesis, I ran SBCL from the console, where the call (load-shared-object "GdiPlus.dll") completed without errors. The same call in the REPL hung.

The hang itself happened inside the GetFileType function. The MSDN documentation for it looks quite harmless:

Retrieves the file type of the specified file.

What turned out to be useful instead was a comment on the MSDN page saying that GetFileType can sometimes hang. Searching the web for GetFileType hangs turned up this nugget:

For instance, GetFileType on a pipe hangs if there is a pending read request, which, BTW, causes the DLL load to hang sometimes since the C runtime startup in the DLL calls GetFileType on all known handles while setting up the file descriptor table for open/fopen.

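Well, I'm used to documentation being quite spotty (especially when it comes to Microsoft documentation), but it's impossible to prepare for something like this in advance. It does explain the symptoms, though: under SLIME, SBCL's standard input is presumably a pipe from Emacs with a read pending on it, so loading GdiPlus.dll blocks in GetFileType until something typed into *inferior-lisp* completes that read. In a console, standard input is not a pipe, and nothing hangs.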
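
The behaviour described in the quote can be reproduced outside SBCL. The following is a minimal sketch in Python, not the code used during the original debugging session. It assumes Windows, uses an anonymous pipe from os.pipe() as a stand-in for the pipe between Emacs and SBCL, and calls GetFileType through ctypes. The helper names start_call and demo are made up for this illustration. Whether the call actually blocks may depend on the Windows version, so no particular output is promised.

```python
import os
import sys
import threading
import time


def start_call(fn):
    """Run fn() on a daemon thread.

    Returns (thread, box); box["result"] is set once fn() has returned.
    """
    box = {}

    def worker():
        box["result"] = fn()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, box


def demo():
    # Windows-only imports stay local so the file can be imported elsewhere.
    import ctypes
    import msvcrt
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetFileType.argtypes = [wintypes.HANDLE]
    kernel32.GetFileType.restype = wintypes.DWORD

    # Assumption: this anonymous pipe plays the role of SBCL's stdin under SLIME.
    read_fd, write_fd = os.pipe()
    handle = msvcrt.get_osfhandle(read_fd)

    # A blocking read that nobody satisfies yet, like a REPL waiting for input.
    reader, _ = start_call(lambda: os.read(read_fd, 1))
    time.sleep(0.5)  # give the reader time to block inside the read

    # Query the file type of the same handle from another thread.
    probe, box = start_call(lambda: kernel32.GetFileType(handle))
    probe.join(2.0)
    print("GetFileType still blocked after 2 s:", probe.is_alive())

    # The equivalent of typing something into *inferior-lisp*.
    os.write(write_fd, b"x")
    probe.join(2.0)
    reader.join(2.0)
    print("GetFileType finished after the write:", not probe.is_alive())
    print("returned file type:", box.get("result"))  # FILE_TYPE_PIPE is 3


if __name__ == "__main__" and sys.platform == "win32":
    demo()
```

Both calls run on daemon threads so that the script can still exit if GetFileType never returns. Writing a single byte into the pipe completes the pending read, which is the same thing that typing into the *inferior-lisp* buffer did for SBCL.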

Fortunately, this bug is not caused by my code, which means that I'm ready to publish the SBCL fork with Windows threads for wider testing.