Windows allocates a special per-thread structure, the Thread Information Block (TIB) [1, 2, 3]. The TIB contains thread-specific variables such as the last error number, a pointer to the SEH (structured exception handling) frame, the thread identifier, and the TLS backing store. A thread can access its own TIB via the FS segment register: %fs:0 is the first field of the TIB.
The first 64 TLS slots are stored in the TLS array in the TIB at offset 0xE10 (slots numbered above 63 are stored in another memory location; a pointer to that location is also stored in the TIB). This location is documented and is used by several C/C++ compilers for thread-local variables [TODO: need a reference]. Windows provides the TlsAlloc function to allocate a slot in this array; we can allocate one slot from the TLS array and use it to store a pointer to struct thread.
1. Control the initial allocation of TLS slots. This would complicate the initialization of the Lisp runtime and the build process, and it would be a barrier to embedding SBCL.
2. Allocate TLS slots (with TlsAlloc) until we get the slot we want. After that, we free the slots that we don't need. This way the pointer to struct thread will be at a known memory location. It is safe to assume that slot 63 is free when SBCL is initialized: libraries on Windows commonly use no more than one TLS slot (and system libraries like Winsock don't even use TLS, as they have their own fields in the TIB). When sbcl.exe is started, only a few TLS slots are taken. Even if we fail to own slot 63, we will know this at initialization time and will not crash silently somewhere later.
3. Allocate a TLS slot (with TlsAlloc) and store the slot index in a global variable. The initialization process is easy: just call TlsAlloc. But all TLS accesses will have one more indirection. I do not consider this a performance issue. SBCL's compiler uses macros like (pseudo-atomic &body body) and (load-tl-symbol-value reg symbol). It would be necessary to allocate temporary register(s) for resolving the indirections.
The last option is clearly the best one, but it would require changes to code generation, so I'll leave it for later.
Currently TLS is implemented as option 2. On initialization, we take slot 63 (or fail if we cannot, though I can't imagine a situation where this would happen). When windows-threads is merged into SBCL, we should consider option 3; implementing it now would make the change larger than necessary for Windows threading support.
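The slot-claiming strategy of option 2 can be sketched as follows. This is a minimal illustration, not the actual SBCL code: mock_alloc and mock_free are hypothetical stand-ins for TlsAlloc and TlsFree so that the sketch runs on any platform, and the mock assumes that the allocator always hands out the lowest free index and that only 64 slots exist. Neither assumption is guaranteed for the real Windows allocator.

```c
#include <stddef.h>

#define WANTED_SLOT 63u
#define SLOT_COUNT  64u
#define NO_SLOT     0xFFFFFFFFu  /* same value as TLS_OUT_OF_INDEXES */

/* Hypothetical stand-ins for TlsAlloc/TlsFree; the mock models only the
   first 64 slots and always returns the lowest free index. */
static unsigned char slot_in_use[SLOT_COUNT];

static unsigned mock_alloc(void) {
    for (unsigned i = 0; i < SLOT_COUNT; i++)
        if (!slot_in_use[i]) { slot_in_use[i] = 1; return i; }
    return NO_SLOT;
}

static void mock_free(unsigned slot) { slot_in_use[slot] = 0; }

/* Returns 1 if WANTED_SLOT now belongs to us, 0 if it could not be claimed.
   Either way, every other slot allocated along the way is released. */
int claim_wanted_slot(void) {
    unsigned extra[SLOT_COUNT];  /* slots we took but do not need */
    size_t n = 0;
    int owned = 0;

    for (;;) {
        unsigned slot = mock_alloc();
        if (slot == NO_SLOT) break;                    /* nothing left */
        if (slot == WANTED_SLOT) { owned = 1; break; } /* got it */
        extra[n++] = slot;
    }
    while (n > 0) mock_free(extra[--n]);
    return owned;
}
```

A failed claim is reported by the return value at initialization time, which matches the point above that a failure would be noticed immediately rather than causing a silent crash later.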
Clozure CL uses the ES segment register and undocumented Windows NT functions to allocate a segment and store it in the Local Descriptor Table. This is problematic because WOW64 (the Windows subsystem used to run Win32 applications on 64-bit Windows) does not preserve the value of the ES register during context switches (including thread preemption) and syscalls. This is the reason why the 32-bit version of Clozure CL does not work on 64-bit Windows.
In earlier versions of windows-threads, the pointer to struct thread was stored in the TIB's pvArbitrary field. Windows lets libraries freely use the pvArbitrary field, so there is a potential conflict that is hard to detect. Anton Kovalenko persuaded me that using a known slot of the TLS array is better.
Lisp code in SBCL runs in a managed environment: SBCL needs to be able to safely suspend threads (because it uses a stop-the-world garbage collector) and interrupt them (interruption is the process of stopping a thread and making it call some function).
On Unix-like systems, suspending and interrupting threads is conceptually simple: by sending a UNIX signal to a thread, we can suspend or interrupt it. Despite the conceptual simplicity, writing code that uses signals to synchronize threads is hard because signal delivery is asynchronous.
Windows, on the other hand, does not provide equivalent asynchronous interruptions. There are several ways of emulating them:
«Thread hijacking». Windows' debugging API has the functions SuspendThread (stops a thread), GetThreadContext and SetThreadContext (examine and modify the context of a thread), and ResumeThread (resumes a thread). We can push a stack frame onto the stack and point the thread context's EIP register at the signal handler, imitating the delivery of a signal. This method is called «thread hijacking» or «EIP hijacking».
The bad news is that thread hijacking will lead to race conditions inside Windows internals, so it should be avoided.
Asynchronous Procedure Calls. An APC in Windows acts like a signal in Unix: it executes some function in the context of a thread and then returns to whatever the thread was doing.
There are two kinds of APCs: «user APCs» and «kernel APCs». They differ in that kernel APCs are only available to kernel drivers and are delivered immediately, while user APCs are delivered only to threads in an «alertable» state. A thread is in an alertable state only for the duration of certain syscalls, such as WaitForSingleObjectEx. Only kernel code can issue a kernel APC, but there is a QueueUserAPCEx project that lets applications use kernel APCs by installing a custom kernel driver.
Since QueueUserAPCEx requires installing a kernel-mode driver, I haven't tried it. By the way, the pthreads_win32 project uses QueueUserAPCEx to implement pthread_cancel if it's available.
Among these methods of thread suspension, only polling really works, so we have no choice but to implement polling in SBCL threads. Luckily, I didn't even need to do this myself: Paul Khuong implemented GC safepoints for SBCL [8, 9]. The safepoints are implemented by reading a known fixed memory location:
test %eax, GC_POLL_PAGE_ADDR
where GC_POLL_PAGE_ADDR is some fixed address. Under normal circumstances, this instruction has no side effects on the running code (it only modifies the EFLAGS register, and it is not inserted inside condition-checking code). But if the memory page is read-protected or unmapped, the instruction causes a page fault, which is handled by an exception handler. The exception handler puts the thread to sleep or calls the interruption function.
To signal all threads that GC is coming and they should pause, GC_POLL_PAGE is read-protected. After that, threads will start receiving page faults (unless they are blocked in foreign code). We must consider that:
Unmapping GC_POLL_PAGE does not take effect immediately: it might take some time for a thread to reach a safepoint.
A thread may be inside a WITHOUT-GCING section or doing some operation that temporarily breaks invariants. Such a thread must have a chance to finish what it is doing, but if it is simply resumed, it will retry the safepoint instruction forever.
From these considerations we can draw some conclusions:
Thread suspension proceeds in two phases. In the first phase, threads are notified that they should pause. In the second phase, threads are actually stopped. This separation is needed because we must know when to make the GC poll page accessible again.
The thread state STATE_SUSPENDED_BRIEFLY means that the thread has reached a safepoint and is waiting for the second phase of the suspension process. A thread enters this state even if it is in some state that prevents garbage collection.
STATE_SUSPENDED (except for threads that are ready for GC)
Thread is ready for GC if:
Thread marks itself as being
gc-safe when it is running non-lisp code and blockable signals are unblocked and it is not inside
To track whether a thread that cannot reach a safepoint allows garbage collection, the special variable *GC-SAFE* is introduced. It is guaranteed that a thread cannot 'slip through' when garbage collection is commencing.
Thread interruption is similar, but we only need to wait for the thread that we are interested in.
If the thread is waiting in pthread_cond_wait, a spurious wakeup is generated. This, for example, wakes the thread from futex_wait. In future versions, other wakeups should be added, e.g., canceling a sleep or canceling I/O.
A «safepoint» is code that performs some bookkeeping for a thread. It is called:
Safepoint code is responsible for:
STATE_SUSPENDED_BRIEFLY and waiting for a change of state.
pseudo_atomic_interrupted, if inside pseudo-atomic section)
On some occasions, the runtime is in a very fragile state and cannot do anything that a safepoint must do (e.g., change thread state, execute GC, execute an interruption), for example while operating on Lisp thread synchronization objects. To control this, the *DISABLE-SAFEPOINTS* variable is used.
GC code runs inside a safepoint, and safepoint code itself is not reentrant. However, GC code itself contains safepoints (SUB-GC is a normal Lisp function; it calls Lisp synchronization routines and switches to and from foreign code several times). To prevent re-entering safepoint code, the *IN-SAFEPOINT* variable is used.
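The two guards can be sketched in C as follows. This is only an illustration under stated assumptions: in SBCL these are Lisp special variables, whereas here thread-local flags stand in for them, and safepoint, collect_garbage and safepoint_work_done are hypothetical names. What a real runtime does with a request that arrives while safepoints are disabled is not covered; the sketch simply skips it.

```c
#include <stdbool.h>

/* Thread-local stand-ins for *IN-SAFEPOINT* and *DISABLE-SAFEPOINTS*. */
static _Thread_local bool in_safepoint;
static _Thread_local bool disable_safepoints;
static _Thread_local int  safepoint_work_done;  /* counts real entries */

void safepoint(void);

/* Stand-in for GC work: like SUB-GC, it reaches a safepoint itself. */
static void collect_garbage(void) {
    safepoint();  /* nested call; must not re-enter the safepoint body */
}

void safepoint(void) {
    /* Skip when the runtime is in a fragile state or already inside. */
    if (disable_safepoints || in_safepoint)
        return;
    in_safepoint = true;
    safepoint_work_done++;
    collect_garbage();
    in_safepoint = false;
}
```

The nested safepoint reached from collect_garbage returns immediately, so the bookkeeping runs once per outer entry.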