SBCL Windows threads implementation notes, part 2

2010-11-05

Suspension

Thread suspension is implemented via safepoints. A safepoint is implemented as a read from a special memory location (the ‘GC poll address’, which is located in the ‘GC poll page’).

In the first phase, the ‘master’ thread unmaps the GC poll page. After this, each of the other threads will eventually take a page fault when it reads the GC poll address. Several issues must be dealt with:

The reaction to unmapping is not immediate: a thread must first reach a safepoint.
Some threads will not reach a safepoint soon (if a thread is executing foreign code or a blocking system call).
Even if a thread has reached a safepoint, it does not mean that GC can start. The thread may be inside a WITHOUT-GCING section, for example. In this case, the thread cannot be resumed while the GC poll page is still unmapped, because it would fault again at its next safepoint.

We can draw some conclusions:

Every thread that can reach a safepoint (that is, one that is not in foreign code or in a blocking syscall) must reach it before GC can proceed.
Every thread that cannot reach a safepoint must not interfere with GC if it suddenly returns to Lisp code.
After all such threads have reached a safepoint, we must still wait until every thread is ready for GC.

This implies a two-phase suspension.

Phase 1:
1. The GC poll page is remapped as unreadable.
2. The master thread checks each thread: if the thread is running Lisp code, the master waits until it reaches a safepoint. A thread is considered to have reached a safepoint when its state is STATE_SUSPENDED_BRIEFLY.
Phase 2:
1. The GC poll page is mapped again so that threads can run until they are ready for GC.
2. The master thread waits for every thread to be ready for GC. This is achieved by waiting for the state of every thread to become STATE_SUSPENDED (except for threads that are already ready for GC because they are running foreign code).
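
The two phases can be sketched as the conditions the master thread waits on. This is only a single-threaded model, not the actual runtime code: the GC poll page is represented by a plain flag, and struct thread_info, STATE_RUNNING, ready_while_foreign and all function names are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* STATE_RUNNING is a hypothetical name; the other two appear in the notes. */
enum thread_state { STATE_RUNNING, STATE_SUSPENDED_BRIEFLY, STATE_SUSPENDED };

/* Hypothetical per-thread record, not SBCL's real thread structure. */
struct thread_info {
    enum thread_state state;
    bool in_foreign_code;      /* foreign code or a blocking system call */
    bool ready_while_foreign;  /* stand-in for "ready for GC" in foreign code */
};

/* Models the protection of the GC poll page as a plain flag. */
static bool gc_poll_page_readable = true;

/* Phase 1, step 1: make the poll page unreadable. */
static void begin_phase1(void) { gc_poll_page_readable = false; }

/* Phase 1, step 2: every thread running Lisp code has reached a safepoint. */
static bool phase1_done(const struct thread_info *threads, size_t n)
{
    assert(threads != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!threads[i].in_foreign_code &&
            threads[i].state != STATE_SUSPENDED_BRIEFLY)
            return false;
    }
    return true;
}

/* Phase 2, step 1: map the poll page again. */
static void begin_phase2(void) { gc_poll_page_readable = true; }

/* Phase 2, step 2: every thread is suspended or already ready for GC. */
static bool phase2_done(const struct thread_info *threads, size_t n)
{
    assert(threads != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        bool ready = threads[i].state == STATE_SUSPENDED ||
                     (threads[i].in_foreign_code &&
                      threads[i].ready_while_foreign);
        if (!ready)
            return false;
    }
    return true;
}
```

In this model the master would call begin_phase1(), poll phase1_done() until it holds, then call begin_phase2() and poll phase2_done().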

A thread is ready for GC if:

thread_state(thread) == STATE_SUSPENDED; or
the thread is running foreign code, is not inside WITHOUT-GCING or WITHOUT-INTERRUPTS, and has blockable signals unblocked.
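
Written as a predicate, the readiness condition might look like the sketch below. The struct and its field names are assumptions made for illustration; only STATE_SUSPENDED and STATE_SUSPENDED_BRIEFLY are names taken from these notes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* STATE_RUNNING is a hypothetical name; the other two appear in the notes. */
enum thread_state { STATE_RUNNING, STATE_SUSPENDED_BRIEFLY, STATE_SUSPENDED };

/* Hypothetical per-thread record, not SBCL's real thread structure. */
struct thread_info {
    enum thread_state state;
    bool in_foreign_code;              /* running foreign code */
    bool without_gcing;                /* inside WITHOUT-GCING */
    bool without_interrupts;           /* inside WITHOUT-INTERRUPTS */
    bool blockable_signals_unblocked;  /* blockable signals are unblocked */
};

/* True if the thread is ready for GC under either of the two conditions. */
static bool thread_ready_for_gc(const struct thread_info *th)
{
    assert(th != NULL);
    if (th->state == STATE_SUSPENDED)
        return true;
    return th->in_foreign_code
        && !th->without_gcing
        && !th->without_interrupts
        && th->blockable_signals_unblocked;
}
```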

For this, the thread-local variable *GC-SAFE* is introduced; it tracks a thread's current readiness for GC. It is guaranteed that when *GC-SAFE* changes from NIL to T, the thread checks whether a GC is in progress and, if so, enters the suspended state.

Thread interruption

Thread interruption is similar, but we don't need to wait for all threads to reach a safepoint; only the interrupted thread has to reach one.

Phase 1:
1. The GC poll page is remapped as unreadable
2. The master thread checks the interrupted thread: if it is running Lisp code, the master waits until it reaches a safepoint. A thread is considered to have reached a safepoint when its state is STATE_SUSPENDED_BRIEFLY.
Phase 2:
1. The GC poll page is mapped again.
2. All threads that have reached a safepoint are released
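
The interruption protocol can be modelled in the same spirit. In this sketch the poll page is again a plain flag, and "releasing" a thread is represented by setting its state back to a hypothetical STATE_RUNNING; the notes do not say which state a released thread gets, so that part is an assumption, as are all struct and function names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* STATE_RUNNING is a hypothetical name; the other two appear in the notes. */
enum thread_state { STATE_RUNNING, STATE_SUSPENDED_BRIEFLY, STATE_SUSPENDED };

/* Hypothetical per-thread record, not SBCL's real thread structure. */
struct thread_info {
    enum thread_state state;
    bool in_foreign_code;
};

/* Models the protection of the GC poll page as a plain flag. */
static bool gc_poll_page_readable = true;

/* Phase 1, step 1: make the poll page unreadable. */
static void interrupt_begin(void) { gc_poll_page_readable = false; }

/* Phase 1, step 2: only the interrupted thread is waited for, and only
 * if it is running Lisp code. */
static bool interrupt_target_reached(const struct thread_info *target)
{
    assert(target != NULL);
    return target->in_foreign_code ||
           target->state == STATE_SUSPENDED_BRIEFLY;
}

/* Phase 2: map the page again and release every thread that stopped at a
 * safepoint in the meantime. Returns the number of released threads. */
static size_t interrupt_release(struct thread_info *threads, size_t n)
{
    size_t released = 0;
    assert(threads != NULL || n == 0);
    gc_poll_page_readable = true;
    for (size_t i = 0; i < n; i++) {
        if (threads[i].state == STATE_SUSPENDED_BRIEFLY) {
            threads[i].state = STATE_RUNNING;  /* assumed "released" state */
            released++;
        }
    }
    return released;
}
```

Bystander threads that happened to hit a safepoint while the page was unreadable are released together with the target, which is why phase 2 walks all threads rather than just the interrupted one.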

Safepoint code

The safepoint code is called to check whether the thread has something to do that is related to SBCL's internal workings. It is called:

When a thread reaches a safepoint and the GC poll page is unmapped
When leaving or entering foreign code
On other occasions.

Safepoints have several responsibilities.

If there is a GC or thread interruption in progress, the thread has to notify the master thread that it has reached a safepoint. The safepoint code does this by changing the thread's state to STATE_SUSPENDED_BRIEFLY and waiting for the master thread to change the state again. When it resumes, the thread checks whether it should suspend or be interrupted.
If a thread should suspend, it first checks whether it can be suspended. If the thread is suspendable, it changes its state to STATE_SUSPENDED; otherwise, it sets STOP_FOR_GC_PENDING (and sets pseudo_atomic_interrupted).
If a thread should be interrupted, it either sets INTERRUPT_PENDING and pseudo_atomic_interrupted or executes the interruption.
If a GC is pending and the thread can do GC, it runs the GC.
If an interrupt is pending and a thread can execute it, the thread executes it.
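
The decisions a thread makes after the master resumes it can be sketched as follows. This is a simplified model under stated assumptions: the request enum, the suspendable, interruptible and can_gc flags, and the counters that stand in for actually running an interruption or a GC are all hypothetical, and clearing a pending flag once the work has run is an assumption as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* STATE_RUNNING is a hypothetical name; the other two appear in the notes. */
enum thread_state { STATE_RUNNING, STATE_SUSPENDED_BRIEFLY, STATE_SUSPENDED };

/* Hypothetical encoding of what the master asked the thread to do. */
enum request { REQUEST_NONE, REQUEST_SUSPEND, REQUEST_INTERRUPT };

/* Hypothetical per-thread record, not SBCL's real thread structure. */
struct thread_info {
    enum thread_state state;
    bool suspendable;                /* thread may suspend right now */
    bool interruptible;              /* thread may run an interruption now */
    bool can_gc;                     /* thread may run a GC right now */
    bool stop_for_gc_pending;        /* models STOP_FOR_GC_PENDING */
    bool interrupt_pending;          /* models INTERRUPT_PENDING */
    bool gc_pending;                 /* a GC is pending for this thread */
    bool pseudo_atomic_interrupted;
    int interruptions_run;           /* counts executed interruptions */
    int gcs_run;                     /* counts executed GCs */
};

static void safepoint_dispatch(struct thread_info *th, enum request req)
{
    assert(th != NULL);
    if (req == REQUEST_SUSPEND) {
        if (th->suspendable) {
            th->state = STATE_SUSPENDED;
        } else {
            /* Defer the suspension until the thread becomes suspendable. */
            th->stop_for_gc_pending = true;
            th->pseudo_atomic_interrupted = true;
        }
    } else if (req == REQUEST_INTERRUPT) {
        if (th->interruptible) {
            th->interruptions_run++;
        } else {
            /* Defer the interruption. */
            th->interrupt_pending = true;
            th->pseudo_atomic_interrupted = true;
        }
    }
    /* Run deferred work once the thread is able to do it. */
    if (th->gc_pending && th->can_gc) {
        th->gc_pending = false;
        th->gcs_run++;
    }
    if (th->interrupt_pending && th->interruptible) {
        th->interrupt_pending = false;
        th->interruptions_run++;
    }
}
```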

On some occasions, the runtime is in a very fragile state and cannot do any of the things that a safepoint must do (e.g., change thread state, execute GC, execute an interruption). One example is the code of the Lisp thread synchronization primitives. To control this, the *DISABLE-SAFEPOINTS* variable is used.

GC code is run inside a safepoint, and the safepoint code is not reentrant. GC code itself has safepoints (since SUB-GC is a normal Lisp function, it calls Lisp synchronization routines and switches to and from foreign code several times). To prevent re-entry into the safepoint code, the *IN-SAFEPOINT* variable is used.
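
The role of the two guard variables can be illustrated with a small single-thread model. Here in_safepoint and disable_safepoints stand in for *IN-SAFEPOINT* and *DISABLE-SAFEPOINTS*; the function names and counters are hypothetical, and treating a guarded safepoint as a no-op is an assumption, since the notes do not spell out what happens in that case.

```c
#include <assert.h>
#include <stdbool.h>

static bool in_safepoint;        /* stands in for *IN-SAFEPOINT* */
static bool disable_safepoints;  /* stands in for *DISABLE-SAFEPOINTS* */
static int work_runs;            /* how often the safepoint work ran */
static int nested_skipped;       /* nested entries that were refused */

static void safepoint_work(void);

/* Runs the safepoint work unless safepoints are disabled or the thread
 * is already inside the safepoint code. Returns true if the work ran. */
static bool enter_safepoint(void)
{
    if (disable_safepoints || in_safepoint)
        return false;
    in_safepoint = true;
    safepoint_work();
    in_safepoint = false;
    return true;
}

/* Models GC code, which itself hits a safepoint while it runs. */
static void safepoint_work(void)
{
    assert(in_safepoint);  /* only ever called from enter_safepoint() */
    work_runs++;
    if (!enter_safepoint())
        nested_skipped++;  /* the nested safepoint is refused */
}
```

Without the in_safepoint check, the nested call would recurse into safepoint_work() without bound.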