Signaling in Quark

As some of you may know, I’ve been working on a kernel called Quark for quite some time. Quark gained signal support a while back, and I finally got around to doing a writeup.

Note that this implementation is a first pass, and is definitely NOT complete. In fact, I already caught some bugs after completing this writeup, but I think it is still valid.

The relevant merge request for this page is #15. Feel free to look at that for reference.

What are signals?

Copied from ‘Introduction To Unix Signals Programming’:

Signals, to be short, are various notifications sent to a process in order to notify it of various "important" events. By their nature, they interrupt whatever the process is doing at this minute, and force it to handle them immediately. Each signal has an integer number that represents it (1, 2 and so on), as well as a symbolic name that is usually defined in the file /usr/include/signal.h or one of the files included by it directly or indirectly (HUP, INT and so on. Use the command 'kill -l' to see a list of signals supported by your system).

Each signal may have a signal handler, which is a function that gets called when the process receives that signal. The function is called in "asynchronous mode", meaning that no where in your program you have code that calls this function directly. Instead, when the signal is sent to the process, the operating system stops the execution of the process, and "forces" it to call the signal handler function. When that signal handler function returns, the process continues execution from wherever it happened to be before the signal was received, as if this interruption never occurred.

Signals are a little more complicated than that, as you’ll soon see, but that basically sums it up. In essence, we’re interrupting the normal flow of execution to notify the program that it has something to handle.

Implementing it in Quark

Basic type definitions

So how exactly does Quark implement signals? First, we need a couple definitions and utility functions. The POSIX page for signal.h is really useful since it’s aimed at operating system developers and C library writers, and following the guidelines ensures that our interface will be compatible with existing software. First, we define all the signal numbers:

#define SIGHUP 1
#define SIGINT 2
#define SIGQUIT 3
#define SIGILL 4
#define SIGTRAP 5
#define SIGABRT 6
#define SIGIOT 6
#define SIGBUS 7
#define SIGFPE 8
...

For compatibility and ease of development, these numbers are identical to Linux. In fact, I obtained them from the Linux kernel’s copy of signal numbers. In total, I defined 32 numbers, although there are a couple more signals that are actually aliases of the first 32.

POSIX also defines a set of symbolic constants that expand to (invalid) function pointers.

#define SIG_ERR ((void (*)(int)) - 1)
#define SIG_DFL ((void (*)(int))0)
#define SIG_IGN ((void (*)(int))1)

When we store signal handler pointers, we can use those constants to indicate certain actions to be taken instead of calling the signal handler. For example, if the signal handler for SIGSEGV is set to the constant SIG_DFL, then we take the default action for SIGSEGV (which is to terminate) instead of calling the signal handler.
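
The check described above can be sketched as a small classifier. This is only an illustration and not Quark's code: the names Handler, Disposition, classify and on_signal are made up for the example, and the only things taken from the article are the encodings 0 for SIG_DFL and 1 for SIG_IGN.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

using Handler = void (*)(int);

enum class Disposition { Default, Ignore, CallHandler };

// Decode a stored handler pointer. The values 0 and 1 mirror the article's
// SIG_DFL and SIG_IGN; neither is a usable function address.
inline Disposition classify(Handler h)
{
    const auto raw = reinterpret_cast<std::uintptr_t>(h);
    if (raw == 0) return Disposition::Default;
    if (raw == 1) return Disposition::Ignore;
    return Disposition::CallHandler;
}

// A stand-in user handler, only here so the example has a real address.
inline void on_signal(int) {}

} // namespace sketch
```

Anything that is not one of the two sentinel values is treated as a real handler address, which is why the constants have to be values that can never be valid function pointers.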

Besides defining the handlers and signal numbers, we also need to define how we store signal states such as the signal mask and pending signals. POSIX defines something called sigset_t but leaves the implementation to the operating system.

Sortix, an operating system that my signal implementation is based on (and, as you’ll notice, somewhat resembles), represents sigset_t as a struct containing a bit array.

However, since we define 32 signals, it can fit in a uint32_t. We can then treat it as a raw bit array. Therefore, for sigset_t, we define it as:

typedef uint32_t sigset_t;

To store signal handlers, there is a structure called sigaction. Basically, the structure stores the signal handler itself, the flags for the handler, and the signal mask during execution of the signal handler. Quark defines it as:

struct sigaction {
    union {
        void (*sa_handler)(int);
        void (*sa_sigaction)(int, siginfo_t*, void*);
    };
    sigset_t sa_mask;
    int sa_flags;
};

The corresponding flags for sa_flags that specify the behavior of the signal handler are:

#define SA_NOCLDSTOP (1 << 0)
#define SA_NOCLDWAIT (1 << 1)
#define SA_NODEFER (1 << 2)
#define SA_ONSTACK (1 << 3)
#define SA_RESETHAND (1 << 4)
#define SA_RESTART (1 << 5)
#define SA_RESTORER (1 << 6)
#define SA_SIGINFO (1 << 7)

You might notice that the handler functions are in a union. This is because it only makes sense to have one handler per signal. Which handler is called (sa_handler or sa_sigaction) depends on whether SA_SIGINFO is set in sa_flags. When SA_SIGINFO is set, sa_sigaction is called, otherwise sa_handler is.
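
The selection rule can be sketched like this. It illustrates the POSIX rule from the paragraph above rather than Quark's actual entry path, and the names Info, Action, invoke and the two recording handlers are hypothetical. Only the SA_SIGINFO bit position comes from the article.

```cpp
#include <cassert>

namespace sketch {

struct Info { int signo; };        // stand-in for siginfo_t
constexpr int kSaSigInfo = 1 << 7; // same bit as the article's SA_SIGINFO

struct Action {
    union {
        void (*handler)(int);
        void (*sigaction_fn)(int, Info*, void*);
    };
    int flags;
};

// Call whichever union member the flags say is active.
inline void invoke(const Action& a, int signum, Info* info, void* uctx)
{
    if (a.flags & kSaSigInfo) {
        a.sigaction_fn(signum, info, uctx);
    } else {
        a.handler(signum);
    }
}

// Two recording handlers so the dispatch can be observed.
static int g_simple_calls = 0;
static int g_info_calls = 0;
static void simple_handler(int) { ++g_simple_calls; }
static void info_handler(int, Info*, void*) { ++g_info_calls; }

} // namespace sketch
```

Since both members share storage, the flag is the only record of which signature the stored pointer has, so the flag and the pointer always have to be updated together.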

POSIX also allows setting an alternate stack for signal handlers. This information is stored in a structure called stack_t. Quark defines it as:

typedef struct {
    void* ss_sp;    /* Base address of stack */
    int ss_flags;   /* Flags */
    size_t ss_size; /* Number of bytes in stack */
} stack_t;

There are a couple other structures, but I’ll go over them once we get to them.

To manipulate sigset_t, POSIX specifies a family of functions named sig*set. While these are primarily aimed at userspace, they’re also useful to define in the kernel.

int sigemptyset(sigset_t* set);
int sigfillset(sigset_t* set);
int sigaddset(sigset_t* set, int signum);
int sigdelset(sigset_t* set, int signum);
int sigismember(const sigset_t* set, int signum);

Descriptions of those functions can be found in their respective man pages. To make life easier, I also added several other functions:

// Is the set empty?
bool sigisemptyset(sigset_t* set);
// These functions follow x86 assembly semantics where dest is modified
// AND the sets together
void sigandset(sigset_t* dest, const sigset_t* source);
// OR the sets together
void sigorset(sigset_t* dest, const sigset_t* source);
// Invert the set
void signotset(sigset_t* dest, const sigset_t* source);

These will become immensely useful once we start figuring out signal masks. Because sigset_t in reality is just a uint32_t, these operations are trivial to do. For example, sigisemptyset() can be easily implemented as:

bool sigisemptyset(sigset_t* set)
{
    return *set == 0;
}
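
The remaining helpers are just as small. Here is a minimal sketch under two assumptions that the article does not confirm: valid signals are 1 through 32, and signal N is stored in bit N-1. The names (SigSet, sig_add and so on) are deliberately different from the real ones so the example does not collide with the host's signal.h.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

using SigSet = std::uint32_t;   // plays the role of sigset_t
constexpr int kNumSignals = 32; // assumed: valid signals are 1..32

// Assumed mapping: signal N lives in bit N-1.
inline bool valid(int signum) { return signum >= 1 && signum <= kNumSignals; }
inline SigSet bit(int signum) { return SigSet{1} << (signum - 1); }

inline int sig_add(SigSet* set, int signum)
{
    if (!valid(signum)) return -1;
    *set |= bit(signum);
    return 0;
}

inline int sig_del(SigSet* set, int signum)
{
    if (!valid(signum)) return -1;
    *set &= ~bit(signum);
    return 0;
}

inline int sig_ismember(const SigSet* set, int signum)
{
    if (!valid(signum)) return -1;
    return (*set & bit(signum)) ? 1 : 0;
}

// dest is modified in place, as in the article's helpers.
inline void sig_and(SigSet* dest, const SigSet* src) { *dest &= *src; }
inline void sig_or(SigSet* dest, const SigSet* src) { *dest |= *src; }
inline void sig_not(SigSet* dest, const SigSet* src) { *dest = ~*src; }

} // namespace sketch
```

The range check matters more than it looks: shifting a 32-bit value by 32 or more is undefined behavior, so an unchecked signal number coming from userspace must never reach the shift.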

Organization

Now that we have those basic types, how do we keep track of signals? The first logical answer would be to keep everything in the thread structure, but is that correct?

Unfortunately, signals and multithreading have always behaved a little strangely together. In Quark, threads are the actual execution units that are scheduled, while a process is a logical structure that stores things such as open files and the memory address space. Signal state ends up split between the two.

Delivering a signal is somewhat confusing. Quark allows you to deliver a signal to either a process (via kill()) or to a thread (via something like pthread_kill() once we support pthreads).

So, this is how Quark ends up storing signal state. Threads store the signal mask, the set of pending signals, and the alternate signal stack, plus a couple of bookkeeping fields. Here is the relevant code:

size_t signal_count;
bool signal_required;
sigset_t signal_mask;
sigset_t signal_pending;
stack_t signal_stack;

Processes will only store the signal handlers:

struct sigaction signal_actions[NSIGS];

Signal Delivery

Now that we have the definitions out of the way, let’s get into the actual process. To make this easier, I’ll just walk through a sample flow.

First, a program needs to deliver a signal. This can be the program itself (using raise()) or another process calling kill(). In reality, these largely share the same codepaths: raise() is defined as kill(getpid(), ...).

kill() itself is just a wrapper around the kill system call:

int kill(pid_t pid, int sig)
{
    return syscall(SYS_kill, pid, sig, 0, 0, 0);
}
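
For completeness, raise() in terms of kill() can be sketched in one line. This version runs against the host's POSIX kill() and getpid() rather than Quark's libc, and the name raise_via_kill is made up for the example.

```cpp
#include <cassert>
#include <signal.h>
#include <unistd.h>

namespace sketch {

// raise() expressed through kill(), as the article describes.
inline int raise_via_kill(int sig)
{
    return kill(getpid(), sig);
}

} // namespace sketch
```

On a POSIX host, passing 0 as the signal number sends nothing and only performs the error checks, which makes it a harmless way to exercise the path.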

Once a program calls kill(), control is transferred to the kernel. Here is the handler:

static long sys_kill(pid_t pid, int signum)
{
    Log::printk(Log::DEBUG, "[sys_kill] %u %d\n", pid, signum);
    Process* process = Scheduler::find_process(pid);
    if (!process) {
        return -ESRCH;
    }
    Log::printk(Log::DEBUG, "Found process at %p\n", process);
    process->send_signal(signum);
    return 0;
}

First, the handler gets the process specified by the PID. If it doesn’t exist, it returns -ESRCH, matching the ESRCH error specified by POSIX. Then, it calls process->send_signal().

Process::send_signal() selects a thread to actually handle the signal. POSIX does not specify which thread handles a signal sent to a process, so Quark simply chooses the first thread. In the future, it should be changed to select a thread that is sleeping or has less load.

The thread that gets selected has its member function send_signal() called, which is also relatively simple:

bool Thread::send_signal(int signum)
{
    if (signum == 0) {
        return false;
    }

    if (signum < 0 || signum >= NSIGS) {
        return false;
    }
    // TODO: Check if signal is pending already and return ESIGPENDING
    Signal::sigaddset(&this->signal_pending, signum);
    this->refresh_signal();
    return true;
}

First, it does a check for out-of-range signals. Then, it adds the signal to the pending set. You’ll notice that the comment states that we need to check for already pending signals, which is correct. However, that is a relatively minor issue and is low on the list of things to fix.

Then, send_signal() calls refresh_signal() which simply determines whether a signal needs to be handled and sets a flag appropriately. This is important because it’s not worth spending time handling a signal that ultimately is blocked anyways, so refresh_signal() performs that check, among others.
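
A minimal sketch of the mask check might look like the following. It only covers the "is anything pending and not blocked" part; the real refresh_signal() does more than this, and the struct, its field names, and the bit mapping (signal N in bit N-1) are assumptions made for the example. The numbers 9 and 19 are the Linux values of SIGKILL and SIGSTOP.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

using SigSet = std::uint32_t;

// Assumed mapping: signal N lives in bit N-1.
constexpr SigSet bit(int signum) { return SigSet{1} << (signum - 1); }

// SIGKILL (9) and SIGSTOP (19) can never be blocked.
constexpr SigSet kUnblockable = bit(9) | bit(19);

struct ThreadSignals {
    SigSet mask = 0;       // blocked signals
    SigSet pending = 0;    // delivered but not yet handled
    bool required = false; // checked on the way back to userspace

    // Recompute the flag: work is needed only if some pending signal
    // is either unblocked or can never be blocked.
    void refresh()
    {
        const SigSet unblocked = ~mask | kUnblockable;
        required = (pending & unblocked) != 0;
    }
};

} // namespace sketch
```

Keeping the answer in a single bool means the hot path on every return to userspace is one load and one branch, and the set arithmetic only runs when the mask or the pending set actually changes.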

Signal Handling

You may notice that at no point do we actually directly handle the signal during the delivery of the signal. The code that triggers the signal handling runs once the thread that has signals pending is scheduled.

Signals are checked for in two places. First, signals are checked at the exit of a system call:

extern "C" void syscall_trampoline(struct InterruptContext* ctx)
{
    encode_tcontext(ctx, &Scheduler::get_current_thread()->tcontext);
    ...
    if (Scheduler::get_current_thread()->signal_required) {
        Scheduler::get_current_thread()->handle_signal(ctx);
    }
}

This means that technically, signals can be handled immediately after a signal is delivered IF the signal was delivered to the thread that sent the signal.

The second place is at the exit of an interrupt. This is important because if we only handled signals at system call exit, threads that are stuck in a loop or doing CPU-intensive work with few system calls would never respond to a signal (such as SIGINT).

Thus, an interrupt (usually forced by the system timer) will also trigger signal handling.

Remember, the signal_required flag was set earlier by refresh_signal(). If it is set, handle_signal() will run.

handle_signal() is the workhorse of the signal processing code.

First, it needs to figure out what signals will actually be deliverable:

sigset_t unblocked_signals;
Signal::signotset(&unblocked_signals, &this->signal_mask);
// Ensure that SIGKILL and SIGSTOP are always unblocked
Signal::sigorset(&unblocked_signals, &unblockable_signals);

sigset_t deliverable_signals;
Signal::sigemptyset(&deliverable_signals);
Signal::sigorset(&deliverable_signals, &unblocked_signals);
Signal::sigandset(&deliverable_signals, &this->signal_pending);

It takes the signal mask, inverts it (to get the unblocked signals), always marks SIGKILL and SIGSTOP as unblocked (POSIX states they cannot be blocked), and then intersects the unblocked set with the pending set to get the deliverable signals.

Then, handle_signal() actually selects a signal to deliver:

int signum = Signal::select_signal(&deliverable_signals);
if (!signum) {
    return;
}
Log::printk(Log::DEBUG, "[signal]: Selecting signal %d\n", signum);

select_signal() is pretty trivial. First, it checks if SIGKILL is pending. If it is, it immediately returns that. Then, SIGSTOP is checked for. If it is pending, it is delivered. Finally, it iterates through the set of pending signals and returns the first pending signal.
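
The selection order can be sketched as below. This is a reconstruction from the description rather than the real function: it takes the set by value instead of by pointer, assumes signal N lives in bit N-1, and uses the Linux numbers for SIGKILL and SIGSTOP.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

using SigSet = std::uint32_t;
constexpr int kSigKill = 9;  // Linux value of SIGKILL
constexpr int kSigStop = 19; // Linux value of SIGSTOP
constexpr int kNumSignals = 32;

// Assumed mapping: signal N lives in bit N-1.
constexpr SigSet bit(int signum) { return SigSet{1} << (signum - 1); }

// Returns the chosen signal number, or 0 if the set is empty.
inline int select_signal(SigSet deliverable)
{
    if (deliverable & bit(kSigKill)) return kSigKill;
    if (deliverable & bit(kSigStop)) return kSigStop;
    for (int signum = 1; signum <= kNumSignals; ++signum) {
        if (deliverable & bit(signum)) return signum;
    }
    return 0;
}

} // namespace sketch
```

Returning 0 for "nothing to do" works because 0 is never a valid signal number, which is also what lets the caller test the result with a plain if (!signum).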

handle_signal() then marks that signal as handled:

// The signal is handled
Signal::sigdelset(&this->signal_pending, signum);
this->refresh_signal();

If the signal selected is SIGKILL, handle_signal() immediately kills the current thread regardless of what action is specified.

We then get the signal action struct:

struct sigaction* action = &this->parent->signal_actions[signum];

If the action is to ignore (SIG_IGN), we return because there is nothing to do. Otherwise, we check for SIG_DFL and handle it based on the default action specified by POSIX.
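
As a rough illustration of what "the default action" means, here is a lookup that follows the table in Linux's signal(7), using Linux signal numbers. The enum and function names are hypothetical, core dumps are folded into Terminate, and this is not necessarily how Quark organizes the check.

```cpp
#include <cassert>

namespace sketch {

enum class DefaultAction { Terminate, Ignore, Stop, Continue };

// Default dispositions for Linux signal numbers.
inline DefaultAction default_action(int signum)
{
    switch (signum) {
    case 17: // SIGCHLD
    case 23: // SIGURG
    case 28: // SIGWINCH
        return DefaultAction::Ignore;
    case 18: // SIGCONT
        return DefaultAction::Continue;
    case 19: // SIGSTOP
    case 20: // SIGTSTP
    case 21: // SIGTTIN
    case 22: // SIGTTOU
        return DefaultAction::Stop;
    default: // everything else ends the process (some also dump core)
        return DefaultAction::Terminate;
    }
}

} // namespace sketch
```

Only a handful of signals do anything other than terminate by default, so a switch with a terminating default case stays short.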

Next, we check if an alternate stack is requested and usable:

bool requested_altstack = action->sa_flags & SA_ONSTACK;
bool usable_altstack =
    !(this->signal_stack.ss_flags & (SS_ONSTACK | SS_DISABLE));
bool use_altstack = requested_altstack && usable_altstack;

if (use_altstack) {
    Log::printk(Log::DEBUG, "[signal] Signal handler requests and has a "
                            "valid alternate stack, using...\n");
    this->signal_stack.ss_flags |= SS_ONSTACK;
}

Then, we prepare the mcontext and ucontext structs. mcontext stores the actual machine context (similar to an interrupt context), while ucontext holds other various metadata. These are defined by POSIX and are available to signal handlers.

mcontext is especially important because sigreturn will later restore the original thread state using the mcontext struct passed here. mcontext is encoded using the encode_mcontext() function, which is basically encode_tcontext() but for mcontext.

struct ThreadContext new_state, original_state;
encode_tcontext(ctx, &original_state);

siginfo_t siginfo;
siginfo.si_signo = signum;

ucontext_t ucontext = {
    .uc_link = nullptr,
    .uc_sigmask = this->signal_mask,
};

String::memcpy(&ucontext.uc_stack, &this->signal_stack,
                sizeof(this->signal_stack));
Signal::encode_mcontext(&ucontext.uc_mcontext, &original_state);

The current signal is also masked out at this time to prevent an identical signal from being delivered:

// Mask out current signal if SA_NODEFER is not passed in
if (!(action->sa_flags & SA_NODEFER)) {
    Signal::sigaddset(&this->signal_mask, signum);
}

Finally, we prepare a ksignal structure. ksignal is simply a convenience structure to make it easier to pass signal handling data to various kernel functions. We pass it to the architecture-dependent code to set up the registers, and return:

struct ksignal ksig = {
    .signum = signum,
    .use_altstack = use_altstack,
    .sa = action,
    .siginfo = &siginfo,
    .ucontext = &ucontext,
};

this->setup_signal(&ksig, &original_state, &new_state);

decode_tcontext(ctx, &new_state);

What exactly does setup_signal() do? Here is the x86_64 implementation:

String::memcpy(new_state, original_state, sizeof(*new_state));
addr_t stack_location;
if (ksig->use_altstack) {
    stack_location = reinterpret_cast<addr_t>(this->signal_stack.ss_sp) +
                        this->signal_stack.ss_size;
} else {
    stack_location = original_state->rsp;
}
// Construct a stack return frame
new_state->rsp = stack_location;
new_state->rsp -= 128;
new_state->rsp -= sizeof(struct stack_frame);
new_state->rsp &= ~(16UL - 1UL);
struct stack_frame* frame = (struct stack_frame*)new_state->rsp;

String::memcpy(&frame->siginfo, ksig->siginfo, sizeof(frame->siginfo));
String::memcpy(&frame->ucontext, ksig->ucontext, sizeof(frame->ucontext));
frame->ret_location = this->parent->sigreturn;

Log::printk(Log::DEBUG, "Going to return to gadget at %p\n",
            frame->ret_location);
Log::printk(Log::DEBUG, "Frame at %p\n", frame);

new_state->rip = (uint64_t)ksig->sa->sa_handler;
new_state->rdi = ksig->signum;

/*
 * Technically SA_SIGINFO specifies this, but programmers make mistakes. To
 * prevent them from getting segmentation faults we still pass these in
 * as arguments regardless of whether SA_SIGINFO is used. Not POSIX
 * compliant, but meh
 */
new_state->rsi = (uint64_t)&frame->siginfo;
new_state->rdx = (uint64_t)&frame->ucontext;
new_state->rsp = (uint64_t)frame;

Basically, it selects the stack and aligns it, manipulates the stack so that it will return to our code to clean up after a signal, and sets up the registers so that the signal handler will have the correct arguments passed in.
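
The stack pointer arithmetic is easier to follow with concrete numbers. Below is a sketch of just that calculation, with a hypothetical function name. The 128 bytes are the x86_64 System V red zone: the interrupted code may be using the area just below its RSP without having moved RSP, so the frame has to start below it.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

constexpr std::uint64_t kRedZone = 128; // x86_64 System V red zone size

// Compute where a frame of frame_size bytes would start, given the
// stack pointer the thread had when it was interrupted.
constexpr std::uint64_t frame_address(std::uint64_t interrupted_rsp,
                                      std::uint64_t frame_size)
{
    // Skip the red zone, reserve the frame, then round down to 16 bytes.
    return (interrupted_rsp - kRedZone - frame_size) & ~std::uint64_t{15};
}

} // namespace sketch
```

For example, with an interrupted RSP of 0x70001008 and a 100-byte frame, skipping the red zone gives 0x70000F88, reserving the frame gives 0x70000F24, and rounding down gives 0x70000F20. Rounding down is the safe direction because the stack grows toward lower addresses, so it can only add padding and never overlaps the red zone.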

I will describe the signal return mechanism below, so don’t worry if the stack manipulation stuff seems vague.

Entering userspace

Now, we are back in userspace, running in the signal handler! The rules for the signal handler are the same as with other operating systems, meaning that non-reentrant functions are largely unsafe to call. The page here is very useful.

Note that while SA_SIGINFO is technically required in order to get the siginfo and ucontext structs as parameters, Quark unconditionally provides those arguments, as you’ll notice from the setup_signal() code above.

Returning to the kernel

Once the userspace signal handler returns, control goes back to the kernel code thanks to the stack trickery we did in setup_signal(). Note that this portion is heavily architecture-dependent, and may vary on non-x86 platforms.

Here is the stack trick. On x86, the stack frame layout is as follows during a function call:

    |---------------------|
    |    Return address   |
RBP |---------------------|
    | Arguments (for x86) |
    |---------------------|
    | Callee stack stuff  |
RSP |---------------------|

Once the function is done and preparing to return, it looks like this:

    |---------------------|
    |    Return address   |
RSP |---------------------|

The ret instruction pops the return address off the stack into the instruction pointer register.

Thus, by setting up the stack so that RSP points at a return address of our choosing when the function returns, ret will jump to that address.

For some pseudocode:

uint64_t stack[1000];
stack[999] = <return address>;
uint64_t* rsp = &stack[999]; // ret pops stack[999] into RIP

The actual implementation is a little different. Remember setup_signal()? Here is the relevant portion:

// Construct a stack return frame
new_state->rsp = stack_location;
new_state->rsp -= 128;
new_state->rsp -= sizeof(struct stack_frame);
new_state->rsp &= ~(16UL - 1UL);
struct stack_frame* frame = (struct stack_frame*)new_state->rsp;

String::memcpy(&frame->siginfo, ksig->siginfo, sizeof(frame->siginfo));
String::memcpy(&frame->ucontext, ksig->ucontext, sizeof(frame->ucontext));
frame->ret_location = this->parent->sigreturn;

What is struct stack_frame? It’s simply an easier way to represent the stack frame instead of working with raw arrays.

struct stack_frame {
    uint64_t ret_location;
    ucontext_t ucontext;
    siginfo_t siginfo;
};

The code points frame at the new RSP, which has been moved down far enough to skip the red zone and leave room for struct stack_frame (remember, the stack grows down on x86). The stack now looks like this:

    |---------------------|
    |        siginfo      |
    |---------------------|
    |       ucontext      |
    |---------------------|
    |     ret_location    |
RSP |---------------------|

Once the function returns, it will pop off ret_location and jump to whatever we want. Magic!
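
To make the pop concrete, here is a toy model of ret acting on a frame laid out like the diagram above. The Cpu struct and the ret() function are purely illustrative and model only RSP and RIP.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// A toy CPU with just the two registers that matter here.
struct Cpu {
    const std::uint64_t* rsp; // points at the lowest used stack slot
    std::uint64_t rip;
};

// What `ret` does: load RIP from the slot RSP points at, then move RSP
// up by one 8-byte slot.
inline void ret(Cpu& cpu)
{
    cpu.rip = *cpu.rsp;
    ++cpu.rsp;
}

} // namespace sketch
```

If the lowest slot holds the trampoline address, then after ret() RIP holds that address and RSP has moved up to the next slot, which in the real frame is where ucontext begins.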

So, where exactly does ret_location point to? In the code snippet above, we set it to this->parent->sigreturn, but what is this->parent->sigreturn?

In fact, we set it earlier when we loaded the ELF binary into memory. In Thread::load(), we map a page for the sigreturn code:

if (parent->sections->locate_range(sigreturn_zone, USER_START, 0x1000)) {
    Log::printk(Log::DEBUG, "Sigreturn page located at %p\n",
                sigreturn_zone);
    parent->sections->add_section(sigreturn_zone, 0x1000);
} else {
    Log::printk(Log::ERROR, "Failed to locate sigreturn page\n");
    return false;
}
...
Memory::Virtual::map_range(sigreturn_zone, 0x1000,
                               PAGE_USER | PAGE_WRITABLE);

parent->sigreturn = sigreturn_zone;

// Copy in sigreturn trampoline code
String::memcpy((void*)sigreturn_zone, signal_return_location, 0x1000);

// Make it unwritable
Memory::Virtual::protect(sigreturn_zone, PAGE_USER);

sigreturn_zone is the userspace page that receives the code, and signal_return_location is a pointer to a function called signal_return:

extern "C" void signal_return();
void* signal_return_location = (void*)&signal_return;

signal_return is defined as follows (for x86_64):

global signal_return
signal_return:
    mov rdi, rsp

    ; Technically not syscall compliant, but we don't care about anything
    mov rax, qword 15
    syscall
    ; No return

Basically, it just calls the sys_sigreturn system call. This code is copied into every address space in userspace so that it is callable.

However, there is one small nuance. Remember the stack frame layout? After ret pops ret_location, RSP points to the ucontext struct. By moving it into rdi, we effectively just passed the pointer to the ucontext as the first argument to sys_sigreturn.

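To recap: setup_signal() builds a frame whose first slot holds the address of the sigreturn page, the handler’s ret jumps to that page, and the trampoline there passes the ucontext pointer to sys_sigreturn.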

The system call handler itself is pretty trivial too:

static void sys_sigreturn(ucontext_t* uctx)
{
    Log::printk(Log::DEBUG, "[sys_return] %p\n", uctx);
    ThreadContext tctx;
    /*
     * The syscall handler saves the userspace context, so we copy it into
     * tctx to get certain registers (DS, ES, SS) preloaded for us. The
     * rest of the state will get overridden by the stored mcontext
     */
    String::memcpy(&tctx, &Scheduler::get_current_thread()->tcontext,
                   sizeof(tctx));
    // Unset on_stack
    if (Scheduler::get_current_thread()->signal_stack.ss_flags & SS_ONSTACK) {
        Scheduler::get_current_thread()->signal_stack.ss_flags &= ~SS_ONSTACK;
    }
    // Restore signal mask
    Scheduler::get_current_thread()->signal_mask = uctx->uc_sigmask;
    Signal::decode_mcontext(&uctx->uc_mcontext, &tctx);
    load_registers(tctx);
}

sigreturn restores the context using the registers stored earlier in mcontext, resets the stack state if necessary, and restores the original signal mask. It then returns execution to the original state.
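
The signal mask makes a round trip through the saved context, which can be sketched in isolation. The types and function names here are hypothetical stand-ins (SavedContext for ucontext_t, ThreadState for the thread), and the bit mapping of signal N to bit N-1 is an assumption. Only the SA_NODEFER bit position comes from the article.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

using SigSet = std::uint32_t;
constexpr int kSaNoDefer = 1 << 2; // same bit as the article's SA_NODEFER

// Assumed mapping: signal N lives in bit N-1.
constexpr SigSet bit(int signum) { return SigSet{1} << (signum - 1); }

struct SavedContext { SigSet saved_mask; }; // stand-in for ucontext_t
struct ThreadState { SigSet mask; };

// Entering the handler: remember the old mask, then block the signal
// being handled unless SA_NODEFER was requested.
inline SavedContext enter_handler(ThreadState& t, int signum, int sa_flags)
{
    SavedContext saved = {t.mask};
    if (!(sa_flags & kSaNoDefer)) {
        t.mask |= bit(signum);
    }
    return saved;
}

// Returning from the handler: put the remembered mask back.
inline void leave_handler(ThreadState& t, const SavedContext& saved)
{
    t.mask = saved.saved_mask;
}

} // namespace sketch
```

The order matters on entry: the mask has to be captured before the current signal is added to it, otherwise the signal would stay blocked after the handler returns.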

Miscellaneous

There are a couple extra changes that were needed to support signals in Quark.

x86_64 system call handler

Quark uses the fast SYSCALL/SYSRET instructions for system calls. This means that we are responsible for saving the system state during system calls. Since most system calls don’t need it (with the exception of fork()), we never bothered to.

For fork(), there was a special case where we manually loaded the system state into the current thread structure. While this approach sufficed for one system call, it would quickly prove unwieldy as we add more system calls. This approach also meant that we couldn’t change the SYSRET state, which is necessary for signal handling.

Instead, I changed the SYSCALL handler to build an interrupt frame that is identical to the one that the processor generates for us during an interrupt.

From the kernel’s perspective, the handlers for interrupts and system calls can be shared now. This is useful because the signal handling code can be shared between the interrupt and system call entry points, and we can also now modify where system calls return to.

This work can be examined here.

Conclusion

So, that right there is how signals work in Quark. In its current state, it can support most common signal operations, although I’m sure that corner cases can cause some weird behavior.

Nonetheless, for a first pass, I think we’re in good shape. If you notice any typos (or bugs in the code!), feel free to shoot me an email! Thanks for reading!