Interrupts & Exceptions

Required reading: xv6 trapasm.S, trap.c, syscall.c, initcode.S, usys.S. Skim vectors.S, lapic.c, ioapic.c, picirq.c.
You will need to consult IA32 System Programming Guide chapter 5 (skip 5.7.1, 5.8.2, 5.12.2).


  last week we transferred from kernel to user
  today: how to get from user to kernel
  three reasons for transitions:
    system calls
    program faults (div by zero, page fault)
    external device interrupts
  why do we need to take special care for user -> kernel?
    only kernel can touch devices, MMU, FS, other processes' state, &c
    think of user program as a potential malicious adversary
  what has to happen?
    save user state for future transparent resume
    set up for execution in kernel (stack, segments)
    choose a place to execute in kernel
    get at system call arguments
    do it all securely
  it's neat that interrupts, faults, and system calls use the same mechanism!

Calling a System Call from User Space

Execute the int. Now where are we? How did we get here?

The INT instruction

The x86 CPU supports 256 interrupt vectors. Different hardware conditions produce interrupts through different vectors. The kernel can tell why the interrupt occurred by noting the vector. The vector refers to a descriptor in the IDT. Each descriptor contains a segment selector, an offset in that segment, and a DPL.
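
As a rough illustration of what one IDT descriptor holds, the sketch below packs a segment selector, an offset, and a DPL into the 8-byte gate format from the IA-32 manual. The helper names (make_gate, gate_selector, gate_dpl, gate_offset) are made up for this example; xv6 builds its descriptors differently, so read this as a picture of the bit layout only.

```c
#include <stdint.h>

/* Gate types from the IA-32 manual: 32-bit interrupt gate and trap gate. */
enum { GATE_INTR32 = 0xE, GATE_TRAP32 = 0xF };

/* Pack one 8-byte IDT gate descriptor (hypothetical helper). */
uint64_t make_gate(uint16_t sel, uint32_t off, unsigned dpl, int is_trap)
{
    uint32_t lo = (off & 0xFFFFu) | ((uint32_t)sel << 16);
    uint32_t hi = (off & 0xFFFF0000u)
                | (1u << 15)                 /* P: descriptor is present */
                | ((dpl & 3u) << 13)         /* DPL: who may use INT on it */
                | ((uint32_t)(is_trap ? GATE_TRAP32 : GATE_INTR32) << 8);
    return ((uint64_t)hi << 32) | lo;
}

/* Unpack the three fields the notes mention. */
uint16_t gate_selector(uint64_t g) { return (uint16_t)(g >> 16); }
unsigned gate_dpl(uint64_t g)      { return (unsigned)(g >> 45) & 3u; }
uint32_t gate_offset(uint64_t g)
{
    return (uint32_t)(g & 0xFFFFu) | (uint32_t)((g >> 32) & 0xFFFF0000u);
}
```

Note that the offset is split into two halves at opposite ends of the descriptor, which is why a helper is needed to read it back.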

The INT instruction takes the following steps (these will be similar to all interrupts and faults, though there are slight differences):

  1. decide the vector number, in this case it's the 0x40 in int 0x40.
  2. fetch the interrupt descriptor for vector 0x40 from the IDT. the CPU finds it by taking the 0x40'th 8-byte entry starting at the linear address that the IDTR CPU register points to.
  3. check that CPL <= DPL in the descriptor (but only if INT instruction).
  4. save ESP and SS in a CPU-internal register (but only if target segment selector's PL < CPL).
  5. load SS and ESP from TSS ("")
  6. push user SS ("")
  7. push user ESP ("")
  8. push user EFLAGS
  9. push user CS
  10. push user EIP
  11. clear some EFLAGS bits
  12. set CS and EIP from IDT descriptor's segment selector and offset

INT is a complex instruction. Does it really need to take all those steps? Why not let the kernel interrupt handler do some of them? For example, why does INT need to save the SS and ESP?
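
The twelve steps above can be walked through on a toy model. The following is only a sketch under simplifying assumptions: memory is a small word array, the privilege level is taken from the low two bits of a selector, step 11 clears only IF (and only for interrupt gates), and all struct and function names are invented for the example.

```c
#include <stdint.h>

#define FL_IF 0x200u            /* EFLAGS interrupt-enable bit */

struct gate { uint16_t sel; uint32_t off; unsigned dpl; int is_trap; };
struct tss  { uint16_t ss0; uint32_t esp0; };
struct cpu  {
    uint16_t cs, ss;
    uint32_t eip, esp, eflags;
    uint32_t mem[1024];         /* toy memory; esp is a byte offset into it */
};

static void push(struct cpu *c, uint32_t v)
{
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

/* Returns 0 on success, -1 where real hardware would raise a fault. */
int do_int(struct cpu *c, const struct gate *idt, const struct tss *t,
           unsigned vector, int is_int_insn)
{
    const struct gate *g = &idt[vector];      /* steps 1-2 */
    unsigned cpl = c->cs & 3u;
    unsigned target_pl = g->sel & 3u;

    if (is_int_insn && cpl > g->dpl)          /* step 3 */
        return -1;
    if (target_pl < cpl) {                    /* steps 4-7 */
        uint16_t old_ss = c->ss;
        uint32_t old_esp = c->esp;
        c->ss = t->ss0;
        c->esp = t->esp0;
        push(c, old_ss);
        push(c, old_esp);
    }
    push(c, c->eflags);                       /* step 8 */
    push(c, c->cs);                           /* step 9 */
    push(c, c->eip);                          /* step 10 */
    if (!g->is_trap)                          /* step 11 (IF only) */
        c->eflags &= ~FL_IF;
    c->cs = g->sel;                           /* step 12 */
    c->eip = g->off;
    return 0;
}
```

Calling do_int once from a user-mode state and once more from the resulting kernel-mode state shows the two stack shapes: five words pushed in the first case, three in the second.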

xv6 sets up the IDT in tvinit(), sets the IDTR in idtinit(), and sets the SS and ESP in the TSS in usegment(). Check IDT 64 to see how the IDT is set up to handle vector 0x40.

Trap Handling

int 0x40 entered the kernel at vector64, generated by

What is the current CPL? How was it set? Could the user abuse the INT instruction to exercise privilege or break the kernel?

x/16x $esp in order to see what int put on the stack. Compare to Figure 5-4. What stack is being used?

vector64 pushes a few items on the stack and then jumps to alltraps. Why not have vector 64 in the IDT point directly to alltraps?

Single-step until the call to trap. x/18x $esp. Compare with struct trapframe.
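
For comparing a stack dump against the trap frame, it can help to have the layout written down as word offsets. The struct below is a sketch modelled on the trapframe in later public xv6 sources; the version used for this lecture may save fewer segment registers, so the word count in your dump may differ. The TF_WORD macro is an invented convenience.

```c
#include <stddef.h>
#include <stdint.h>

struct trapframe {
    /* pushed by pusha in alltraps */
    uint32_t edi, esi, ebp, oesp, ebx, edx, ecx, eax;
    /* segment registers pushed by alltraps, each padded to 32 bits */
    uint16_t gs, pad1, fs, pad2, es, pad3, ds, pad4;
    /* pushed by the vectorNN stub */
    uint32_t trapno;
    /* pushed by the hardware (err comes from the stub when the CPU supplies none) */
    uint32_t err, eip;
    uint16_t cs, pad5;
    uint32_t eflags;
    /* pushed by the hardware only when crossing privilege levels */
    uint32_t esp;
    uint16_t ss, pad6;
};

/* Word index of a field as it would appear in an x/Nx $esp dump. */
#define TF_WORD(field) (offsetof(struct trapframe, field) / 4)
```

With this layout, TF_WORD(trapno) names the word of the dump that holds the vector number, and TF_WORD(eax) the word that holds the saved eax.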

At the start of trap(), what is tf->trapno? How was it set?

System call dispatch, arguments and return value

trap() calls syscall(), since trapno in this case is T_SYSCALL (0x40).

syscall() dispatches to a function it finds by indexing into the syscalls array. It uses the eax from the trap frame as the index. What is in that eax? Where was it set?
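
A minimal sketch of table-driven dispatch of this kind is shown below. The handlers, their numbers, and the name syscall_sketch are hypothetical, and the trap frame is reduced to the single field the sketch uses; the real syscall() differs in detail.

```c
#include <stddef.h>
#include <stdint.h>

struct trapframe { uint32_t eax; };    /* only the field this sketch needs */

/* Hypothetical handlers and numbers, for illustration only. */
static int sys_getpid_demo(void) { return 7; }
static int sys_uptime_demo(void) { return 1234; }

enum { SYS_getpid_demo = 1, SYS_uptime_demo = 2 };

static int (*syscalls[])(void) = {
    [SYS_getpid_demo] = sys_getpid_demo,
    [SYS_uptime_demo] = sys_uptime_demo,
};

#define NSYSCALLS (sizeof(syscalls) / sizeof(syscalls[0]))

void syscall_sketch(struct trapframe *tf)
{
    uint32_t num = tf->eax;            /* system call number from the saved eax */

    if (num < NSYSCALLS && syscalls[num] != NULL)
        tf->eax = (uint32_t)syscalls[num]();   /* result overwrites saved eax */
    else
        tf->eax = (uint32_t)-1;        /* unknown system call number */
}
```

The bounds and NULL checks matter because the index comes straight from user-controlled state.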

Now we're in sys_open(). Where are the arguments the user program originally passed to open()? How can the kernel get at them?

sys_open() calls argint() to get its 2nd argument. argint() calculates the value cp->tf->esp + 4 + 4*n. What is this? Why the first 4? Why the 4*n?

fetchint() checks that the address is not beyond the end of user memory. But addr was just calculated by kernel code (in argint()); since the kernel code is trustworthy, is this check really necessary?

Why do we do seemingly redundant checks for addr and then addr+4? Can't we just check addr+4?

Why does fetchint() add p->mem to addr?
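
To experiment with these questions outside the kernel, the argument-fetching path can be modelled in a few lines. This is a sketch, assuming a process whose memory is one contiguous region of sz bytes starting at p->mem; the struct definitions are cut down to what the example needs, and the process is passed explicitly rather than through a global cp.

```c
#include <stdint.h>
#include <string.h>

struct trapframe { uint32_t esp; };    /* saved user stack pointer */
struct proc {
    uint8_t *mem;              /* where the process's memory starts, as the kernel sees it */
    uint32_t sz;               /* size of that memory in bytes */
    struct trapframe *tf;
};

/* Fetch the int at user address addr; -1 if it is not inside user memory. */
int fetchint(struct proc *p, uint32_t addr, int32_t *ip)
{
    if (addr >= p->sz || addr + 4 > p->sz)
        return -1;
    memcpy(ip, p->mem + addr, sizeof(*ip));
    return 0;
}

/* Fetch the n'th 32-bit system call argument from the user stack. */
int argint(struct proc *p, int n, int32_t *ip)
{
    return fetchint(p, p->tf->esp + 4 + 4 * (uint32_t)n, ip);
}
```

Try it with a saved esp near the end of the region, and with addresses near the top of the 32-bit range, to see how each half of the bounds check behaves.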

Back to sys_open(). It does its job (which we will talk about later) and finally returns a file descriptor using the ordinary C return statement. syscall() puts that return value in cp->tf->eax. Why?

Trap Return

syscall() returns to trap(), and trap() returns to alltraps. b "trap"+0x195, single-step until alltraps. x/18x $esp to see the trap frame again. What is different and why?

single-step until iret, x/5x $esp, single-step into user space. Print the registers and stack. What will the return value to the original call to open() be?

What would happen if a user program divided by zero? What if kernel code divided by zero?

In Unix, traps often get translated into signals to the process. Some traps, though, are (partially) handled internally by the kernel -- which ones?

Some traps push an extra error code onto the stack (typically containing the segment selector that caused a fault). But this error code isn't pushed by the INT instruction. Can the user confuse the kernel by invoking INT 0xc (or any other vector that usually pushes an error code)? Why not?

Device Interrupts

Like system calls, except: devices generate them at any time, there are no arguments in CPU registers, nothing to return to, usually can't ignore them. There is hardware on the motherboard to signal the CPU when a device needs attention (e.g. the user has typed a character on the keyboard). There's usually a separate vector for each device. Let's look at the timer interrupt; the timer hardware generates an interrupt 100 times per second so that the kernel can track the passage of time and so the kernel can time-slice among multiple running processes. The timer interrupts through vector 32.

p idt[32], then set a breakpoint at vector32

x/20x $esp. What was the CPU doing at the time of the interrupt? What stack is being used?

The interrupt will have pushed different numbers of words on the stack depending on whether the CPU was in user-space or the kernel; how does iret know how many words to pop?

What prevents lots of interrupts from coming in all at once and overflowing the kernel stack? Print the registers; IF=0x200. p idt[32], p idt[64].

trap(), when it's called for a timer interrupt, does just two things: increment the ticks variable, and call wakeup. At the end of trap, xv6 calls yield, which, as we will see, may cause the interrupt to return in a different process!
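
The timer path just described can be sketched as follows. This is a toy model, not the xv6 source: wakeup and yield are replaced by stubs that only count calls, the constant names T_IRQ0 and IRQ_TIMER are assumptions chosen so that the timer lands on vector 32, and ticks_to_ms is an invented helper based on the 100 interrupts per second mentioned earlier.

```c
#include <stdint.h>

enum { T_IRQ0 = 32, IRQ_TIMER = 0 };   /* assumed names; timer is vector 32 */

struct trapframe { uint32_t trapno; }; /* only the field this sketch needs */

static unsigned ticks;          /* timer interrupts seen so far */
static unsigned wakeups;        /* stand-in: counts calls to wakeup */
static unsigned yields;         /* stand-in: counts calls to yield */

static void wakeup_stub(void *chan) { (void)chan; wakeups++; }
static void yield_stub(void)        { yields++; }

void trap_sketch(struct trapframe *tf)
{
    switch (tf->trapno) {
    case T_IRQ0 + IRQ_TIMER:
        ticks++;
        wakeup_stub(&ticks);    /* wake anything sleeping on the tick count */
        break;
    default:
        break;                  /* other vectors omitted from this sketch */
    }

    /* at the end of a timer interrupt, give up the CPU */
    if (tf->trapno == T_IRQ0 + IRQ_TIMER)
        yield_stub();
}

/* 100 timer interrupts per second means 10 ms per tick. */
unsigned ticks_to_ms(unsigned t) { return t * 10; }
```

In the real kernel the yield is where another process may be chosen to run, so the trap return that follows can land in a different process from the one that was interrupted.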

XXX Turns out our kernel had a subtle security bug in the way it handled traps... vb 0x1b:0x11, run movdsgs, step over breakpoints that aren't mov ax, ds, dump_cpu and single-step. dump_cpu after mov gs, then vb 0x1b:0x21 to break after sbrk returns, dump_cpu again.


JOS has a rather different structure from xv6.

Since JOS does not use segmentation, where do traps vector in JOS?

JOS also has a very different kernel architecture: only one kernel stack, as opposed to one per process in xv6. The kernel is not re-entrant (cannot be interrupted), so all IDT entries are interrupt gates in JOS.