21:04 Sun, 03 Feb 2008 PST -0800

Java Signal Handling: Turning SIGFPE into java.lang.ArithmeticException

Introduction

When implementing a virtual machine such as Java's, it's necessary (and sometimes beneficial) to handle some unexpected conditions by allowing the errors to occur, and then catching the resultant signals delivered by the operating system. Take, for example, divide by zero:

int i = 5 / 0;

Hotspot could generate code to check divisor == 0 before every division operation:

cmpl    $0, %ecx   // Is the divisor 0?
je      L2         // Jump to div-by-zero handler

movl    %edx, %eax // Copy the dividend into eax
sarl    $31, %edx  // Fill edx with the dividend's sign bit (sign-extends into edx:eax)
idivl   %ecx       // Divide edx:eax by ecx

But instead, Hotspot takes a leap of faith -- since programs should rarely divide by zero, it emits only the division instruction and, if the divisor is 0, relies on its signal handler to interpret the resultant SIGFPE:

if (sig == SIGFPE  && (info->si_code == FPE_INTDIV || info->si_code == FPE_FLTDIV)) {
    stub = SharedRuntime::continuation_for_implicit_exception(thread, pc, SharedRuntime::IMPLICIT_DIVIDE_BY_ZERO);

On Friday, I received a bug report for the x86_64 version of SoyLatte from Jibril Gueye. As it turns out, divide by zero errors were not being handled in the 64-bit VM, and instead of throwing an ArithmeticException, Java was unceremoniously crashing:

landonf@max> /usr/local/soylatte16-amd64-1.0.1/bin/java Test
#
# An unexpected error has been detected by Java Runtime Environment:
#
#  SIGFPE (0x8) at pc=0x0000000101886ba8, pid=35000, tid=0x301000

After fixing the issue, I thought it would be interesting to discuss how Java handles signals, and why the SIGFPE handler didn't work.

Signal Registration and Delivery

After the JVM has parsed its command line arguments, the operating-system-specific os::init_2() method is called. This method is responsible for performing any remaining OS-specific initialization tasks, such as the registration of signal handlers. The BSD implementation can be found in hotspot/src/os/bsd/vm/os_bsd.cpp.

At this time, an architecture-specific JVM_handle_bsd_signal() function is registered as a handler for SIGSEGV, SIGPIPE, SIGBUS, SIGILL, and SIGFPE. (See signal.h for descriptions.) When a divide by zero error occurs, SIGFPE is delivered to the process, and the JVM's JVM_handle_bsd_signal() is called.

The signal handler is registered using sigaction, with the SA_SIGINFO flag set. According to the Single UNIX Specification, "If SA_SIGINFO is set and the signal is caught, the signal-catching function will be entered as:"

void func(int signo, siginfo_t *info, void *context);

Upon a divide by zero, the provided siginfo structure contains a 'si_code' member set to FPE_INTDIV:

typedef struct __siginfo {
    int     si_signo;               /* signal number */
    int     si_errno;               /* errno association */
    int     si_code;                /* signal code */
    ...
} siginfo_t;

With this information, our JVM_handle_bsd_signal() implementation can check the signal number and code, and throw an ArithmeticException:

if (sig == SIGFPE &&
    (info->si_code == FPE_INTDIV || info->si_code == FPE_FLTDIV)) {
    stub = SharedRuntime::continuation_for_implicit_exception(thread,
        pc, SharedRuntime::IMPLICIT_DIVIDE_BY_ZERO);

SharedRuntime::continuation_for_implicit_exception() returns the entry point to Hotspot-generated code that sets up Java exception dispatching in the current frame. When the signal handler is finished, it saves the program counter and jumps to this stub, which handles setting up the frame and throwing the ArithmeticException.

Mac OS X and FPE_INTDIV

After receiving the bug report, I decided to take a look at Mac OS X's kernel signal handling code. On Darwin, the sendsig function handles creation and dispatch of UNIX signals to user processes. Looking at sendsig, we see that Mac OS X doesn't set si_code to FPE_INTDIV, and as such, JVM_handle_bsd_signal() can't decipher the signal:

    case SIGFPE:
#define FP_IE 0 /* Invalid operation */
#define FP_DE 1 /* Denormalized operand */
#define FP_ZE 2 /* Zero divide */
#define FP_OE 3 /* overflow */
#define FP_UE 4 /* underflow */
#define FP_PE 5 /* precision */
    if (ut->uu_subcode & (1 << FP_ZE)) {
        sinfo64.si_code = FPE_FLTDIV;
    } else if (ut->uu_subcode & (1 << FP_OE)) {
        sinfo64.si_code = FPE_FLTOVF;
    } else if (ut->uu_subcode & (1 << FP_UE)) {
        sinfo64.si_code = FPE_FLTUND;
    } else if (ut->uu_subcode & (1 << FP_PE)) {
        sinfo64.si_code = FPE_FLTRES;
    } else if (ut->uu_subcode & (1 << FP_IE)) {
        sinfo64.si_code = FPE_FLTINV;
    } else {
        printf("unknown SIGFPE code %ld, subcode %lx\n",
              (long) ut->uu_code, (long) ut->uu_subcode);
        sinfo64.si_code = FPE_NOOP;
    }
    break;

As you can see, there's no code to handle FPE_INTDIV, so si_code is set to FPE_NOOP and an error message is printed to the console. A quick check of dmesg shows that our kernel is indeed printing "unknown SIGFPE" when Java attempts a divide by zero:

sudo dmesg | grep SIGFPE
unknown SIGFPE code 1, subcode 0

This is suboptimal behavior, so I've filed a bug (5708523 - xnu sendsig() does not set siginfo->si_code = FPE_INTDIV for SIGFPE). In the meantime, a fix is necessary.

You may recall the 'void *context' argument passed to the signal handler. On Mac OS X, this is actually a pointer to a ucontext structure. The ucontext contains the full context of the thread's state at the time of the exception. This includes the program counter -- a register containing the address of the instruction that caused the exception.

Since we have the address of the instruction, we can determine what the instruction is. Once we know what the instruction is, we determine if it could have caused an integer divide by zero exception. This fix was used previously in Java to support Linux/x86 1.x kernels, which also did not set si_code.

To determine what instruction(s) could cause a FPE_INTDIV on 64-bit x86 machines, I consulted the Intel 64 and IA-32 Architectures Software Developer's Manuals -- the answer is idiv and idivl. Also, on amd64 machines, most operations remain 32-bit, and 64-bit operations require a REX prefix. We'll need to skip the prefix if it exists.

Now we can add code to examine the program counter in JVM_handle_bsd_signal():

// HACK: si_code == FPE_INTDIV is not supported on Mac OS X (si_code is set to FPE_NOOP).
// See also xnu-1228 bsd/dev/i386/unix_signal.c, line 365
// Filed as rdar://5708523 - xnu sendsig() does not set siginfo->si_code = FPE_INTDIV for SIGFPE
} else if (sig == SIGFPE && info->si_code == FPE_NOOP) {
    int op = pc[0];
 
    // Skip REX
    if ((pc[0] & 0xf0) == 0x40) {
        op = pc[1];
    } else {
        op = pc[0];
    }
 
    // Check for IDIV
    if (op == 0xF7) {
        stub = SharedRuntime::continuation_for_implicit_exception(thread, pc, SharedRuntime::IMPLICIT_DIVIDE_BY_ZERO);
    } else {
        // TODO: handle more cases if we are using other x86 instructions
        //   that can generate SIGFPE signal.
        tty->print_cr("unknown opcode 0x%X with SIGFPE.", op);
        fatal("please update this code.");
    }
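As an aside, opcode 0xF7 is shared by a group of instructions (TEST, NOT, NEG, MUL, IMUL, DIV, and IDIV), and bits 5..3 of the following ModRM byte select which one is meant. A stricter check could look at those bits as well. The sketch below is not part of the SoyLatte fix; is_integer_divide is a hypothetical helper, and it deliberately ignores the 8-bit form (0xF6) and prefixes other than REX.

```c
#include <stddef.h>

/* Returns 1 if the bytes at pc encode DIV or IDIV with a 16/32/64-bit
 * operand (opcode 0xF7, ModRM reg field 6 or 7), otherwise 0.
 * The caller must guarantee that at least three bytes are readable. */
int is_integer_divide(const unsigned char *pc) {
    size_t i = 0;
    unsigned reg;

    if (pc == NULL) {
        return 0;
    }

    /* In 64-bit mode, REX prefixes occupy 0x40 through 0x4F. */
    if ((pc[i] & 0xF0) == 0x40) {
        i++;
    }

    if (pc[i] != 0xF7) {
        return 0;
    }

    /* ModRM layout: mod (bits 7..6), reg (bits 5..3), rm (bits 2..0).
     * Within opcode group 0xF7, reg 6 is DIV and reg 7 is IDIV. */
    reg = (pc[i + 1] >> 3) & 0x7;
    return reg == 6 || reg == 7;
}
```

For example, the bytes F7 F9 (idivl %ecx) and 48 F7 F9 (idivq %rcx) are accepted, while F7 D9 (negl %ecx) is rejected even though it shares the 0xF7 opcode.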

With the fix in place, Java throws the expected ArithmeticException:

landonf@max:~> java Test
Exception in thread "main" java.lang.ArithmeticException: / by zero
    at Test.main(Test.java:3)
