Finding the Root Cause of a SIGILL Issue in Android’s QEMU ARM Emulator

I’ve encountered a persistent error when trying to use the debugger with my Android emulator. It is a real show stopper, because it makes reliable and consistent debug sessions impossible.

The symptom: when I set a breakpoint at some function and then run one or two si or finish commands in ARM gdb, the program stops with the dreaded SIGILL. Here is one example of countless SIGILLs:

Program received signal SIGILL, Illegal instruction.
0xb6ea21de in __set_errno () from c:/android/lib/

It is also a mysterious one, because when I checked the instructions near the address reported as the cause of the SIGILL, none of them looked like an instruction that would raise SIGILL.

In the example above, there is nothing wrong with the instruction at address 0xb6ea21de. This address belongs to the address space of the library reported by gdb, and for my version of Android the instruction at that location is:

131de: 6004 str r4, [r0, #0]
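As a quick cross-check of the two numbers above, here is a small C sketch. It assumes the runtime address equals the library’s load base plus the objdump offset (true for a shared library whose first segment is linked at address 0), and it decodes the halfword 0x6004 by hand using the 16-bit Thumb STR (immediate) encoding. The helper names are hypothetical, written only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the example above: the SIGILL address reported by gdb
   and the offset printed by objdump for the same instruction. */
#define FAULT_ADDR  0xb6ea21deu
#define OBJDUMP_OFF 0x000131deu

/* Assumption: runtime address = load base + objdump offset. */
static uint32_t load_base(uint32_t runtime_addr, uint32_t file_off)
{
    assert(runtime_addr >= file_off);
    return runtime_addr - file_off;
}

/* Fields of a 16-bit Thumb STR (immediate), encoding T1:
   0 1 1 0 0 | imm5 | Rn | Rt  ->  str Rt, [Rn, #imm5 * 4] */
struct thumb_str_imm {
    unsigned rt;
    unsigned rn;
    unsigned offset; /* byte offset, already scaled by 4 */
};

/* Returns 1 and fills *out when insn is a T1 STR (immediate), else 0. */
static int decode_thumb_str_imm(uint16_t insn, struct thumb_str_imm *out)
{
    if ((insn & 0xF800u) != 0x6000u) /* top five bits must be 01100 */
        return 0;
    out->rt = insn & 0x7u;
    out->rn = (insn >> 3) & 0x7u;
    out->offset = ((insn >> 6) & 0x1Fu) * 4u;
    return 1;
}
```

With these numbers, load_base gives 0xb6e8f000, a page-aligned base, and 0x6004 decodes to str r4, [r0, #0], which agrees with the objdump line. So the file contents at that address really are an ordinary store.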

Still, the debugger reports it as SIGILL, which makes no sense. I believe this should not happen when debugging on a real device, but that is a separate issue.

I haven’t yet analyzed debugging on a real device in detail, but I assume that if the problem does not occur there, the culprit is the emulator itself.

So, what is the cause of the above error?

In Android’s QEMU-based ARM emulator, exceptions are handled in the do_interrupt function (helper.c). The ARM architecture defines several exception types; I won’t explain them at length here, since that basic information is widely available online.

The exception of interest inside do_interrupt is EXCP_UDEF. In the EXCP_UDEF case (as in the other exception cases), the emulator sets the program counter (pc) to the corresponding entry of the exception vector table, which the Android kernel populated from its __vectors_start code. For an undefined instruction, that vector entry branches to vector_und (b vector_und + stubs_offset).
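To make the EXCP_UDEF step concrete, here is a simplified C model of what the emulator does when it takes an undefined-instruction exception, loosely following the shape of the do_interrupt case described above. The struct and macro names are hypothetical (this is not QEMU’s CPUARMState); the vector offset 0x04, the high-vector base 0xffff0000, Undefined mode 0x1B, and the LR offsets of +2 (Thumb) and +4 (ARM) come from the ARM architecture.

```c
#include <assert.h>
#include <stdint.h>

#define VEC_UNDEF         0x04u        /* undefined-instruction vector slot */
#define HIGH_VECTORS_BASE 0xFFFF0000u  /* vector base when SCTLR.V is set */
#define SCTLR_V           (1u << 13)   /* CP15 c1 "high vectors" bit */
#define CPSR_T            (1u << 5)    /* Thumb state */
#define CPSR_I            (1u << 7)    /* IRQs masked */
#define CPSR_MODE_MASK    0x1Fu
#define MODE_UND          0x1Bu        /* Undefined mode */

/* Hypothetical, heavily reduced CPU state. */
struct cpu_model {
    uint32_t pc;       /* address of the instruction being executed */
    uint32_t cpsr;
    uint32_t sctlr;
    uint32_t lr_und;   /* banked LR of Undefined mode */
    uint32_t spsr_und; /* banked SPSR of Undefined mode */
};

/* Model of entering the undefined-instruction exception. */
static void take_undef_exception(struct cpu_model *cpu)
{
    uint32_t addr = VEC_UNDEF;
    /* LR points just past the faulting instruction. */
    uint32_t offset = (cpu->cpsr & CPSR_T) ? 2u : 4u;

    if (cpu->sctlr & SCTLR_V)
        addr += HIGH_VECTORS_BASE;

    cpu->spsr_und = cpu->cpsr;
    /* Enter Undefined mode in ARM state with IRQs masked. */
    cpu->cpsr = (cpu->cpsr & ~(CPSR_MODE_MASK | CPSR_T)) | MODE_UND | CPSR_I;
    cpu->lr_und = cpu->pc + offset;
    cpu->pc = addr; /* the kernel's slot holds: b vector_und + stubs_offset */
}
```

For a Thumb user-mode fault at 0xb6ea21de with high vectors enabled, the model lands at 0xffff0004 with LR set to 0xb6ea21e0; the kernel later subtracts the same offset to find the faulting instruction.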

This gives the kernel a chance to do some setup and to process the undefined instruction, for example by calling a coprocessor handler. All of the routines discussed here can be found in the Android kernel source file entry-armv.S.

The vector_und routine passes control to __und_usr (for a fault raised in user mode). If you set a breakpoint on this function in kernel address space, it will be hit many times, because at this stage the instruction has not yet been classified as truly undefined: it may simply be a coprocessor or board-specific instruction that the kernel knows how to handle.
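Why are there so many hits on __und_usr? A rough C sketch of the triage helps. It assumes an ARM-state instruction and covers only two handlers (the floating-point emulator on coprocessors 1/2 and VFP on coprocessors 10/11); the real kernel code handles more cases, and the enum and function names here are hypothetical. Lazy VFP context switching is one common source of such traps: the first VFP instruction a thread executes after a switch faults and is handled, not reported.

```c
#include <assert.h>
#include <stdint.h>

enum undef_route {
    ROUTE_FPE,     /* coprocessor 1/2: floating-point emulator */
    ROUTE_VFP,     /* coprocessor 10/11: VFP support code */
    ROUTE_UNKNOWN  /* nothing claims it: genuinely undefined */
};

/* Simplified triage of an ARM-state undefined instruction. Coprocessor
   instructions (CDP, MCR, MRC, LDC, STC) have bits 27:26 == 0b11 and
   carry the coprocessor number in bits 11:8. */
static enum undef_route route_arm_undef(uint32_t insn)
{
    unsigned cp;

    if ((insn & 0x0C000000u) != 0x0C000000u)
        return ROUTE_UNKNOWN; /* not a coprocessor instruction */

    cp = (insn >> 8) & 0xFu;
    assert(cp < 16u);
    switch (cp) {
    case 1:
    case 2:
        return ROUTE_FPE;
    case 10:
    case 11:
        return ROUTE_VFP;
    default:
        return ROUTE_UNKNOWN;
    }
}
```

For example, 0xEE300A00 (vadd.f32 s0, s0, s0) is routed to VFP, while 0xE7F001F0, from the permanently undefined ARM encoding space, falls through as unknown.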

To break where the Android kernel handles a genuinely undefined instruction, I set the breakpoint at __und_usr_unknown instead. This routine in turn calls do_undefinstr (traps.c), which notifies user space, and that is what produces the SIGILL shown above.
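Before do_undefinstr delivers SIGILL, the kernel gives registered undefined-instruction hooks a chance to claim the instruction. The sketch below is a reduced C model of that decision. The field names follow the idea of the kernel’s undef hooks (instruction mask/value plus CPSR mask/value), but the types, functions, and the example cp10 hook are hypothetical, and this sketch simply lets the first matching hook decide.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

#define CPSR_T (1u << 5)

/* Reduced model of an undefined-instruction hook: it claims an
   instruction when (insn & instr_mask) == instr_val and
   (cpsr & cpsr_mask) == cpsr_val. */
struct undef_hook_model {
    uint32_t instr_mask, instr_val;
    uint32_t cpsr_mask, cpsr_val;
    int (*fn)(uint32_t insn); /* returns 0 if it handled insn */
};

/* Returns 0 if a hook handled the instruction, otherwise the signal
   that would be sent to user space. */
static int do_undefinstr_model(const struct undef_hook_model *hooks,
                               size_t n, uint32_t insn, uint32_t cpsr)
{
    assert(hooks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct undef_hook_model *h = &hooks[i];
        if ((insn & h->instr_mask) == h->instr_val &&
            (cpsr & h->cpsr_mask) == h->cpsr_val)
            return h->fn(insn) == 0 ? 0 : SIGILL;
    }
    return SIGILL;
}

/* Hypothetical example hook: claims ARM-state coprocessor-10 instructions. */
static int example_cp10_handler(uint32_t insn)
{
    (void)insn;
    return 0;
}

static const struct undef_hook_model example_hooks[] = {
    { 0x0C000F00u, 0x0C000A00u, CPSR_T, 0u, example_cp10_handler },
};
```

With the example table, a cp10 instruction in ARM state is absorbed, while the same word in Thumb state, or any instruction no hook matches, ends in SIGILL.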

But which instruction causes the SIGILL? You can find out by examining r0 and r5 at the start of the __und_usr_unknown routine. If the faulting code was running in ARM state, r0 contains the ARM instruction; if it was running in Thumb state, r5 contains the Thumb instruction.
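The following C sketch shows roughly how the faulting instruction can be recovered from the exception LR and saved CPSR: a 32-bit ARM word at LR - 4, or a 16-bit Thumb halfword at LR - 2, loaded little-endian. It is an assumption-laden model, not kernel code; mem_window and both functions are hypothetical. The bytes in the usage example are illustrative, chosen to reproduce the r5 value of 0xDE01 reported in the next paragraph. Note that because the load is little-endian, the bytes 01 DE in memory become the value 0xDE01 in the register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPSR_T (1u << 5) /* Thumb bit in the saved CPSR */

/* Hypothetical view of user memory: bytes starting at address base. */
struct mem_window {
    uint32_t base;
    const uint8_t *bytes;
    size_t len;
};

/* Little-endian load of width bytes (2 or 4) at addr. */
static uint32_t read_le(const struct mem_window *m, uint32_t addr,
                        unsigned width)
{
    uint32_t v = 0;

    assert(addr >= m->base && (size_t)(addr - m->base) + width <= m->len);
    for (unsigned i = 0; i < width; i++)
        v |= (uint32_t)m->bytes[addr - m->base + i] << (8u * i);
    return v;
}

/* Recover the faulting instruction from the exception LR and saved CPSR:
   a 32-bit ARM word at LR - 4, or a 16-bit Thumb halfword at LR - 2. */
static uint32_t fetch_undef_insn(const struct mem_window *m, uint32_t lr,
                                 uint32_t spsr, uint32_t *fault_addr)
{
    int thumb = (spsr & CPSR_T) != 0;

    *fault_addr = lr - (thumb ? 2u : 4u);
    return read_le(m, *fault_addr, thumb ? 2u : 4u);
}
```

For a Thumb fault with LR = 0xb6ea21e0, the sketch reads the halfword at 0xb6ea21de, which is exactly the address gdb reported.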

In my case, r5 contains 56833 (0xDE01). Reading this value backward (byte-swapped) gives 0x01DE. This is interesting, because when I compared it with the SIGILL address above, 0xb6ea21de, there is a correlation: the address ends in the same digits (1de).
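The arithmetic in this paragraph can be checked with a few lines of C: swap the bytes of 0xDE01, compare the result with the low 12 bits of the SIGILL address, and decode 0xDE01 as a Thumb instruction. In Thumb, 1101 1110 imm8 is the permanently undefined encoding (UDF #imm8), so 0xDE01 is a genuinely undefined instruction. One more data point worth ruling out before blaming the emulator: 0xDE01 is also the halfword GDB and gdbserver write as a Thumb software breakpoint on ARM Linux, which the kernel normally converts to SIGTRAP through an undef hook in arch/arm/kernel/ptrace.c. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of a halfword ("reading it backward"). */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Thumb encodes a permanently undefined instruction as 1101 1110 imm8.
   Returns 1 and stores imm8 if insn is one, else 0. */
static int is_thumb_udf(uint16_t insn, unsigned *imm8)
{
    assert(imm8 != NULL);
    if ((insn & 0xFF00u) != 0xDE00u)
        return 0;
    *imm8 = insn & 0xFFu;
    return 1;
}

/* Compare the low 12 bits of an address with a 16-bit value. */
static int low12_match(uint32_t addr, uint16_t v)
{
    return (addr & 0xFFFu) == v;
}
```

Here swap16(0xDE01) is 0x01DE, which equals 0xb6ea21de & 0xfff, and is_thumb_udf(0xDE01) reports UDF #1, whereas the file’s 0x6004 is not a UDF.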

So the cause seems to be that, at some unknown translation step in the emulator during debugging, part of the address is somehow fetched as the instruction. Figuring out why is a job for the QEMU or Android developers 🙂

