Solving SIGILL in Android Emulator Debugging

I now have the components needed to gain some insight into what causes SIGILL during Android emulator debugging. The bug still exists, even with the latest version (v7.7):

sv01

The instruction at address 0x000082FC should be 0xE08F3003, but it gets overwritten with 0xEF9F0001. So, let’s debug the gdb client to find the origin of this value. Just before executing the si command, set a breakpoint at putpkt_binary in the client program:

sv02

After several continue commands in the debugger, we have:

sv03

Verify the buf value:

sv04
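When reading a raw buffer in the debugger, keep in mind that a little-endian ARM target stores the instruction word with its least significant byte first. The sketch below shows how the two values discussed above would appear byte by byte in memory; the helper name arm_le_bytes is made up for illustration and is not part of gdb.

```python
import struct

def arm_le_bytes(insn: int) -> bytes:
    """Return the 4 bytes of a 32-bit instruction word in little-endian memory order."""
    return struct.pack("<I", insn)

expected = arm_le_bytes(0xE08F3003)  # instruction that should be at 0x000082FC
written = arm_le_bytes(0xEF9F0001)   # value the client actually writes

print(" ".join(f"{b:02x}" for b in expected))  # 03 30 8f e0
print(" ".join(f"{b:02x}" for b in written))   # 01 00 9f ef
```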

It is indeed the invalid instruction, but where did the buf value come from? Let’s view the partial call stack that leads to this condition:

sv05

Examining the memory at 0x681e35, we have:

sv06

The prototype of the target_write_raw_memory function is as follows:

sv07

It is called by the default_memory_insert_breakpoint function with the following parameters:

sv08

So, 0xEF9F0001 is actually the bp variable passed to the target_write_raw_memory function, and the bp variable itself is retrieved from the gdbarch_breakpoint_from_pc function as follows:

sv09

This function is only a stub that returns the value of:

sv10

On the ARM architecture, this is actually a call to the arm_breakpoint_from_pc function:

sv11

After more examination in this style, I finally arrived at the conclusion that the value is retrieved from the arm_breakpoint field of the tdep structure for the ARM architecture:

sv12

tdep is given by the gdbarch_tdep structure as follows:

sv13

arm_breakpoint is initialized in the arm_linux_init_abi function with arm_linux_arm_le_breakpoint. But why does an ordinary breakpoint seem to work without causing SIGILL?

sv14

0xE7F001F0 is actually eabi_linux_arm_le_breakpoint (the little-endian variant). My emulator treats this instruction as an illegal operation, which I verified with the procedure described below:

sv15

In my own build of the Android Emulator, I added a dummy statement inside the disas_arm_insn function (pictured above), filled the inv_insn value with 0xE7F001F0 (3891266032 in decimal) and set a breakpoint there:

sv16

Stepping through this function confirmed that it is indeed treated as an illegal instruction.
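The two breakpoint values fall into quite different parts of the ARM instruction space, which may help explain why the emulator flags one of them as undefined. The following is a rough sketch of a classifier that covers only the encodings relevant here. The function name and its output strings are made up for illustration, and it is not a full ARM decoder.

```python
def classify_arm_insn(insn: int) -> str:
    """Very rough classification of a 32-bit ARM (A32) instruction word."""
    if (insn >> 28) & 0xF == 0xF:
        return "unconditional instruction space"
    if (insn >> 24) & 0xF == 0xF:
        # SWI/SVC: the low 24 bits are a comment field for the OS.
        return f"swi 0x{insn & 0xFFFFFF:06x}"
    if (insn >> 20) & 0xFF == 0x7F and (insn >> 4) & 0xF == 0xF:
        # Encoding space reserved as permanently undefined (UDF).
        imm16 = ((insn >> 8) & 0xFFF) << 4 | (insn & 0xF)
        return f"udf #0x{imm16:x}"
    return "other"

print(classify_arm_insn(0xEF9F0001))  # swi 0x9f0001
print(classify_arm_insn(0xE7F001F0))  # udf #0x10
print(classify_arm_insn(0xE08F3003))  # other
```

Seen this way, it is not surprising that the instruction decoder reports 0xE7F001F0 as illegal: it sits in the undefined space by design. That alone does not say which signal the kernel will deliver for it.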

So, does the Android kernel trap this instruction? To find out, I replaced an instruction in the main routine of hello with 0xE7F001F0 (using the technique I explained in a previous article), and I received:

sv17

Signal 5 is SIGTRAP (see signal.h).

But with 0xEF9F0001, I received:

sv18

This means that, of the two values, my version of the Android kernel recognizes only 0xE7F001F0 as a legal breakpoint instruction that can be inserted into the code.
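The outcome of the two experiments can be summarised as a toy model. This is only a restatement of what was observed on this particular kernel, not kernel code; the names RECOGNIZED_BREAKPOINTS and observed_signal are invented for the sketch, and the signal numbers are the usual Linux values from signal.h.

```python
SIGILL, SIGTRAP = 4, 5  # Linux signal numbers (signal.h)

# Breakpoint encodings this kernel was observed to accept (assumption:
# only the values tested in this article are listed).
RECOGNIZED_BREAKPOINTS = {0xE7F001F0}

def observed_signal(breakpoint_insn: int) -> int:
    """Signal seen when the planted breakpoint instruction is executed."""
    return SIGTRAP if breakpoint_insn in RECOGNIZED_BREAKPOINTS else SIGILL

print(observed_signal(0xE7F001F0))  # 5
print(observed_signal(0xEF9F0001))  # 4
```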

To test this conjecture, I ran several test sessions, and every SIGILL event was preceded by a write of the 0xEF9F0001 instruction to the address space of the emulator.

But the server also detected the client writing the 0xE7F001F0 instruction. So, let’s examine the call stack again at this event:

sv19

Comparing the two call stacks, the difference lies between the calls to deprecated_insert_raw_breakpoint and bkpt_insert_location. The former causes SIGILL, but the latter does not.

Let’s examine the arm_breakpoint value more closely at the bkpt_insert_location call:

sv20

This is the partial call stack below the bkpt_insert_location function, used to trace where the bl structure is initialized:

sv21

Analyzing the functions in this call chain reveals that the breakpoint mechanism relies on the global static structure **bp_location in breakpoint.c.

To find the origin of the arm_breakpoint value, it is necessary to find when the bp_location structure is initialized. Since my current gdb for Windows fails to set a decent memory write breakpoint, I have to resort to WinDBG, first obtaining the correct address from the gdb symbol file:

sv22

Then in my WinDBG session:

sv23

There is an allocation call at location 0x0048289F just before the bkpt_insert_location call, and the memory it allocates is identical to the initialized bp_location value at the time of bkpt_insert_location, which gives the working ARM breakpoint value. Let’s examine this state closely.

Just after the call to update_global_location_list and before bkpt_insert_location, here is the state of bp_location and its associated gdbarch and tdep structures:

sv24

This value is identical to the one used at the time of the bkpt_insert_location call:

sv25

Examining the source code where bp_location is initialized:

sv26

bp_location is actually assigned from the breakpoint_chain->loc value:

sv27

To find when and how the loc value is assigned, I repeated the memory write breakpoint used for bp_location, this time for breakpoint_chain, and we have:

sv28

The location 0x00486a17 is inside install_breakpoint, which calls the add_to_breakpoint_chain function, and the value is assigned from the function parameter b, a breakpoint structure:

sv29

The b breakpoint structure is initialized inside create_breakpoint_sal, and stepping through the routine shows that the breakpoint structure and its associated loc variable are actually assigned in the init_breakpoint_sal call. To find where the loc structure is initialized, it is necessary to view the low-level layout of b, which will be used for the memory write breakpoint in WinDBG:

sv30

And also the code location just before init_breakpoint_sal:

sv31

And the precise address of b:

sv32

Now switch to WinDBG, point it at the address specified above and set a memory write breakpoint at that location. We have:

sv33

The edi register holds the initialized loc variable, and the address above is inside the add_location_to_breakpoint function. The value of interest inside this function is loc_gdbarch which, as can be seen below, is retrieved from get_sal_arch. Interestingly, it does not use the original gdbarch value provided by the b structure, which contains the ARM breakpoint value that causes SIGILL. It falls back to the gdbarch from the b structure only when get_sal_arch returns a null value:

sv34

Let’s verify this value to confirm that the function succeeds:

sv35

The function succeeds, so loc_gdbarch holds a gdbarch structure different from the one inside the b structure. The get_sal_arch function depends on get_objfile_arch (objfiles.c) to determine the architecture, which in turn gets the architecture information from the provided object file. This explains why the breakpoint sometimes works and sometimes causes SIGILL.
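The fallback described above can be modelled in a few lines. This is a simplified sketch, not gdb code: the Gdbarch and Sal classes and the location_arch helper are hypothetical stand-ins, and only the names get_sal_arch and the two breakpoint values come from the investigation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Gdbarch:
    name: str
    arm_breakpoint: int

@dataclass
class Sal:
    # None models an address that no loaded symbol file covers.
    objfile_arch: Optional[Gdbarch]

def get_sal_arch(sal: Sal) -> Optional[Gdbarch]:
    """Simplified: return the architecture recorded by the object file, if any."""
    return sal.objfile_arch

def location_arch(sal: Sal, breakpoint_arch: Gdbarch) -> Gdbarch:
    """Prefer the object file's architecture; fall back to the breakpoint's."""
    arch = get_sal_arch(sal)
    return arch if arch is not None else breakpoint_arch

default_arch = Gdbarch("default", 0xEF9F0001)
objfile_arch = Gdbarch("from symbol file", 0xE7F001F0)

print(hex(location_arch(Sal(objfile_arch), default_arch).arm_breakpoint))  # 0xe7f001f0
print(hex(location_arch(Sal(None), default_arch).arm_breakpoint))          # 0xef9f0001
```

With a symbol file the location picks up the working breakpoint; without one it inherits the default architecture and its unrecognized breakpoint.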

Now, let’s set a breakpoint without the aid of a symbol file:

sv36

Instead of stopping properly at requested address, the debugger receives a SIGILL signal.

Let’s break again at install_breakpoint under the same conditions and examine its arm_breakpoint value:

sv37

Further checks of the arm_breakpoint value are not necessary because gdbarch clearly holds the default value, 0x2d00280, which is passed as a parameter in the call stack above. We have already verified that for this default gdbarch, arm_breakpoint contains 0xEF9F0001, which my Android Emulator does not recognize as a valid breakpoint and which causes the SIGILL signal.

Given this, if I can modify the gdb client so that the default gdbarch delivers the recognized breakpoint, the SIGILL signal can be avoided. But first, it is necessary to find out how this value is initialized.

The earliest use of gdbarch in the call chain is in create_breakpoint (breakpoint.c), as one of its parameters. The parameter itself originates from the call to the get_current_arch function. At the end of the frame_unwind_arch function, gdbarch is set from the next_frame->prev_arch.arch value:

sv38

The next_frame parameter is obtained from the static structure selected_frame->next. Let’s lay out the access path from this static structure to the gdbarch value above:

sv39

To prepare for tracking the origin of gdbarch, this is the low-level view of selected_frame, used to locate the ->next variable:

sv40

And this is the low-level view of selected_frame->next, used to track the origin of prev_arch and obtain the arch value of interest:

sv41

Don’t forget to obtain the address of the selected_frame structure:

sv42

Now we are ready to perform the tracking process using WinDBG:

sv43

From the address information given by WinDBG, we have:

sv44

The selected_frame value is initialized through the call stack below:

sv45

Close examination of this location reveals that the frame is returned from the get_current_frame function and that the value is assigned from the current_frame variable:

sv46

Repeating the memory write tracking method above, we have:

sv47

The current_frame structure is actually taken from the eax value returned by the function at 0x0059d2dc, which is:

sv48

By examining the call stack leading to this call:

sv49

The origin of the current_frame structure is actually a function argument at address 0x2ccdfc0. Close examination of this structure reveals that gdbarch is still not initialized and that its next variable still points to itself:

sv50

To find when arch (gdbarch) is initialized, it is necessary to locate the low-level address of the this_frame parameter passed to the get_prev_frame function:

sv51

In the WinDBG session, at last we have:

sv52

The eax register holds the returned architecture (gdbarch) value. The address above is actually part of the frame_unwind_arch function:

sv53

The eax value above is returned by a call to the sentinel_frame_prev_arch function, which in turn calls the get_regcache_arch function, and so we have part of the holy grail, the origin of gdbarch, as follows:

sv54
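The lookup traced so far behaves like a cached query: the frame stores the architecture in prev_arch the first time it is asked, and the sentinel frame answers with whatever architecture its regcache carries. The sketch below models only that shape. The Frame and Regcache classes, the lookups counter and the string used for the architecture are illustrative assumptions, not gdb’s real data structures.

```python
class Regcache:
    def __init__(self, arch):
        self.arch = arch

def get_regcache_arch(regcache):
    return regcache.arch

def sentinel_frame_prev_arch(frame):
    # The sentinel frame has no real caller; it reports the regcache's arch.
    return get_regcache_arch(frame.regcache)

class Frame:
    """Simplified stand-in for a frame with a prev_arch cache."""
    def __init__(self, regcache):
        self.regcache = regcache
        self.prev_arch_known = False
        self.prev_arch = None
        self.lookups = 0  # counts how often the unwinder is really consulted

def frame_unwind_arch(next_frame):
    if not next_frame.prev_arch_known:
        next_frame.lookups += 1
        next_frame.prev_arch = sentinel_frame_prev_arch(next_frame)
        next_frame.prev_arch_known = True
    return next_frame.prev_arch

sentinel = Frame(Regcache("default ARM gdbarch"))
print(frame_unwind_arch(sentinel))
print(frame_unwind_arch(sentinel), sentinel.lookups)
```

The point of the model is that whatever architecture the regcache was created with is what every later breakpoint request inherits, which is why the search continues with the regcache.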

Let’s examine gdbarch properties at this stage:

sv55

To find how regcache is initialized, it is necessary to examine the creation of the sentinel_frame structure, which happens inside the get_current_frame (frame.c) function. The creation call takes one interesting parameter, the regcache structure, which is exactly the structure we need in order to find the gdbarch structure. That parameter is supplied by get_current_regcache (regcache.c), which retrieves the current regcache structure.

More detailed tracing of these calls reveals that gdbarch is assigned from a static structure called current_thread_arch. Close examination of this structure shows that it contains the arm_breakpoint value that my version of the Android Emulator does not recognize.

So, the memory write tracking routine is repeated once more, this time for the initialization of the current_thread_arch structure:

sv56

This time, we have:

sv57

Which is:

sv58

The call to 0x7e127c is actually a call into target-dependent code to retrieve the current architecture, and current_thread_arch is assigned from current_inferior_->gdbarch.

The current_inferior_ value is initialized at:

sv59

Which is:

sv60

This is the low-level view of the current_inferior_ structure at the get_regcache_arch call:

sv61

The structure above is used to find when gdbarch is assigned after the call to the initialize_inferiors function shown above:

sv62

Now that the gdbarch initialization has been found, the next step is to find the arm_breakpoint assignment inside the tdep structure. To do this, it is necessary to identify the low-level layout of the tdep structure:

sv63

The call made just before the arm_breakpoint initialization is located inside arm_gdbarch_init:

sv64

The initialization itself is located at:

sv65

The call stack leading to the initialization above:

sv66

The initialization uses arm_linux_arm_le_breakpoint because:

sv67

This assignment depends on tdep->arm_abi, which is set to ARM_ABI_AUTO instead of ARM_ABI_AAPCS, the setting that yields the working arm_breakpoint:

sv68

Based on these facts, it is possible to change arm_breakpoint to the working value, provided that I change the arm_abi assignment from ARM_ABI_AUTO to ARM_ABI_AAPCS. Inside arm_gdbarch_init, arm_abi is first assigned from a static variable called arm_abi_global. So, will the gdb client deliver the working arm_breakpoint when I change this static variable to ARM_ABI_AAPCS?
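The dependency between the ABI setting and the breakpoint bytes can be sketched as follows. This is a model of the little-endian case only, written from the values found during the investigation; the function pick_arm_le_breakpoint and the string constants are invented for illustration and do not reproduce gdb’s source.

```python
ARM_ABI_AUTO, ARM_ABI_APCS, ARM_ABI_AAPCS = "auto", "APCS", "AAPCS"

# Byte sequences in memory order (little endian).
ARM_LINUX_ARM_LE_BREAKPOINT = bytes([0x01, 0x00, 0x9F, 0xEF])   # 0xEF9F0001
EABI_LINUX_ARM_LE_BREAKPOINT = bytes([0xF0, 0x01, 0xF0, 0xE7])  # 0xE7F001F0

def pick_arm_le_breakpoint(arm_abi: str) -> bytes:
    """Only the AAPCS (EABI) setting selects the breakpoint this kernel accepts."""
    if arm_abi == ARM_ABI_AAPCS:
        return EABI_LINUX_ARM_LE_BREAKPOINT
    return ARM_LINUX_ARM_LE_BREAKPOINT

for abi in (ARM_ABI_AUTO, ARM_ABI_AAPCS):
    word = int.from_bytes(pick_arm_le_breakpoint(abi), "little")
    print(abi, hex(word))
# auto 0xef9f0001
# AAPCS 0xe7f001f0
```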

To test this, let’s set the arm abi option to AAPCS (case sensitive) in the gdb client with the set arm abi AAPCS command. It must be set before any subsequent debugging command, as follows:

sv69

So far so good 🙂 But some SIGILLs still crop up. Perhaps they are caused by the Thumb breakpoint used in ARM Thumb mode, by other bugs, or by other unknown options that must be set in the gdb client.

In this case, I will leave it to you, the reader of this article, to solve the problem. I believe you can do it 🙂

You may ask why I tell such a long story if, in the end, the solution is simply to set an option in the debugger. It is because I did not find any working solution to this issue after some research on the net. What I found were only vague solutions that did not address the root cause of this issue.

The only option I had was to trace the causal chain using the available building blocks, which also had to be created one by one, such as compiling the gdb server and client sources.

I think the Android platform investors, decision makers and executives should consider providing a serious, reliable and proven support ecosystem, just like the one provided by Microsoft.

Without it, this platform will lack the conditions required to ignite the chain reaction for an exploding ecosystem, just like a failed star such as Jupiter compared with the Sun 🙂

One Response to “Solving SIGILL in Android Emulator Debugging”

  1. Frederic Says:

    Hi, I am not familiar with this low-level sorcery, and I must admit that yesterday I was very close to fixing this SIGILL with my hammer. That was the situation before going to bed, and in bed googling brought me to your site. A god ray, like in the movies, appeared! I went back to my desk, tested show arm and set arm abi AAPCS, and it works! No more SIGILL. I was so happy that I then spent a (very) good night. You saved my (computer) life. I will definitely bookmark your blog! Thank you!
