Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIGSEGV when using QemuForkExecutor in "arm" feature, and Unknown error: Unix error: ECHILD #2632

Open
RongxiYe opened this issue Oct 25, 2024 · 6 comments
Labels
qemu LibAFL QEMU

Comments

@RongxiYe
Copy link

The issue to be present in the current main branch

$ git log | head -n 1
commit dfd5609c10da85f32e0dec74a72a432acd85310a

Describe the issue
I am doing some fuzzing practice using an tenda VC 15 router httpd, which is 32-bit arm architecture. I use a QemuForkExecutor, but got an error when load the initial inputs:

Failed to load initial corpus at ["./seed/"]

I print the error,

if state.must_load_initial_inputs() {
    state
        .load_initial_inputs(&mut fuzzer, &mut executor, &mut mgr, &intial_dirs)
        .unwrap_or_else(|a| {
             println!("{}", a);
             println!("Failed to load initial corpus at {:?}", &intial_dirs);
             process::exit(0);
         });
    println!("We imported {} inputs from disk.", state.corpus().count());
 }

and it says:

Unknown error: Unix error: ECHILD

I debug the fuzzer, and find out that the fuzzer receives a SIGSEGV in trace_edge_hitcount_ptr:

   715 pub unsafe extern "C" fn trace_edge_hitcount_ptr(_: *const (), id: u64) {
   716     unsafe {
   717         let ptr = LIBAFL_QEMU_EDGES_MAP_PTR.add(id as usize);718         *ptr = (*ptr).wrapping_add(1);
   719     }
   720 }
   
pwndbg> p ptr
$1 = (*mut u8) 0x4d55bbdb022cd456
pwndbg> p *ptr
Cannot access memory at address 0x4d55bbdb022cd456

It seems that the value of ptr cannot be dereferenced. I know that this function is used to record the coverage, but I don't know what "id" or "ptr" mean. So I read the related instrumentation code in qemu-libafl-bridge.

//$ git log | head -n 1
//commit 805b14ffc44999952562e8f219d81c21a4fa50b9

// in accel/tcg/cpu_exec.c, cpu_exec_loop
//// --- Begin LibAFL code ---

            bool libafl_edge_generated = false;
            TranslationBlock *edge;

            /* See if we can patch the calling TB. */
            if (last_tb) {
                // tb_add_jump(last_tb, tb_exit, tb);

                if (last_tb->jmp_reset_offset[1] != TB_JMP_OFFSET_INVALID) {
                    mmap_lock();
                    edge = libafl_gen_edge(cpu, last_tb->pc, pc, tb_exit, cs_base, flags, cflags);
                    mmap_unlock();

                    if (edge) {
                        tb_add_jump(last_tb, tb_exit, edge);
                        tb_add_jump(edge, 0, tb);
                        libafl_edge_generated = true;
                    } else {
                        tb_add_jump(last_tb, tb_exit, tb);
                    }
                } else {
                    tb_add_jump(last_tb, tb_exit, tb);
                }
            }

            if (libafl_edge_generated) {
                // execute the edge to make sure to log it the first execution
                // the edge will then jump to the translated block
                cpu_loop_exec_tb(cpu, edge, pc, &last_tb, &tb_exit);
            } else {
                cpu_loop_exec_tb(cpu, tb, pc, &last_tb, &tb_exit);
            }

            //// --- End LibAFL code ---

My understanding is: if a new translation block is generated by libafl_gen_edge, it is executed first, and then it is recorded on the coverage graph by jumping to trace_edge_hitcount_ptr through the hook. (I use StdEdgeCoverageChildModule, and I remember it used the edge type hook.)
Also, I debugged this part of codes. Considering the contents of the TranslationBlock structure, I found the specific contents of the edge variable:

// edge->tc.ptr
pwndbg> p/x *itb
$7 = {
  pc = 0x40a23030,
  cs_base = 0x480,
  flags = 0x0,
  cflags = 0x800010,
  size = 0x1,
  icount = 0x1,
  tc = {
    ptr = 0x710ee4e00740,
    size = 0x38
  },
  itree = {
    rb = {
      rb_parent_color = 0xfec7058d4840804b,
      rb_right = 0x48fffff959e9ffff,
      rb_left = 0x4de9fffffebd058d
    },
    start = 0x40a23030,
    last = 0xffffffffffffffff,
    subtree_last = 0x0
  },
  jmp_lock = {
    value = 0x0
  },
  jmp_reset_offset = {0x20, 0xffff},
  jmp_insn_offset = {0x1c, 0xffff},
  jmp_target_addr = {0x710ee4e00500, 0x0},
  jmp_list_head = 0x710ee4e002c0,
  jmp_list_next = {0x0, 0x0},
  jmp_dest = {0x710ee4e00440, 0x0}
}

pwndbg> x/16x 0x710ee4e00740
0x710ee4e00740 <code_gen_buffer+1811>:  0x3456be48      0x43f7dbc5      0xbf484d55      0x7f076fa0

Note the value of tc.ptr here. It is <code_gen_buffer+1811>. The machine code it points to is 0x43f7dbc53456be48, and gdb told me it means movabs rsi, 0x4d5543f7dbc53456.
While tracing the code flow later, I found that the fuzzer jumped to a small section of code hook to prepare parameters(moving to rdi and rsi), and then jumped to trace_edge_hitcount_ptr.

   0x5a457a1901df <cpu_exec_loop.isra+783>    mov    r12, qword ptr [r8 + 0x20]
   0x5a457a1901e3 <cpu_exec_loop.isra+787>    test   eax, 0x120
   0x5a457a1901e8 <cpu_exec_loop.isra+792>    jne    cpu_exec_loop.isra+1720     <cpu_exec_loop.isra+1720>
   0x5a457a1901ee <cpu_exec_loop.isra+798>    lea    rax, [rip + 0x3d2c0cb]         RAX => 0x5a457debc2c0 (tcg_qemu_tb_exec) —▸ 0x710ee4e00000 ◂— push rbp /* 0x5641554154415355 */

// R12 is 0x710ee4e00740 (code_gen_buffer+1811) ◂— movabs rsi, 0x4d5543f7dbc53456 /* 0x43f7dbc53456be48 */

   0x710ee4e00000                           push   rbp
   0x710ee4e00001                           push   rbx
   0x710ee4e00002                           push   r12
   0x710ee4e00004                           push   r13
   0x710ee4e00006                           push   r14
   0x710ee4e00008                           push   r15
   0x710ee4e0000a                           mov    rbp, rdi        RBP => 0x5a457f038920 ◂— 0x123fb400000000
   0x710ee4e0000d                           add    rsp, -0x488     RSP => 0x7ffffcc93560 (0x7ffffcc939e8 + -0x488)
   0x710ee4e00014                           jmp    rsi                         <code_gen_buffer+1811>
    ↓   
   0x710ee4e00740 <code_gen_buffer+1811>    movabs rsi, 0x4d5543f7dbc53456     RSI => 0x4d5543f7dbc53456
   0x710ee4e0074a <code_gen_buffer+1821>    movabs rdi, 0x5a457f076fa0         RDI => 0x5a457f076fa0 ◂— 0
►  0x710ee4e00754 <code_gen_buffer+1831>    call   qword ptr [rip + 0x16]      <libafl_qemu::modules::edges::trace_edge_hitcount_ptr>
        rdi: 0x5a457f076fa0 ◂— 0
        rsi: 0x4d5543f7dbc53456

This seems to indicate that the number following movabs rsi, will become the id. But the values ​​I have here don't look right.

My issues now are as follows:

  1. What does id actually represent?
  2. How is it calculated?
  3. How can I solve this problem?
  4. Do I need to provide any additional information?

Thank you very much!

@RongxiYe
Copy link
Author

Hi, I already know that id is generated through libafl_qemu_hook_edge_gen->create_gen_wrapper->gen_hashed_edge_ids(in StdEdgeCoverageChildModule). Now I am debugging this part of code...

@RongxiYe
Copy link
Author

I found the process of calculating id and the intermediate value. The calculated id is indeed 0x4d5543f7dbc53456. Do you think there is a problem?

// src is 0x40a23030, dest is 0x40a23058
*RAX  0x4a265dc83567e8c8 hash_me(src)
*RAX  0x7731e3feea2dc9e  hash_me(dest)

 ► 0x5af412b8763e <libafl_qemu::modules::edges::gen_hashed_edge_ids+174>    xor    rax, rcx                        RAX => 0x4d5543f7dbc53456 (0x4a265dc83567e8c8 ^ 0x7731e3feea2dc9e)

Considering that LIBAFL_QEMU_EDGES_MAP_PTR is 0x761cc2856000, maybe it causes the SIGSEGV because it exceeds its range after the addition?

   715 pub unsafe extern "C" fn trace_edge_hitcount_ptr(_: *const (), id: u64) {
   716     unsafe {
   717         let ptr = LIBAFL_QEMU_EDGES_MAP_PTR.add(id as usize);
 ► 718         *ptr = (*ptr).wrapping_add(1);
   719     }
   720 }
$25 = 0x4d5543f7dbc53456
pwndbg> p/x LIBAFL_QEMU_EDGES_MAP_PTR
$26 = 0x761cc2856000

@RongxiYe RongxiYe changed the title SIGSEGV when using QemuForkExecutor in "arm" feature SIGSEGV when using QemuForkExecutor in "arm" feature, and Unknown error: Unix error: ECHILD Oct 28, 2024
@RongxiYe
Copy link
Author

I tried to change id as usize to (id as u16).try_into().unwrap(). This part is fine for now. (This is just a temporary solution.) But when I continued, the error Unknown error: Unix error: ECHILD still occurred. This seems to be because in the run_target method of GenericInProcessForkExecutorInner, the parent process does not correctly capture the exit of the child process. I will debug further.

@RongxiYe
Copy link
Author

I found that in parent method of GenericInProcessForkExecutorInner, waitpid returned this error. This error, Unknown error: Unix error: ECHILD, means there is no child process.

pub(super) fn parent(&mut self, child: Pid) -> Result<ExitKind, Error> {
    let res = waitpid(child, None)?;
    //...
}

The waitpid in nix calls libc::waitpid. I searched on search engines and only this question seems to be somewhat related.
The first fork happens in SimpleRestartingEventManager::launch. The second fork happens in the run_target function of the Executor called inside load_initial_inputs.

pub fn launch(mut monitor: MT, shmem_provider: &mut SP) -> Result<(Option<S>, Self), Error>
    where
        S: DeserializeOwned + Serialize + HasCorpus + HasSolutions,
        MT: Debug,
    {
            // Client->parent loop
            loop {
                log::info!("Spawning next client (id {ctr})");

                // On Unix, we fork
                #[cfg(all(unix, feature = "fork"))]
                let child_status = {
                    shmem_provider.pre_fork()?;
                    match unsafe { fork() }? {
                        ForkResult::Parent(handle) => {
                            unsafe {
                                libc::signal(libc::SIGINT, libc::SIG_IGN);
                            }
                            shmem_provider.post_fork(false)?;
                            handle.status()
                        }
                        ForkResult::Child => {
                            shmem_provider.post_fork(true)?;
                            break staterestorer;
                        }
                    }
                };
}

fn run_target() -> Result<ExitKind, Error> {
        *state.executions_mut() += 1;
        unsafe {
            self.inner.shmem_provider.pre_fork()?;
            match fork() {
                Ok(ForkResult::Child) => {
                    // Child
                    self.inner.pre_run_target_child(fuzzer, state, mgr, input)?;
                    (self.harness_fn)(input, &mut self.exposed_executor_state);
                    self.inner.post_run_target_child(fuzzer, state, mgr, input);
                    Ok(ExitKind::Ok)
                }
                Ok(ForkResult::Parent { child }) => {
                    // Parent
                    self.inner.parent(child)
                }
                Err(e) => Err(Error::from(e)),
            }
        }
    }

I am still somewhat confused as to why this happened...

@rmalmain
Copy link
Collaborator

thank you for the detailed report.

I found the process of calculating id and the intermediate value. The calculated id is indeed 0x4d5543f7dbc53456. Do you think there is a problem?

// src is 0x40a23030, dest is 0x40a23058
*RAX  0x4a265dc83567e8c8 hash_me(src)
*RAX  0x7731e3feea2dc9e  hash_me(dest)

 ► 0x5af412b8763e <libafl_qemu::modules::edges::gen_hashed_edge_ids+174>    xor    rax, rcx                        RAX => 0x4d5543f7dbc53456 (0x4a265dc83567e8c8 ^ 0x7731e3feea2dc9e)

Considering that LIBAFL_QEMU_EDGES_MAP_PTR is 0x761cc2856000, maybe it causes the SIGSEGV because it exceeds its range after the addition?

   715 pub unsafe extern "C" fn trace_edge_hitcount_ptr(_: *const (), id: u64) {
   716     unsafe {
   717         let ptr = LIBAFL_QEMU_EDGES_MAP_PTR.add(id as usize);
 ► 718         *ptr = (*ptr).wrapping_add(1);
   719     }
   720 }
$25 = 0x4d5543f7dbc53456
pwndbg> p/x LIBAFL_QEMU_EDGES_MAP_PTR
$26 = 0x761cc2856000

about the first bug, could you print the value of LIBAFL_QEMU_EDGES_MAP_MASK_MAX once in the gen_hashed_edge_ids function?

I found that in parent method of GenericInProcessForkExecutorInner, waitpid returned this error. This error, Unknown error: Unix error: ECHILD, means there is no child process.

pub(super) fn parent(&mut self, child: Pid) -> Result<ExitKind, Error> {
    let res = waitpid(child, None)?;
    //...
}

The waitpid in nix calls libc::waitpid. I searched on search engines and only this question seems to be somewhat related. The first fork happens in SimpleRestartingEventManager::launch. The second fork happens in the run_target function of the Executor called inside load_initial_inputs.

pub fn launch(mut monitor: MT, shmem_provider: &mut SP) -> Result<(Option<S>, Self), Error>
    where
        S: DeserializeOwned + Serialize + HasCorpus + HasSolutions,
        MT: Debug,
    {
            // Client->parent loop
            loop {
                log::info!("Spawning next client (id {ctr})");

                // On Unix, we fork
                #[cfg(all(unix, feature = "fork"))]
                let child_status = {
                    shmem_provider.pre_fork()?;
                    match unsafe { fork() }? {
                        ForkResult::Parent(handle) => {
                            unsafe {
                                libc::signal(libc::SIGINT, libc::SIG_IGN);
                            }
                            shmem_provider.post_fork(false)?;
                            handle.status()
                        }
                        ForkResult::Child => {
                            shmem_provider.post_fork(true)?;
                            break staterestorer;
                        }
                    }
                };
}

fn run_target() -> Result<ExitKind, Error> {
        *state.executions_mut() += 1;
        unsafe {
            self.inner.shmem_provider.pre_fork()?;
            match fork() {
                Ok(ForkResult::Child) => {
                    // Child
                    self.inner.pre_run_target_child(fuzzer, state, mgr, input)?;
                    (self.harness_fn)(input, &mut self.exposed_executor_state);
                    self.inner.post_run_target_child(fuzzer, state, mgr, input);
                    Ok(ExitKind::Ok)
                }
                Ok(ForkResult::Parent { child }) => {
                    // Parent
                    self.inner.parent(child)
                }
                Err(e) => Err(Error::from(e)),
            }
        }
    }

I am still somewhat confused as to why this happened...

maybe it's about the order in which processes die?

@RongxiYe
Copy link
Author

RongxiYe commented Oct 29, 2024

@rmalmain Thank you for your reply.
For the first question, my LIBAFL_QEMU_EDGES_MAP_MASK_MAX is 0xffff.

About the second one, I write this fuzzer on an httpd program. There are many places in the program that use fork to process commands. Maybe I should find a more appropriate way to write the fuzzer.

@tokatoka tokatoka added the qemu LibAFL QEMU label Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
qemu LibAFL QEMU
Projects
None yet
Development

No branches or pull requests

3 participants