- Unicorn is a lightweight multi-platform, multi-architecture CPU emulator framework.
- Uses a heavily modified version of Qemu to support multiple architectures.
- Authors N.GUYEN Anh Quyn and DANG Hoang Vu.
- Official Site [link]
- Introduction BlackHat [link]
- Source Code [link]
- x86 Constant [link]
unicorn.h
[link]unicorn.py
[link]- Eternal Stories' Unicorn Engine tutorial [link]
- Tomori Nao's Unicorn Engine Reference (Unofficial) [link]
The following is a list of high-level usage and core concepts of using the Unicorn Engine.
- Initialize Unicorn Instance.
- Read and write memory.
- Read and write registers.
- Start and stop emulation.
- Memory and hook management.
- Instrument user-defined callbacks.
- Exit Unicorn Instance
Each usage and concept is explored below with brief descriptions and example code.
To initialize the Unicorn class the API Uc(UC_ARCH, UC_MODE)
is used. The first argument is the hardware architecture type. The second argument is the hardware mode type and/or endianness. The following are the available architecture type.
UC_ARCH_ARM
// ARM architecture (including Thumb, Thumb-2)UC_ARCH_ARM64
// ARM-64, also called AArch64UC_ARCH_MIPS
// Mips architectureUC_ARCH_X86
// X86 architecture (including x86 & x86-64)UC_ARCH_PPC
// PowerPC architecture (currently unsupported)UC_ARCH_SPARC
// Sparc architectureUC_ARCH_M68K
// M68K architectureUC_ARCH_MAX
The following is the available hardware types. The comments are from unicorn.h
.
UC_MODE_LITTLE_ENDIAN
// little-endian mode (default mode)UC_MODE_BIG_ENDIAN
// big-endian mode
UC_MODE_ARM
// ARM modeUC_MODE_THUMB
// THUMB mode (including Thumb-2)UC_MODE_MCLASS
// ARM's Cortex-M series (currently unsupported)UC_MODE_V8
// ARMv8 A32 encodings for ARM (currently unsupported)
UC_MODE_MICRO
// MicroMips mode (currently unsupported)UC_MODE_MIPS3
// Mips III ISA (currently unsupported)UC_MODE_MIPS32R6
// Mips32r6 ISA (currently unsupported)UC_MODE_MIPS32
// Mips32 ISAUC_MODE_MIPS64
// Mips64 ISA
UC_MODE_16
// 16-bit modeUC_MODE_32
// 32-bit modeUC_MODE_64
// 64-bit mode
UC_MODE_PPC32
// 32-bit mode (currently unsupported)UC_MODE_PPC64
// 64-bit mode (currently unsupported)UC_MODE_QPX
// Quad Processing eXtensions mode (currently unsupported)
UC_MODE_SPARC32
// 32-bit modeUC_MODE_SPARC64
// 64-bit modeUC_MODE_V9
// SparcV9 mode (currently unsupported)
There are many different combinations of architecture and hardware types. The Unicorn Engine Python bindings directory [link] contains several example scripts. All the examples have the pattern sample_.*.py
.
The following example uses pefile to get the hardware type for an x86
architecture type.
from __future__ import print_function
import pefile
from unicorn import *
from unicorn.x86_const import *
IMAGE_FILE_MACHINE_I386 = 0x014c
IMAGE_FILE_MACHINE_AMD64 = 0x8664
def read_data(file_path):
with open(file_path, "rb") as infile:
return infile.read()
def is_x86_machine(pe_data):
"""
Is executable 32 bit
:param pe_data:
:return: True if 32 bit, False if 64 bit, None if other
"""
try:
pe = pefile.PE(data=pe_data)
except Exception as e:
print("ERROR: pefile load error %s") % (e)
return
if pe.FILE_HEADER.Machine == IMAGE_FILE_MACHINE_I386:
x86_machine = True
elif pe.FILE_HEADER.Machine == IMAGE_FILE_MACHINE_AMD64:
x86_machine = False
return x86_machine
pe_data = read_data("data.bin")
if is_x86_machine(pe_data):
# Initialize unicorn instance
uc = Uc(UC_ARCH_X86, UC_MODE_32)
print("32-bit")
else:
# UC_MODE_LITTLE_ENDIAN is the default mode
# UC_MODE_LITTLE_ENDIAN has been included to show usage of defining endianness
uc = Uc(UC_ARCH_X86, UC_MODE_64 | UC_MODE_LITTLE_ENDIAN)
print("64-bit")
uc
is the unicorn instance.
Before memory can read to or write to, the memory needs to be mapped. To map memory the APIs uc.mem_map(address, size, perms=uc.UC_PROT_ALL)
and uc.mem_map_ptr(address, size, perms, ptr)
are used. The following memory protections are available.
UC_PROT_NONE
UC_PROT_READ
UC_PROT_WRITE
UC_PROT_EXEC
UC_PROT_ALL
To protect a range of memory the API uc.mem_protect(address, size, perms=uc.UC_PROT_ALL)
is used. To unmap memory the API uc.mem_unmap(address, size)
is used. Once the memory is mapped it can be written to by calling uc.mem_write(address, data)
. To read from the allocated memory uc.mem_read(address, size)
is used.
The following code creates a Unicorn Instance assigned to uc
. An address of 0x00300000
is assigned to stack_base
and a size of 0x00100000
is assigned to stack_size
. The address is then mapped by calling uc.mem_map(stack_base, stack_size)
. Once the memory is mapped null bytes (\x00
) are written to the memory.
uc = Uc(UC_ARCH_X86, UC_MODE_32)
stack_base = 0x00300000
stack_size = 0x00100000
uc.mem_map(stack_base, stack_size)
uc.mem_write(stack_base, "\x00" * stack_size)
Registers can be read by calling uc.reg_read(reg_id, opt=None)
. The reg_id
is defined in the appropriate architecture constant Python file in the Python bindings directory [link].
ARM-64
inarm64_const.py
ARM
inarm_const.py
M68K
inm68k_const.py
MIPS
inmips_const.py
SPARC
insparc_const.py
X86
inx86_const.py
To reference the constants, they must be first imported. In Example 1 the constants are imported by calling the following code.
from unicorn.x86_const import *
To write the contents of a register uc.reg_write(reg_id, value)
is used.
The following code writes the stack base address stored in stack_base
to the register ESP
by calling uc.reg_write(reg_id, value)
. UC_X86_REG_ESP
is the reg_id
for ESP. The register is then read by calling uc.reg_read(reg_id, opt=None)
. The last argument in reg_read
is optional and is left out in the example code.
uc.reg_write(UC_X86_REG_ESP, stack_base + stack_size)
ebp = uc.reg_read(UC_X86_REG_EBP)
print("0x%x" % ebp)
To start the Unicorn Engine emulating the API uc.emu_start(begin, until, timeout=0, count=0)
is called. The first function start
is the first address that is emulated. The second argument until
is the address (or above) that the Unicorn Engine stops emulating at. The argument timeout=
is used to define the number of milliseconds that the Unicorn Engine executes until it times out. UC_SECOND_SCALE * n
can be used to wait n
number of seconds. The last argument count=
can be used to define the number of instructions that are executed before the Unicorn Engine stops executing. If count=
is zero or less than counting by the Unicorn Engine is disabled. To stop emulating the API uc.emu_stop()
is used.
Building off of the previous examples, the following code reads the Portable Executable (PE) entry point and then executes until the instruction pointer EIP
is either 100 bytes away or past the address the entry point. If the Unicorn Engine hits an exception it is printed.
entry_point = pe.OPTIONAL_HEADER.ImageBase + pe.OPTIONAL_HEADER.AddressOfEntryPoint
try:
uc.emu_start(entry_point, entry_point + 100)
except Exception as e:
print(e)
The Unicorn Engine supports a wide arrange of hooks. The following describes each hook. The sentences in quotes were extracted from comments in unicorn.h
. The hooks are inserted before the call to start the emulation. To add a hook the API mu.hook_add(UC_HOOK_*, callback, user_data, begin, end, ...)
. The first two arguments are mandatory. The last three are optional and are usually populated with default values of None, 1, 0, UC_INS
. To delete a hook the API emu.hook_del(hook)
is used. To delete a hook it must be assigned to a variable. Please see the following code for an example.
i = emu.hook_add(UC_HOOK_CODE, hook_code, None)
emu.hook_del(i)
The following covers each hook type with a minimal code example.
- "Hook all interrupt/syscall events"
intno
is the interrupt number.user_data
is user data supplied to the callback
def hook_intr(uc, intno, user_data):
# only handle Linux syscall
if intno != 0x80:
print("got interrupt %x ???" %intno);
uc.emu_stop()
return
mu.hook_add(UC_HOOK_INTR, hook_intr)
- "Hook a particular instruction - only a very small subset of instructions supported here"
- Hooks three x86 instructions:
IN
,OUT
andSYSCALL
[?].
def hook_syscall(mu, user_data):
rax = mu.reg_read(UC_X86_REG_RAX)
if rax == 0x100:
mu.reg_write(UC_X86_REG_RAX, 0x200)
else:
print('ERROR: was not expecting rax=%d in syscall' % rax)
mu.hook_add(UC_HOOK_INSN, hook_syscall, None, 1, 0, UC_X86_INS_SYSCALL)
- "Hook a range of code"
- Hook is called before every instruction is executed.
def hook_code(uc, address, size, user_data):
print('>>> Tracing instruction at 0x%x, instruction size = 0x%x' %(address, size))
uc.hook_add(UC_HOOK_CODE, hook_code)
- "Hook basic blocks"
def hook_block(uc, address, size, user_data):
print(">>> Tracing basic block at 0x%x, block size = 0x%x" %(address, size)
uc.hook_add(UC_HOOK_BLOCK, hook_block)
The rest of the hooks are related to the reading, fetching, writing and accessing of memory. They all start with UC_HOOK_MEM_*
. Their callback all have the same arguments as seen below.
def hook_mem_example(uc, access, address, size, value, user_data):
pass
The second argument is access
UC_MEM_READ = 16
UC_MEM_WRITE = 17
UC_MEM_FETCH = 18
UC_MEM_READ_UNMAPPED = 19
UC_MEM_WRITE_UNMAPPED = 20
UC_MEM_FETCH_UNMAPPED = 21
UC_MEM_WRITE_PROT = 22
UC_MEM_READ_PROT = 23
UC_MEM_FETCH_PROT = 24
UC_MEM_READ_AFTER = 25
- "Hook for memory read on unmapped memory"
def hook_mem_read_unmapped(uc, access, address, size, value, user_data):
pass
uc.hook_add(UC_HOOK_MEM_READ_UNMAPPED, hook_mem_read_unmapped, None)
Note The following hooks do not contain example code but the previous examples can be used as a skelton
- "Hook for invalid memory write events"
- "Hook for invalid memory fetch for execution events"
- "Hook for memory read on read-protected memory"
- "Hook for memory write on write-protected memory"
- "Hook for memory fetch on non-executable memory"
- "Hook memory read events"
- "Hook memory write events"
- "Hook memory fetch for execution events"
- "Hook memory read events, but only successful access. he callback will be triggered after successful read"
Instrument callbacks are used to read, write or control the flow of the instrumentation. The Memory and Hook Management section contained a number of example user callbacks.
The following class can be used to initialize memory (code, data, stack, etc) before using Unicorn. It is designed for the x86/x86_64 code or executables. It relies on PEFile to load the executable. An example on how to use the code can be found in Usage Example.
from __future__ import print_function
import pefile, collections
from unicorn import *
from unicorn.x86_const import *
IMAGE_FILE_MACHINE_I386 = 0x014c
IMAGE_FILE_MACHINE_AMD64 = 0x8664
"""
Usage Example:
# run shellcode
# read bytes
instructions = b"\x90\x90\x90\x90""
# init unicorn instance, allocate memory, create stack, etc
tt = InitUnicorn(instructions)
# emulate code standard unicorn start arguments
tt.mu.emu_start(tt.code_base, tt.code_base + len(instructions))
# read register value
r_eip = tt.mu.reg_read(UC_X86_REG_EIP)
print("0x%x" % r_eip)
# read 32-bit executable
data = open("bad_file.exe", "rb").read()
# options data, type_pe=False, bit=32, debug=False)
tt = InitUnicorn(data, type_pe=True, debug=True)
try:
# Verbose, if it works set DEBUG False
tt.DEBUG = True
# emulate virtual address
tt.mu.emu_start(0x0405490, 0x0405490 + 0x1C)
r_eax = tt.mu.reg_read(UC_X86_REG_EAX)
print("0x%x" % r_eax)
except Exception as e:
print("0x%x" % (tt.mu.reg_read(UC_X86_REG_ESP)))
print(e)
"""
class InitUnicorn(object):
def __init__(self, data, type_pe=False, bit=32, debug=False):
self.code_base = 0x00100000
self.DEBUG = debug
# pe check
if type_pe:
self.load_pe(data)
if self.pe:
self.init_unicorn()
self.base = self.pe.OPTIONAL_HEADER.ImageBase
self.map_pe_mem()
self.create_stack()
else:
if bit == 32:
self.is_x86_machine = True
else:
self.is_x86_machine = False
self.init_unicorn()
self.map_data(data)
self.create_stack()
if self.DEBUG:
self.add_debug()
def create_stack(self):
if self.is_x86_machine:
self.is_x86_machine = True
self.stack_base = 0x00300000
self.stack_size = 0x00100000
else:
self.is_x86_machine = False
self.stack_base = 0xffffffff00000000
self.stack_size = 0x0000000000100000
self.mu.mem_map(self.stack_base, self.stack_size)
self.mu.mem_write(self.stack_base, b"\x00" * self.stack_size)
if self.is_x86_machine:
self.mu.reg_write(UC_X86_REG_ESP, self.stack_base + 0x800)
self.mu.reg_write(UC_X86_REG_EBP, self.stack_base + 0x1000)
else:
self.mu.reg_write(UC_X86_REG_RSP, self.stack_base + 0x8000)
self.mu.reg_write(UC_X86_REG_RBP, self.stack_base + 0x10000)
def init_unicorn(self):
if self.is_x86_machine:
self.mu = Uc(UC_ARCH_X86, UC_MODE_32)
else:
self.mu = Uc(UC_ARCH_X86, UC_MODE_64)
def map_data(self, data):
self.mu.mem_map(self.code_base, 0x10000)
self.mu.mem_write(self.code_base, data)
'''
Modified version of Willi Ballenthin's code
https://github.com/williballenthin/python-vb/blob/master/vb/analyzer.py#L115
'''
def load_pe(self, pe_data):
try:
self.pe = pefile.PE(data=pe_data)
if self.pe.FILE_HEADER.Machine == IMAGE_FILE_MACHINE_I386:
self.is_x86_machine = True
elif self.pe.FILE_HEADER.Machine == IMAGE_FILE_MACHINE_AMD64:
self.is_x86_machine = False
except Exception as e:
self.pe = None
if self.DEBUG:
print("ERROR: pefile load error %s") % (e)
return
def map_pe_mem(self):
# map executable memory
align_size = self.pe.OPTIONAL_HEADER.SectionAlignment
for section in self.get_map():
self.mu.mem_map(section.va, self.align(section.size, align_size))
temp_bytes = self.get_bytes(section.va, section.size)
self.mu.mem_write(section.va, temp_bytes)
def get_map(self):
MapEntry = collections.namedtuple('MapEntry', ['va', 'size'])
ret = []
for section in self.pe.sections:
rva = section.VirtualAddress
va = self.base + rva
size = section.Misc_VirtualSize
ret.append(MapEntry(va, size))
return ret
def get_bytes(self, va, length):
rva = va - self.base
return self.pe.get_data(rva, length)
def va(self, rva):
return self.base + rva
def rva(self, va):
return va - self.base
def align(self, value, alignment):
if value % alignment == 0:
return value
return value + (alignment - (value % alignment))
def add_debug(self):
"""For Debugging Use Only"""
self.mu.hook_add(UC_HOOK_CODE, self.hook_code)
self.mu.hook_add(UC_HOOK_INSN, self.hook_call, None, 1, 0, UC_X86_INS_CALL)
self.mu.hook_add(UC_HOOK_MEM_INVALID, self.hook_mem_invalid)
def hook_code(self, uc, address, size, user_data):
"""For Debugging Use Only"""
print('>>> Tracing instruction at 0x%x, instruction size = 0x%x' % (address, size))
def hook_call(self, uc, address, size, user_data):
"""For Debugging Use Only"""
print('>>> Call instruction at 0x%x, instruction size = 0x%x' % (address, size))
def hook_mem_invalid(self, uc, access, address, size, value, user_data):
"""For Debugging Use Only"""
eip = uc.reg_read(UC_X86_REG_EIP)
if access == UC_MEM_WRITE:
print("invalid WRITE of 0x%x at 0x%X, data size = %u, data value = 0x%x" % (address, eip, size, value))
if access == UC_MEM_READ:
print("invalid READ of 0x%x at 0x%X, data size = %u" % (address, eip, size))
if access == UC_MEM_FETCH:
print("UC_MEM_FETCH of 0x%x at 0x%X, data size = %u" % (address, eip, size))
if access == UC_MEM_READ_UNMAPPED:
print("UC_MEM_READ_UNMAPPED of 0x%x at 0x%X, data size = %u" % (address, eip, size))
if access == UC_MEM_WRITE_UNMAPPED:
print("UC_MEM_WRITE_UNMAPPED of 0x%x at 0x%X, data size = %u" % (address, eip, size))
if access == UC_MEM_FETCH_UNMAPPED:
print("UC_MEM_FETCH_UNMAPPED of 0x%x at 0x%X, data size = %u" % (address, eip, size))
if access == UC_MEM_WRITE_PROT:
print("UC_MEM_WRITE_PROT of 0x%x at 0x%X, data size = %u" % (address, eip, size))
if access == UC_MEM_FETCH_PROT:
print("UC_MEM_FETCH_PROT of 0x%x at 0x%X, data size = %u" % (address, eip, size))
if access == UC_MEM_FETCH_PROT:
print("UC_MEM_FETCH_PROT of 0x%x at 0x%X, data size = %u" % (address, eip, size))
if access == UC_MEM_READ_AFTER:
print("UC_MEM_READ_AFTER of 0x%x at 0x%X, data size = %u" % (address, eip, size))
return False