A TCG backend that chains together JOP/ROP-ish gadgets to massively reduce interpreter overhead vs TCI. It is platform-dependent, but usable when JIT isn't available, e.g. on platforms that lack WX mappings. The general idea is to squish the addresses of a gadget sequence into a "queue", and then write each gadget so it ends in a "dequeue-jump".
Execution occurs by jumping into the first gadget, and letting it play back linear-overhead native code sequences for a while.
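To make the "queue" and "dequeue-jump" idea concrete, here is a minimal Python model of it. It is only a sketch: the names `make_movi`, `make_add`, `exit_tb` and `run_thread` are made up for illustration, and the `while` loop stands in for the dequeue-jump, since Python cannot jump directly from one gadget into the next the way the native gadgets do.

```python
def make_movi(rd):
    """Build a gadget that loads an inline immediate into register rd."""
    def gadget(thread, ip, regs):
        regs[rd] = thread[ip]   # the immediate sits right after the gadget in the queue
        return ip + 1           # step past the immediate
    return gadget


def make_add(rd, rn, rm):
    """Build a gadget specialized for one (rd, rn, rm) operand combination."""
    def gadget(thread, ip, regs):
        regs[rd] = regs[rn] + regs[rm]
        return ip
    return gadget


def exit_tb(thread, ip, regs):
    """Hypothetical stand-in for leaving the translation block."""
    return None


def run_thread(thread, regs):
    """Play back the queue: dequeue a gadget, run it, repeat."""
    ip = 0  # plays the role of the thread-stream pointer
    while ip is not None:
        gadget = thread[ip]
        ip = gadget(thread, ip + 1, regs)
    return regs


def demo():
    thread = [make_movi(1), 40, make_movi(2), 2, make_add(0, 1, 2), exit_tb]
    return run_thread(thread, [0] * 16)[0]
```

Note that the gadgets carry no operand decoding at all: the operands are baked into which gadget was queued, which is where the savings over a classic bytecode interpreter come from.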
Since TCG-TCI is optimized for sets of 16 GP registers and AArch64 has 30, we could easily keep JIT/QEMU and guest state separate. And since 16*16 is reasonably small, we could actually have a dedicated gadget for each combination of operands.
Regs | Use |
---|---|
x0-x15 | Guest Registers |
x24 | TCTI temporary |
x25 | saved IP during call |
x26 | TCTI temporary |
x27 | TCTI temporary |
x28 | Thread-stream pointer |
x30 | Link register |
SP | Stack Pointer, host |
PC | Program Counter, host |
In pseudocode:
Symbol | Meaning |
---|---|
Rd | stand-in for destination register |
Rn | stand-in for first source register |
Rm | stand-in for second source register |
Each gadget ends by advancing our bytecode pointer, and then executing from the new location.
# Load our next gadget address from our bytecode stream, advancing it, and jump to the next gadget.
ldr x27, [x28], #8
br x27
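Since every gadget shares this tail, a gadget generator can append it mechanically. The following is a small sketch of that, assuming gadgets are described as lists of assembly strings; `EPILOGUE` and `with_epilogue` are illustrative names, not part of any existing tool.

```python
# The shared "dequeue-jump" tail, exactly as listed above.
EPILOGUE = [
    "ldr x27, [x28], #8",  # fetch the next gadget address, post-incrementing the stream pointer
    "br x27",              # jump to it
]


def with_epilogue(*body):
    """Return the gadget body followed by the shared epilogue."""
    return [*body, *EPILOGUE]
```

For example, `with_epilogue("add x0, x1, x2")` yields the three-line gadget for one operand combination of a 64-bit add.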
When calling into C, we lose control over which registers are used. Accordingly, we'll need to save registers relevant to TCTI:
str x25, [sp, #-16]!
stp x14, x15, [sp, #-16]!
stp x12, x13, [sp, #-16]!
stp x10, x11, [sp, #-16]!
stp x8, x9, [sp, #-16]!
stp x6, x7, [sp, #-16]!
stp x4, x5, [sp, #-16]!
stp x2, x3, [sp, #-16]!
stp x0, x1, [sp, #-16]!
stp x28, lr, [sp, #-16]!
Upon returning to the gadget stream, we'll then restore them.
ldp x28, lr, [sp], #16
ldp x0, x1, [sp], #16
ldp x2, x3, [sp], #16
ldp x4, x5, [sp], #16
ldp x6, x7, [sp], #16
ldp x8, x9, [sp], #16
ldp x10, x11, [sp], #16
ldp x12, x13, [sp], #16
ldp x14, x15, [sp], #16
ldr x25, [sp], #16
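The two sequences above are mirror images, so they can be generated rather than written by hand. This is a sketch only: `c_call_prologue` and `c_call_epilogue` are hypothetical helper names, and the output is meant to match the listings above line for line.

```python
def c_call_prologue():
    """Push x25, then x14/x15 down to x0/x1, then x28/lr."""
    lines = ["str x25, [sp, #-16]!"]
    for low in range(14, -2, -2):  # 14, 12, ..., 0
        lines.append(f"stp x{low}, x{low + 1}, [sp, #-16]!")
    lines.append("stp x28, lr, [sp, #-16]!")
    return lines


def c_call_epilogue():
    """Pop everything in the reverse order of the prologue."""
    lines = ["ldp x28, lr, [sp], #16"]
    for low in range(0, 16, 2):  # 0, 2, ..., 14
        lines.append(f"ldp x{low}, x{low + 1}, [sp], #16")
    lines.append("ldr x25, [sp], #16")
    return lines
```

Each push and pop moves SP by 16 bytes, so the stack stays 16-byte aligned throughout, which AArch64 requires for SP-relative accesses under the usual alignment checking.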
Each operation needs an implementation for every platform, and probably a set of gadgets for each possible combination of operands.
With 16 GP registers, that means:
Operands | Gadgets |
---|---|
1 | 16 |
2 | 256 |
3 | 4096 |
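The counts follow directly from 16 choices per register operand. As a quick sanity check, here is a small sketch; `gadget_count` and its `variants` parameter are illustrative names, with `variants` covering operations that also multiply by something like a condition code.

```python
GUEST_REGISTERS = 16  # register choices per operand


def gadget_count(operands, variants=1):
    """Number of gadgets needed for one operation with this many register operands."""
    return variants * GUEST_REGISTERS ** operands
```

With ten comparison conditions, `gadget_count(3, variants=10)` gives the 40,960 figure quoted for the setcond operations, and `gadget_count(2, variants=10)` gives the 2560 quoted for brcond.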
Calls a helper function by address.
IR Format: call <ptr address>
Gadget type: single
# Get our C runtime function's location as a pointer-sized immediate...
"ldr x27, [x28], #8",
# Store our TB return address for our helper. This is necessary so the GETPC()
# macro works correctly as used in helper functions.
"str x28, [x25]",
# Prepare ourselves to call into our C runtime...
*C_CALL_PROLOGUE,
# ... perform the call itself ...
"blr x27",
# Save the result of our call for later.
"mov x27, x0",
# ... and restore our environment.
*C_CALL_EPILOGUE,
# Restore our return value.
"mov x0, x27"
Branches to a given immediate address.
IR Format: br <ptr address>
Gadget type: single
# Use our immediate argument as our new bytecode-pointer location.
ldr x28, [x28]
Performs a comparison between two 32-bit operands.
IR Format: setcond32 <cond>, Rd, Rn, Rm
Gadget type: treated as 10 operations with variants for every Rd/Rn/Rm (40,960)
subs Wd, Wn, Wm
cset Wd, <cond>
QEMU Cond | AArch64 Cond |
---|---|
EQ | EQ |
NE | NE |
LT | LT |
GE | GE |
LE | LE |
GT | GT |
LTU | LO |
GEU | HS |
LEU | LS |
GTU | HI |
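The chart translates naturally into a lookup table that a gadget generator could iterate over. The sketch below assumes gadgets are emitted as lists of assembly strings keyed by condition and register numbers; `COND_MAP` and `setcond_i32_gadgets` are made-up names for illustration.

```python
from itertools import product

# QEMU condition -> AArch64 condition code, taken from the chart above.
COND_MAP = {
    "EQ": "eq", "NE": "ne",
    "LT": "lt", "GE": "ge", "LE": "le", "GT": "gt",
    "LTU": "lo", "GEU": "hs", "LEU": "ls", "GTU": "hi",
}


def setcond_i32_gadgets():
    """Emit one gadget body per (condition, Rd, Rn, Rm) combination."""
    gadgets = {}
    regs = range(16)
    for (qemu_cond, a64_cond), rd, rn, rm in product(COND_MAP.items(), regs, regs, regs):
        gadgets[(qemu_cond, rd, rn, rm)] = [
            f"subs w{rd}, w{rn}, w{rm}",
            f"cset w{rd}, {a64_cond}",
        ]
    return gadgets
```

The unsigned comparisons are the only ones whose names differ between the two sides, which is why the mapping is worth keeping in one place.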
Performs a comparison between two 64-bit operands.
IR Format: setcond64 <cond>, Rd, Rn, Rm
Gadget type: treated as 10 operations with variants for every Rd/Rn/Rm (40,960)
subs Xd, Xn, Xm
cset Xd, <cond>
Comparison chart is the same as the _i32 variant.
Compares two 32-bit numbers, and branches if the comparison is true.
IR Format: brcond Rn, Rm, <cond>
Gadget type: treated as 10 operations with variants for every Rn/Rm (2560)
# Perform our comparison and conditional branch.
subs wzr, Wn, Wm
b.<cond> taken
# Consume the branch target, without using it.
add x28, x28, #8
# Perform our end-of-instruction epilogue.
<epilogue here>
taken:
# Update our bytecode pointer to take the label.
ldr x28, [x28]
Comparison chart is the same as in setcond_i32.
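The stream handling of a conditional branch can be modeled in a few lines. This is a sketch under assumptions: the stream is a Python list indexed by slot rather than a byte pointer, and `brcond_i32` and `CONDS` are illustrative names. The signed conditions reinterpret the operands as two's complement; the unsigned ones compare the raw 32-bit values.

```python
import operator

MASK32 = 0xFFFFFFFF

# condition -> (compare as signed?, comparison operator)
CONDS = {
    "EQ": (False, operator.eq), "NE": (False, operator.ne),
    "LT": (True, operator.lt), "GE": (True, operator.ge),
    "LE": (True, operator.le), "GT": (True, operator.gt),
    "LTU": (False, operator.lt), "GEU": (False, operator.ge),
    "LEU": (False, operator.le), "GTU": (False, operator.gt),
}


def to_signed32(value):
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def brcond_i32(stream, ip, cond, n, m):
    """stream[ip] holds the branch target; return the new stream position."""
    signed, compare = CONDS[cond]
    a, b = (to_signed32(n), to_signed32(m)) if signed else (n & MASK32, m & MASK32)
    if compare(a, b):
        return stream[ip]  # taken: replace the stream pointer with the target
    return ip + 1          # not taken: consume the target without using it
```

For instance, with operands `0xFFFFFFFF` and `1`, the `LT` variant takes the branch (-1 < 1) while the `LTU` variant falls through.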
Compares two 64-bit numbers, and branches if the comparison is true.
IR Format: brcond Rn, Rm, <cond>
Gadget type: treated as 10 operations with variants for every Rn/Rm (2560)
# Perform our comparison and conditional branch.
subs xzr, Xn, Xm
b.<cond> taken
# Consume the branch target, without using it.
add x28, x28, #8
# Perform our end-of-instruction epilogue.
<epilogue here>
taken:
# Update our bytecode pointer to take the label.
ldr x28, [x28]
Comparison chart is the same as in setcond_i32.
Moves a 32-bit value from one register to another.
IR Format: mov Rd, Rn
Gadget type: gadget per Rd + Rn combo (256)
mov Wd, Wn
Moves a 64-bit value from one register to another.
IR Format: mov Rd, Rn
Gadget type: gadget per Rd + Rn combo (256)
mov Xd, Xn
Moves a 32b immediate into a register.
IR Format: mov Rd, #imm32
Gadget type: gadget per Rd (16)
ldr w27, [x28], #4
mov Wd, w27
Moves a 64b immediate into a register.
IR Format: mov Rd, #imm64
Gadget type: gadget per Rd (16)
ldr x27, [x28], #8
mov Xd, x27
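Immediates live inline in the stream, so the stream pointer has to advance by exactly the width of the immediate that was consumed. The sketch below models that with `struct`, assuming a little-endian byte stream; `read_imm32`, `read_imm64` and `STREAM` are illustrative names.

```python
import struct


def read_imm32(stream, offset):
    """Read a 32b immediate and advance the stream offset by 4 bytes."""
    (value,) = struct.unpack_from("<I", stream, offset)
    return value, offset + 4


def read_imm64(stream, offset):
    """Read a 64b immediate and advance the stream offset by 8 bytes."""
    (value,) = struct.unpack_from("<Q", stream, offset)
    return value, offset + 8


# A 64b immediate followed directly by a 32b one, with no padding.
STREAM = struct.pack("<QI", 0x1122334455667788, 0xCAFEF00D)
```

If a reader advanced by the wrong amount, every later fetch from the stream (including the next gadget address) would land in the middle of some other value.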
Load byte from host memory to register.
IR Format: ldr Rd, Rn, <signed offset>
Gadget type: gadget per Rd & Rn (256)
ldrsw x27, [x28], #4
ldrb Wd, [Xn, x27]
Load byte from host memory to register; sign extending.
IR Format: ldr Rd, Rn, <signed offset>
Gadget type: gadget per Rd & Rn (256)
ldrsw x27, [x28], #4
ldrsb Xd, [Xn, x27]
Load 16b from host memory to register.
IR Format: ldr Rd, Rn, <signed offset>
Gadget type: gadget per Rd & Rn (256)
ldrsw x27, [x28], #4
ldrh Wd, [Xn, x27]
Load 16b from host memory to register; sign extending.
IR Format: ldr Rd, Rn, <signed offset>
Gadget type: gadget per Rd & Rn (256)
ldrsw x27, [x28], #4
ldrsh Xd, [Xn, x27]
Load 32b from host memory to register.
IR Format: ldr Rd, Rn, <signed offset>
Gadget type: gadget per Rd & Rn (256)
ldrsw x27, [x28], #4
ldr Wd, [Xn, x27]
Load 32b from host memory to register; sign extending.
IR Format: ldr Rd, Rn, <signed offset>
Gadget type: gadget per Rd & Rn (256)
ldrsw x27, [x28], #4
ldrsw Xd, [Xn, x27]
Load 64b from host memory to register.
IR Format: ldr Rd, Rn, <signed offset>
Gadget type: gadget per Rd & Rn (256)
ldrsw x27, [x28], #4
ldr Xd, [Xn, x27]
Stores byte from register to host memory.
IR Format: str Rd, Rn, <signed offset>
Gadget type: gadget per Rd & Rn (256)
ldrsw x27, [x28], #4
strb Wd, [Xn, x27]
Stores 16b from register to host memory.
IR Format: str Rd, Rn, <signed offset>
Gadget type: gadget per Rd & Rn (256)
ldrsw x27, [x28], #4
strh Wd, [Xn, x27]
Stores 32b from register to host memory.
IR Format: str Rd, Rn, <signed offset>
Gadget type: gadget per Rd & Rn (256)
ldrsw x27, [x28], #4
str Wd, [Xn, x27]
Stores 64b from register to host memory.
IR Format: str Rd, Rn, <signed offset>
Gadget type: gadget per Rd & Rn (256)
ldrsw x27, [x28], #4
str Xd, [Xn, x27]
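All of the host-memory loads and stores share one addressing pattern: a signed 32b displacement is read from the stream (the `ldrsw`), then added to the base register. Here is a small model of the load side, assuming little-endian memory and a flat `bytes` object standing in for host memory; `host_load`, `MEMORY` and `NEG_TWO` are illustrative names.

```python
import struct


def host_load(memory, base, stream, offset, size, signed):
    """Load `size` bytes from memory[base + displacement].

    The displacement is a signed 32b value stored inline in the stream.
    Returns the loaded value (negative if sign-extended) and the new stream offset.
    """
    (displacement,) = struct.unpack_from("<i", stream, offset)
    start = base + displacement
    value = int.from_bytes(memory[start:start + size], "little", signed=signed)
    return value, offset + 4


MEMORY = bytes([0x00, 0xFF, 0x80, 0x01])
NEG_TWO = struct.pack("<i", -2)  # a stream holding the displacement -2
```

With base 3 and displacement -2, the byte at index 1 (`0xFF`) is loaded: it reads as 255 when zero-extended and as -1 when sign-extended.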
Loads 32b from guest memory to register.
IR Format: ld Rd, <foreign/guest pointer>, <memory operation>
Gadget type: thunk per Rd into C impl?
Loads 64b from guest memory to register.
IR Format: ld Rd, <foreign/guest pointer>, <memory operation>
Gadget type: thunk per Rd into C impl?
Stores 32b from a register to guest memory.
IR Format: st Rd, <foreign/guest pointer>, <memory operation>
Gadget type: thunk per Rd into C impl
Stores 64b from a register to guest memory.
IR Format: st Rd, <foreign/guest pointer>, <memory operation>
Gadget type: thunk per Rd into C impl?
See note on qemu_ld_i32.
Adds two 32-bit numbers.
IR Format: add Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
add Wd, Wn, Wm
Adds two 64-bit numbers.
IR Format: add Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
add Xd, Xn, Xm
Subtracts two 32-bit numbers.
IR Format: sub Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
sub Wd, Wn, Wm
Subtracts two 64-bit numbers.
IR Format: sub Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
sub Xd, Xn, Xm
Multiplies two 32-bit numbers.
IR Format: mul Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
mul Wd, Wn, Wm
Multiplies two 64-bit numbers.
IR Format: mul Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
mul Xd, Xn, Xm
Divides two 32-bit numbers; considering them signed.
IR Format: div Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
sdiv Wd, Wn, Wm
Divides two 64-bit numbers; considering them signed.
IR Format: div Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
sdiv Xd, Xn, Xm
Divides two 32-bit numbers; considering them unsigned.
IR Format: div Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
udiv Wd, Wn, Wm
Divides two 64-bit numbers; considering them unsigned.
IR Format: div Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
udiv Xd, Xn, Xm
Computes the division remainder (modulus) of two 32-bit numbers; considering them signed.
IR Format: rem Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
sdiv w27, Wn, Wm
msub Wd, w27, Wm, Wn
Computes the division remainder (modulus) of two 64-bit numbers; considering them signed.
IR Format: rem Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
sdiv x27, Xn, Xm
msub Xd, x27, Xm, Xn
Computes the division remainder (modulus) of two 32-bit numbers; considering them unsigned.
IR Format: rem Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
udiv w27, Wn, Wm
msub Wd, w27, Wm, Wn
Computes the division remainder (modulus) of two 64-bit numbers; considering them unsigned.
IR Format: rem Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
udiv x27, Xn, Xm
msub Xd, x27, Xm, Xn
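AArch64 has no remainder instruction, so all four remainder gadgets use the same divide-then-multiply-subtract pair: the quotient goes into the temporary, and `msub` computes `Rn - quotient * Rm`. The sketch below models the arithmetic on plain Python integers; `div_trunc`, `msub` and `rem` are illustrative names, and register-width wraparound (e.g. the most negative value divided by -1) is deliberately ignored.

```python
def div_trunc(n, m):
    """Model of sdiv/udiv: truncates toward zero, and yields 0 when dividing by zero."""
    if m == 0:
        return 0
    quotient = abs(n) // abs(m)
    return -quotient if (n < 0) != (m < 0) else quotient


def msub(a, b, c):
    """Model of `msub Rd, Ra, Rb, Rc`: Rd = Rc - Ra * Rb."""
    return c - a * b


def rem(n, m):
    """Remainder as computed by the two-instruction gadget."""
    return msub(div_trunc(n, m), m, n)
```

Because the division truncates toward zero, the remainder takes the sign of the dividend: `rem(-7, 2)` is -1, unlike Python's `%` operator, which would give 1.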
Logically inverts a 32-bit number.
IR Format: not Rd, Rn
Gadget type: gadget per Rd, Rn (256)
mvn Wd, Wn
Logically inverts a 64-bit number.
IR Format: not Rd, Rn
Gadget type: gadget per Rd, Rn (256)
mvn Xd, Xn
Arithmetically inverts (two's complement) a 32-bit number.
IR Format: neg Rd, Rn
Gadget type: gadget per Rd, Rn (256)
neg Wd, Wn
Arithmetically inverts (two's complement) a 64-bit number.
IR Format: neg Rd, Rn
Gadget type: gadget per Rd, Rn (256)
neg Xd, Xn
Logically ANDs two 32-bit numbers.
IR Format: and Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
and Wd, Wn, Wm
Logically ANDs two 64-bit numbers.
IR Format: and Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
and Xd, Xn, Xm
Logically ORs two 32-bit numbers.
IR Format: or Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
orr Wd, Wn, Wm
Logically ORs two 64-bit numbers.
IR Format: or Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
orr Xd, Xn, Xm
Logically XORs two 32-bit numbers.
IR Format: xor Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
eor Wd, Wn, Wm
Logically XORs two 64-bit numbers.
IR Format: xor Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
eor Xd, Xn, Xm
Logically shifts a 32-bit number left.
IR Format: shl Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
lsl Wd, Wn, Wm
Logically shifts a 64-bit number left.
IR Format: shl Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
lsl Xd, Xn, Xm
Logically shifts a 32-bit number right.
IR Format: shr Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
lsr Wd, Wn, Wm
Logically shifts a 64-bit number right.
IR Format: shr Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
lsr Xd, Xn, Xm
Arithmetically shifts a 32-bit number right.
IR Format: sar Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
asr Wd, Wn, Wm
Arithmetically shifts a 64-bit number right.
IR Format: sar Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
asr Xd, Xn, Xm
Rotates a 32-bit number left.
IR Format: rotl Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
neg w27, Wm
ror Wd, Wn, w27
Rotates a 64-bit number left.
IR Format: rotl Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
neg x27, Xm
ror Xd, Xn, x27
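AArch64 provides a variable rotate right (`ror`) but no rotate-left instruction, so a rotate left is usually expressed as a rotate right by the negated amount. The sketch below checks that identity on Python integers; `ror`, `rotl` and `rotl_via_ror` are illustrative names, and `ror` reduces the amount modulo the register width, as the hardware instruction does.

```python
def ror(value, amount, bits):
    """Rotate right; the amount is taken modulo the register width."""
    mask = (1 << bits) - 1
    amount %= bits
    value &= mask
    return ((value >> amount) | (value << (bits - amount))) & mask


def rotl(value, amount, bits):
    """Reference rotate left, written out directly."""
    mask = (1 << bits) - 1
    amount %= bits
    value &= mask
    return ((value << amount) | (value >> (bits - amount))) & mask


def rotl_via_ror(value, amount, bits):
    """Rotate left as `neg` followed by `ror`."""
    negated = (-amount) & ((1 << bits) - 1)  # what `neg` leaves in the temporary
    return ror(value, negated, bits)
```

The negation is done at full register width, but since the width (32 or 64) divides 2^32 and 2^64, reducing the negated amount modulo the width still gives the correct rotation.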
Rotates a 32-bit number right.
IR Format: rotr Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
ror Wd, Wn, Wm
Rotates a 64-bit number right.
IR Format: rotr Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
ror Xd, Xn, Xm
Optional; not currently implementing.
Optional; not currently implementing.
Sign extends the lower 8b of a register into a 32b destination.
IR Format: ext8s Rd, Rn
Gadget type: gadget per Rd, Rn (256)
sxtb Wd, Wn
Sign extends the lower 8b of a register into a 64b destination.
IR Format: ext8s Rd, Rn
Gadget type: gadget per Rd, Rn (256)
sxtb Xd, Wn
Zero extends the lower 8b of a register into a 32b destination.
IR Format: ext8u Rd, Rn
Gadget type: gadget per Rd, Rn (256)
and Wd, Wn, #0xff
Zero extends the lower 8b of a register into a 64b destination.
IR Format: ext8u Rd, Rn
Gadget type: gadget per Rd, Rn (256)
and Xd, Xn, #0xff
Sign extends the lower 16b of a register into a 32b destination.
IR Format: ext16s Rd, Rn
Gadget type: gadget per Rd, Rn (256)
sxth Wd, Wn
Sign extends the lower 16b of a register into a 64b destination.
IR Format: ext16s Rd, Rn
Gadget type: gadget per Rd, Rn (256)
sxth Xd, Wn
Zero extends the lower 16b of a register into a 32b destination.
IR Format: ext16u Rd, Rn
Gadget type: gadget per Rd, Rn (256)
and Wd, Wn, #0xffff
Zero extends the lower 16b of a register into a 64b destination.
IR Format: ext16u Rd, Rn
Gadget type: gadget per Rd, Rn (256)
and Xd, Xn, #0xffff
Sign extends the lower 32b of a register into a 64b destination.
IR Format: ext32s Rd, Rn
Gadget type: gadget per Rd, Rn (256)
sxtw Xd, Wn
Zero extends the lower 32b of a register into a 64b destination.
IR Format: ext32u Rd, Rn
Gadget type: gadget per Rd, Rn (256)
and Xd, Xn, #0xffffffff
Sign extends the lower 32b of a register into a 64b destination.
IR Format: ext32s Rd, Rn
Gadget type: gadget per Rd, Rn (256)
sxtw Xd, Wn
Zero extends the lower 32b of a register into a 64b destination.
IR Format: ext32u Rd, Rn
Gadget type: gadget per Rd, Rn (256)
and Xd, Xn, #0xffffffff
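The extension gadgets come in two flavors: zero extension is a plain mask (the `and` forms), while sign extension copies the top bit of the narrow value into all the upper bits (the `sxtb`/`sxth`/`sxtw` forms). A compact model of both, with illustrative names `zero_extend` and `sign_extend`, and results expressed as unsigned register contents:

```python
def zero_extend(value, from_bits):
    """Keep the low `from_bits` bits; everything above becomes zero."""
    return value & ((1 << from_bits) - 1)


def sign_extend(value, from_bits, to_bits):
    """Replicate bit `from_bits - 1` into the upper bits of a `to_bits`-wide result."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        value -= 1 << from_bits          # reinterpret as a negative number
    return value & ((1 << to_bits) - 1)  # back to unsigned register contents
```

For example, the byte `0x80` sign-extends to `0xFFFFFF80` in a 32b destination but zero-extends to `0x80`.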
Byte-swaps a 16b quantity.
IR Format: bswap16 Rd, Rn
Gadget type: gadget per Rd, Rn (256)
rev w27, Wn
lsr Wd, w27, #16
Byte-swaps a 16b quantity.
IR Format: bswap16 Rd, Rn
Gadget type: gadget per Rd, Rn (256)
rev w27, Wn
lsr Wd, w27, #16
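The 16b byte swap works by reversing all four bytes of the 32b register and then shifting the result down by 16, which leaves the two swapped low bytes in place and discards whatever was in the upper half. A quick model of that pair, with illustrative names `rev32` and `bswap16`:

```python
def rev32(value):
    """Model of `rev Wd, Wn`: reverse the byte order of a 32b value."""
    return int.from_bytes((value & 0xFFFFFFFF).to_bytes(4, "little"), "big")


def bswap16(value):
    """Model of `rev` followed by `lsr #16`."""
    return rev32(value) >> 16
```

Note that the input's upper 16 bits end up in the low half after `rev` and are shifted out, so the result is always a clean 16b value.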
Byte-swaps a 32b quantity.
IR Format: bswap32 Rd, Rn
Gadget type: gadget per Rd, Rn (256)
rev Wd, Wn
Byte-swaps a 32b quantity.
IR Format: bswap32 Rd, Rn
Gadget type: gadget per Rd, Rn (256)
rev Wd, Wn
Byte-swaps a 64b quantity.
IR Format: bswap64 Rd, Rn
Gadget type: gadget per Rd, Rn (256)
rev Xd, Xn
Exits the translation block. Has no gadget of its own; instead, the address of the translation block epilogue is inserted into the stream.
Memory barrier.
IR Format: mb <type>
Gadget type: gadget per type
# !!! TODO
We still need to work out how QEMU MB types map to AArch64 ones. This might take nuance.