Loren ☺️ edited this page Aug 16, 2023 · 52 revisions

This page is in a somewhat disorganized state; please bear with us.

Userspace recording

No target recompilation or VM hypervisor required.

Chronomancer/Chronicle

gdb reverse debugging ("process recorder")

cjones:

Process record and replay works by logging the execution of each machine instruction in the child process (the program being debugged), together with each corresponding change in machine state (the values of memory and registers).
  • unclear how modification of user memory during syscalls is recorded (apparently not at all)
  • unclear how process-shared memory is dealt with (apparently not at all)
  • very very high overhead (singlesteps the program using ptrace)
  • good approach for efficient replaying reverse-step et al.
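
The log-every-change idea above can be illustrated with a toy model. This is only a sketch under stated assumptions, not gdb's implementation: the "machine" is a Python dict, "instructions" are plain functions, and the names `Recorder`, `step` and `reverse_step` are hypothetical. It shows why per-instruction logging makes reverse-step cheap (pop one log entry and restore the old values) while making recording expensive (every instruction pays for bookkeeping).

```python
from dataclasses import dataclass, field

# Sentinel for "this location did not exist before the instruction ran".
_MISSING = object()


@dataclass
class Recorder:
    """Toy machine state plus an undo log with one entry per instruction."""
    state: dict = field(default_factory=dict)  # register/memory name -> value
    log: list = field(default_factory=list)    # list of {location: old value}

    def step(self, instruction):
        """Execute one instruction and log the old value of what it changed."""
        before = dict(self.state)  # toy shortcut: snapshot everything
        instruction(self.state)
        delta = {
            key: before.get(key, _MISSING)
            for key, new_value in self.state.items()
            if before.get(key, _MISSING) != new_value
        }
        self.log.append(delta)

    def reverse_step(self):
        """Undo the most recent instruction by restoring the logged values."""
        delta = self.log.pop()
        for key, old_value in delta.items():
            if old_value is _MISSING:
                del self.state[key]
            else:
                self.state[key] = old_value


def add(state):
    state["rax"] += 41
    state["pc"] += 1


def store(state):
    state["mem[0x10]"] = state["rax"]
    state["pc"] += 1


recorder = Recorder(state={"pc": 0, "rax": 1})
recorder.step(add)
recorder.step(store)
recorder.reverse_step()  # undoes the store
```

Note that the toy only logs changes made by the instruction itself; memory written by the kernel during a syscall would bypass `step` entirely, which is the gap the first bullet points at.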

UndoDB

  • Similar design to rr: records whole Linux process
  • Relies on code instrumentation in some manner
  • Single-core execution
  • Currently (4.0.3363) crashes when trying to record Firefox
  • Integrates with gdb and some other similar debuggers
  • Offers "Live Recorder" which you link into your program and lets you turn on recording in the field

RogueWave TotalView ReplayEngine

Sounds similar to rr/UndoDB but no mention of performance counters (in 2008 they probably didn't work anyway). Seems to use code instrumentation according to http://bgq1.epfl.ch/totalview/ReplayEngine_Getting_Started_Guide.pdf.

Nirvana

TORNADO

Castor

  • Supports multithreading
  • Based on library interception and a compiler plugin to instrument atomic operations
  • Would require customization of JIT routines that emit atomic ops

iReplay

  • Supports multithreading
  • Library interception; assumes all synchronizing operations go through library calls
  • Only replays "in situ" since last "stop the world" epoch. Can't replay across an epoch boundary
  • Assumes no races; if divergence detected, just naively tries to replay hoping this will get the right schedule

In-Kernel Userspace Recording

Deterministic Process Groups in dOS

Jockey

Arnold

Low-overhead multicore record and replay based on instrumenting pthreads APIs (and atomics?) and assuming there are no data races.

BEEER

"BEEER: distributed record and replay for medical devices in hospital operating rooms". Extending Arnold to track inter-machine communication.

Full-System recording

ReVirt

VMWare Record & Replay

  • Project canceled

Crosscut builds on the VMWare system and lets you "relog" to generate new logs during replay, including leveraging Chronicle to generate a Chronicle database!

PANDA

Xen-TT; VEE paper

QEMU

Simics

DejaVM

Performance Counters

Non-Determinism and Overcount on Modern Hardware Performance Counter Implementations (Weaver, Terpstra, Moore)

Kendo: Efficient Deterministic Multithreading in Software

Out of date with its observations on performance counter behavior, but first paper to use performance counters for async event timing AFAIK.

Language/VM-specific Replay

WebReplay

ChakraCore NodeJS Debugger

Python Time Travel Debugger

Chronon

  • Similar to Chronomancer for Java.
  • Chronon instruments bytecode to record variable changes and memory writes. Raw trace data goes to helper threads which use carefully optimized compression.
  • It's unclear, but there's an "unpacker" step that probably performs some kind of indexing.
  • Overheads quoted in this slide deck range from >200x (even more than Chronomancer) for well-optimized Java code that's CPU bound, down to 2x when you spend plenty of time in I/O or code that's excluded from Chronon instrumentation. That's probably a reasonable thing to do for J2EE code, and they get to use multiple cores to run the application.
  • There's a tradeoff between the scope of code recorded and the overhead of recording described here.
  • Scalability issues mentioned here.
  • Prediction-based compression described here
  • For something like Firefox, where you really want to instrument the entire software stack and parallelism is not a big issue, rr's approach seems much better.
  • No divergence support: of course Java VMs don't support cloning, so they could only implement divergence using emulation, but you'd need a lot of heap data to make that work reliably.

Durable execution

How durable execution does record & replay

GUI-level Record And Replay

Valera

Reran

Omniscient debugging

Qira

Pretty naive implementation.

Tetrane

Looks like a great implementation of omniscience. Focused on reverse-engineering applications, so it has a quite different feature set to Pernosco.

(Not yet categorized)

REPT

  • REPT captures recent control flow via Intel PT and stores that in a crash dump, then reconstructs data values
  • Integrates into WinDbg
  • Sometimes produces incorrect results, which could be bad
  • Obviously not as good as a proper recording if you can afford the overhead, but seems like a great addition for crash reporting

Scribe

roc:

There are a few major differences between Scribe and rr:
  • Scribe doesn't serialize all threads. Instead they do a bunch of work to make sure all threads can run simultaneously. This reduces overhead in some places and adds overhead in others.
  • They say their approach doesn't require "changing, relinking or recompiling the kernel" but their approach has to track internal kernel state like inodes and VFS path traversal, and it's not really clear how they do that. They also say "Scribe records by intercepting all interactions of processes with their environment, capturing all nondeterminism in events that are stored in log queues inside the kernel" so my guess is they're using a kernel module. That's a pretty big negative in my view.
  • Scribe doesn't use performance counters to record asynchronous events. Instead they defer signal delivery until the next time the process enters the kernel. If the process doesn't enter the kernel for a long time, they basically take a snapshot of the entire state, force the process into the kernel and restart recording, which is extremely heavyweight. For some bugs, it's essential to allow async signal delivery at any program point, so I don't like Scribe's approach there.
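
The deferred-delivery idea in the last bullet can be sketched as a toy model. This is an illustration only, not Scribe's code: the class `DeferredSignalRecorder`, its methods, and the choice to log the signal right after the syscall entry are all assumptions. The point is that a signal is only ever delivered at a kernel entry, so its position can be logged as a syscall index rather than as an exact instruction count.

```python
class DeferredSignalRecorder:
    """Toy model: signals are queued and delivered only at kernel entry."""

    def __init__(self):
        self.pending = []       # signals that arrived during user-mode code
        self.log = []           # (syscall_index, kind, detail) entries
        self.syscall_count = 0  # number of kernel entries seen so far

    def signal_arrives(self, signo):
        """Asynchronous arrival: queue the signal instead of delivering it."""
        self.pending.append(signo)

    def enter_kernel(self, syscall_name):
        """Kernel entry is a deterministic point, so deliver and log here."""
        self.syscall_count += 1
        self.log.append((self.syscall_count, "syscall", syscall_name))
        delivered, self.pending = self.pending, []
        for signo in delivered:
            self.log.append((self.syscall_count, "signal", signo))
        return delivered


recorder = DeferredSignalRecorder()
recorder.signal_arrives(14)                # arrives mid-computation, deferred
delivered = recorder.enter_kernel("read")  # delivered at the next kernel entry
```

The sketch also makes the drawback visible: if `enter_kernel` is never called, the queued signal is never delivered, which is the case where Scribe has to fall back to the heavyweight snapshot described above.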

iDNA

Pinplay

Respec

Echo

OS Support

BackTracker

Time-Traveling Virtual Machines

ExtraVirt

SubVirt

SMP-ReVirt

Speck

DoublePlay

See this page.

ReTrace

CLAP

H3

Capo

QuickRec / Capo3

FlashBack

ORDER: Object centRic DEterministic Replay for Java

PRES: Probabilistic replay with execution sketching on multiprocessors

Infrastructure-Free Logging and Replay of Concurrent Execution on Multiple Cores

A bit like ODR; records some input syscalls while allowing threads to run concurrently, then detects divergence and searches for shared-memory races that allow for alternative schedules that would fix the divergence.

Dune

cjones:

This isn't a record/replay tool per se, but rather creates a framework on which one could be built. The elevator pitch is approximately that Dune exposes hardware virtualization features to userspace. So userspace can manage its own page tables, directly process exceptions, and so forth. With those tools, one could build a userspace-only ptrace equivalent. And that, in theory, could allow building an rr-like tool without rr's libpreload hackery (syscallbuf and seccomp-bpf) but with comparable performance. There are further interesting things that could be done with custom page-table entries.

Lingering issues:
  • does Dune expose rdtsc and cpuid virtualization?
  • does Dune expose some kind of interrupting programmable hwtimer?

Deterministic execution

Reproducible Containers (Ryan Newton et al.)

Checkpointing

CRIU

Checkpointing of user-space Linux processes.

Tonic

Docker-based checkpointing for JS REPLs.

seccomp-bpf

Mbox
