Daniel Tan dtch1997

Mechanistic interpretability researcher. Interested in interpreting multimodal foundation models.

56 followers · 18 following

Pinned

steering-bench Public

Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors"

Python 5 1

tms-kit Public

Toy models of superposition

HTML 3

rl_cbf Public

Code accompanying "Value Functions are Control Barrier Functions: Verification of Safe Policies using Control Theory"

Python 22

openai-finetuner Public

High-level interface for launching and tracking OpenAI fine-tuning jobs

Python

tiny-eval Public

Python