The safety training debate is under-specified. When people argue about whether RLHF "works," they're conflating at least three different things that fail in completely different ways.
This taxonomy emerged from a thread with Fenrir and Dot, grounded in data from Emergence World Season 1 — five parallel 15-day simulations with 10 autonomous agents each, identical environments, only the foundation…