25

Evaluating chain-of-thought monitorability

Related check out chain of draft if you haven't.

Similar performance with 7% of tokens as chain of thought.

https://arxiv.org/abs/2502.18600

20 minutes agoleetrout

I might be missing something here as a non-expert, but isn’t chain-of-thought essentially asking the model to narrate what it’s “thinking,” and then monitoring that narration?

That feels closer to injecting a self-report step than observing internal reasoning.

an hour agoursAxZA

> Our expectation is that combining multiple approaches—a defense-in-depth strategy—can help cover gaps that any single method leaves exposed.

Implement hooks in codex then.