Will Mayner

AI interpretability · consciousness · neuroscience

Blog

A reproduction of Sarfati et al.’s “The Shape of Beliefs”

How LLMs encode in-context beliefs as curved manifolds, and how manifold-aware steering changes them with fewer side effects than linear steering.

BlueDot Technical AI Safety Project

Decomposed concept-injection introspection (Gemma-3 12B, Qwen-2.5 32B) into two separable components: representation (what the model encodes about an injection) and report (the prompt-dependent late-layer circuitry that surfaces it). This split explains apparent conflicts across prior protocols.