AI interpretability · consciousness · neuroscience

Blog

Belief manifolds, and how to steer along them

May 24, 2026

A reproduction of Sarfati et al.’s “The Shape of Beliefs”

How LLMs encode in-context beliefs as curved manifolds, and how manifold-aware steering changes them with fewer side effects than linear steering.

BlueDot Technical AI Safety Project

Decomposing introspection in LLMs: representation and report

May 3, 2026

Decomposes concept-injection introspection (Gemma-3 12B, Qwen-2.5 32B) into two separable components: representation (what the model encodes about an injection) and report (the prompt-dependent late-layer circuitry that surfaces it). This split explains apparent conflicts across prior protocols.