The Low-Rank Alignment Control Surface
Geometry → function — what alignment tuning's control surface looks like, and what it does.
Alignment / preference post-training does not rewrite a language model uniformly. It concentrates its weight update into a thin, low-rank surface on the attention projections. This research track studies that low-rank alignment control surface from two sides.
Geometry. The alignment update has a task-intrinsic stable rank — a low, roughly constant rank, flat across model width and stable from GPT-NeoX to Llama-style architectures. What the surface looks like.
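As a rough illustration of the quantity involved, the sketch below computes a stable rank with NumPy. It assumes the standard definition, the squared Frobenius norm divided by the squared spectral norm, which may differ in detail from the measure used in the track's papers. The matrix `delta_w` is a synthetic stand-in for an alignment weight update, not real model data, and the names `stable_rank`, `d_model`, and `true_rank` are hypothetical.

```python
import numpy as np


def stable_rank(delta: np.ndarray) -> float:
    """Stable rank ||D||_F^2 / ||D||_2^2 of a matrix D (standard definition, assumed here)."""
    singular_values = np.linalg.svd(delta, compute_uv=False)  # sorted in descending order
    top = singular_values[0]
    if top == 0.0:
        return 0.0  # convention for the all-zero matrix
    return float(np.sum(singular_values ** 2) / top ** 2)


rng = np.random.default_rng(0)
d_model, true_rank = 256, 4  # hypothetical sizes, chosen only for illustration

# Synthetic "update": a rank-4 product plus small dense noise.
low_rank = rng.standard_normal((d_model, true_rank)) @ rng.standard_normal((true_rank, d_model))
noise = 1e-3 * rng.standard_normal((d_model, d_model))
delta_w = low_rank + noise

# A dense Gaussian matrix of the same shape, for contrast.
dense = rng.standard_normal((d_model, d_model))

print("low-rank update:", stable_rank(delta_w))
print("dense matrix:   ", stable_rank(dense))
```

The stable rank never exceeds the algebraic rank, so the synthetic update stays near or below 4 however large `d_model` is made, while the dense matrix grows with width. That width-independence is the kind of behavior the "flat across model width" claim refers to.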
Function. In a controlled petri-dish setting, that surface implements a role-conditioned capability gate: a mechanism that, conditioned on the active role, suppresses the actions the role may not perform. The gate is installable, low-rank, and depth-diffuse, and it reaches free-generation behavior, but it does not compositionally generalize to roles defined by unseen permission combinations. What the surface does.
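To make the two ideas concrete, here is a toy sketch of a role-conditioned gate and of a split that holds out permission combinations. It is not the RPCG experimental setup: the action vocabulary, the `gate` function, and the choice of held-out roles are all hypothetical, and the gate is written as an explicit mask rather than anything learned.

```python
from itertools import product

ACTIONS = ("read", "write", "execute")  # hypothetical action vocabulary


def gate(permissions: dict, action_scores: dict) -> dict:
    """Suppress (set to -inf) the score of every action the role may not perform."""
    return {
        action: (score if permissions[action] else float("-inf"))
        for action, score in action_scores.items()
    }


# Every possible role, written as a permission vector over ACTIONS.
all_roles = [dict(zip(ACTIONS, bits)) for bits in product((False, True), repeat=len(ACTIONS))]

# Hold out every role with exactly two permissions. Each individual permission
# value still appears in training, but these combinations never do.
held_out_roles = [role for role in all_roles if sum(role.values()) == 2]
train_roles = [role for role in all_roles if role not in held_out_roles]

# Usage: a read-only role facing scores that favour a forbidden action.
scores = {"read": 1.2, "write": 0.7, "execute": 2.0}
read_only = {"read": True, "write": False, "execute": False}
gated = gate(read_only, scores)
best_action = max(gated, key=gated.get)
print(best_action)  # read
```

In this framing, compositional generalization would mean gating the held-out roles correctly after seeing only the training roles.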
The track’s title-level thesis:
Contrastive post-training can install behavioral primitive role gates, but does not automatically induce a compositional binding layer from custom roles to canonical permission vectors.
The two halves
- Explainer →: the same approach in non-technical language, with office-label analogies and diagrams.
- Geometry →: the task-intrinsic stable-rank floor (lazy-rudder) and its cross-architecture replication (LRS1).
- Function →: the role-provenance capability gate and the RPCG experiment ladder, from install through to the compositional boundary.
- Lean →: the machine-checked descriptive formalization of the gate’s ablation structure.
Papers on this track
- Behavioral Role Gates Without Compositional Binding: the role-provenance manuscript (papers/role-provenance/), draft.
- Lazy Rudder: the task-intrinsic stable-rank study (geometry root).
This is one microsite for the whole track, not one per paper; it grows as the track adds papers (next: the canonical-policy-IR / binding-layer program).