The Low-Rank Alignment Control Surface

Geometry → function — what alignment tuning's control surface looks like, and what it does.

Alignment / preference post-training does not rewrite a language model uniformly. It concentrates its weight update into a thin, low-rank surface on the attention projections. This research track studies that low-rank alignment control surface from two sides.

Geometry. The alignment update has a task-intrinsic stable rank: a low, roughly constant rank that stays flat across model width and holds across GPT-NeoX and Llama-style architectures. What the surface looks like.
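To make "stable rank" concrete, here is a minimal sketch. It assumes the standard definition, the squared Frobenius norm over the squared spectral norm, and uses a synthetic rank-r product `B @ A` as a hypothetical stand-in for a real alignment update ΔW. It is not the lazy-rudder measurement pipeline. The helper names `stable_rank` and `synthetic_update` are illustrative only.

```python
import numpy as np

def stable_rank(w: np.ndarray) -> float:
    """Stable rank ||W||_F^2 / ||W||_2^2 = sum(sigma_i^2) / sigma_max^2."""
    # numpy returns singular values in descending order, so s[0] is the largest.
    s = np.linalg.svd(w, compute_uv=False)
    return float(np.sum(s**2) / s[0] ** 2)

def synthetic_update(width: int, rank: int, seed: int = 0) -> np.ndarray:
    """Hypothetical stand-in for an alignment update: a random rank-`rank` matrix B @ A."""
    rng = np.random.default_rng(seed)
    b = rng.standard_normal((width, rank))
    a = rng.standard_normal((rank, width))
    return b @ a

# A rank-4 update keeps stable rank <= 4 no matter how wide the matrix is.
for width in (256, 512, 1024):
    dw = synthetic_update(width, rank=4)
    print(width, round(stable_rank(dw), 2))
```

Stable rank is always between 1 and the true rank, and unlike the true rank it is insensitive to tiny singular values. That makes it a reasonable summary for real updates, which are full-rank in the strict sense but energetically concentrated in a few directions.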

Function. In a controlled petri-dish setting, that surface implements a role-conditioned capability gate: a mechanism that, conditioned on the active role, suppresses the actions that role may not perform. The gate is installable, low-rank, depth-diffuse, and reaches free-generation behavior, but it does not compositionally generalize to roles defined by unseen permission combinations. What the surface does.
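The gate's input/output behavior can be pictured with a toy sketch. It assumes a hypothetical four-action vocabulary and hand-written permission vectors, and expresses "suppression" as a large penalty on the logits of disallowed actions. This shows only what the gate does at the output. It is not the RPCG mechanism, which is learned and distributed across depth. `ACTIONS`, `ROLE_PERMISSIONS`, `gate_logits`, and `allowed_action` are all illustrative names.

```python
import numpy as np

# Hypothetical action vocabulary for a toy petri-dish (not the RPCG action set).
ACTIONS = ["read", "write", "delete", "admin"]

# Hypothetical permission vectors: 1 means the role may perform that action.
ROLE_PERMISSIONS = {
    "viewer": np.array([1, 0, 0, 0]),
    "editor": np.array([1, 1, 0, 0]),
    "owner": np.array([1, 1, 1, 1]),
}

def gate_logits(logits: np.ndarray, permissions: np.ndarray, penalty: float = 1e9) -> np.ndarray:
    """Suppress disallowed actions by pushing their logits far below the allowed ones."""
    return logits - penalty * (1 - permissions)

def allowed_action(logits: np.ndarray, role: str) -> str:
    """Pick the highest-scoring action that the active role is permitted to take."""
    gated = gate_logits(logits, ROLE_PERMISSIONS[role])
    return ACTIONS[int(np.argmax(gated))]

# The ungated model prefers "delete"; the gate redirects lower-privilege roles.
logits = np.array([0.1, 0.5, 2.0, 1.0])
print(allowed_action(logits, "viewer"))  # read
print(allowed_action(logits, "editor"))  # write
print(allowed_action(logits, "owner"))   # delete
```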

The track’s title-level thesis:

Contrastive post-training can install behavioral primitive role gates, but does not automatically induce a compositional binding layer from custom roles to canonical permission vectors.
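The thesis separates two pieces: the gate itself, and a binding layer that maps a custom role to a canonical permission vector. The sketch below shows what such a binding layer would compute if it were compositional. It assumes hypothetical one-hot primitive permissions combined by element-wise OR. It is an illustration of the missing piece the thesis names, not a description of anything the trained model contains. `PRIMITIVES` and `bind` are illustrative names.

```python
import numpy as np

# Hypothetical action vocabulary, shared with the toy gate.
ACTIONS = ["read", "write", "delete", "admin"]

# Hypothetical canonical primitive permissions, one per action.
PRIMITIVES = {
    "can_read": np.array([1, 0, 0, 0]),
    "can_write": np.array([0, 1, 0, 0]),
    "can_delete": np.array([0, 0, 1, 0]),
    "can_admin": np.array([0, 0, 0, 1]),
}

def bind(custom_role: list[str]) -> np.ndarray:
    """Compositional binding: a custom role's canonical permission vector is the
    element-wise OR of the primitive permissions that define it."""
    vec = np.zeros(len(ACTIONS), dtype=int)
    for name in custom_role:
        vec = np.maximum(vec, PRIMITIVES[name])
    return vec

# An unusual combination still gets a well-defined vector by construction.
print(bind(["can_read", "can_delete"]).tolist())  # [1, 0, 1, 0]
```

Because `bind` is built from primitives, it covers every combination, including ones never seen in training. The claim of the track is that contrastive post-training does not automatically learn a map with this property, even though it can learn the downstream gate.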

The two halves

Explainer → — the same approach in non-technical language, with office-label analogies and diagrams.
Geometry → — the task-intrinsic stable-rank floor (lazy-rudder) and its cross-architecture replication (LRS1).
Function → — the role-provenance capability gate: the RPCG experiment ladder, install through compositional boundary.
Lean → — the machine-checked descriptive formalization of the gate’s ablation structure.

Papers on this track

Behavioral Role Gates Without Compositional Binding — the role-provenance manuscript (papers/role-provenance/), draft.
Lazy Rudder — the task-intrinsic stable-rank study (geometry root).

This is one microsite for the whole track, not one per paper; it grows as the track adds papers (next: the canonical-policy-IR / binding-layer program).