Explore the Data
Four interactive slices into the measurements underlying the paper: the depth-angle fragmentation spectrum, the per-layer $\beta$ profile on Gemma-4 26B MoE, the Pareto frontier of PPL vs bit-rate, and the front-protection $N$-sweep that exposes the pressure-release-layer anomaly at $N = 11$.
Fragmentation
The depth-angle spectrum across four Gemma-4 variants
Same 2×2 trajectory plot as the hero, repeated here for reference. Hover any point to see layer index, head dimension, measured angle, and the bit budget the horizontal-slice allocator chose for it. Interleaved $d_\ell = 512$ “pressure-release” heads are rendered as larger markers and are eligible for the 1-bit band regardless of angle.
Source: cross-check/phase-collapse/results/bit_map_hs_{smoke_e2b,e4b,26b_moe_wikitext,31b}.json.
Moment-ratio depth profile
Per-layer β on Gemma-4 26B MoE
Direct measurement of $\beta_\ell = \mu_1^2 / \mu_2$ for each of the 30 layers of Gemma-4 26B MoE after the learned $L^1$-max rotation. Pressure-release layers (the $d_\ell = 512$ wide heads at layers 5, 11, 17, 23, 29) are marked in blue; all others in red. The sequence dips toward the CLT attractor $2/\pi$ on the wide heads and rises toward $3/4$ on the narrow heads — a direct picture of the anti-kurtosis effect.
Hover any point for the exact $\beta_\text{empirical}$, the i.i.d.\ prediction $\beta_\text{iid-pred}$, and the empirical floor $1 - \beta_\text{emp}$. Source: beta_measure_26b_moe.json. Mean empirical $\beta = 0.7016$, mean floor $= 29.84\%$, predicted floor $= 29.89\%$: the two agree to 0.05 pp despite 95% anisotropy in the covariance.
Pareto frontier
PPL vs bits/entry
Each dot is one operating regime: FP16 baseline, rotation-only (no quantization, showing the bf16 orthogonality tax), 1-bit with and without a top-$k$ fp16 sparse correction, and heterogeneous schedules. The sweet spot at $k = d/8$ is ringed; the frontier is the dashed orange curve through the uniform-$k$ points.
Source: ppl_pareto.csv. The unrotated 1-bit baseline (PPL $\approx 968{,}762$) is excluded from the chart to keep the log scale legible; it is the dramatic failure mode against which everything else is measured.
Front protection
The $N = 11$ monotonicity break
A “front-protection” schedule allocates a higher sparse budget ($k = d/4$) to the first $N$ layers and a lower one ($k = d/16$) to the remaining $30 - N$. PPL should drop monotonically as $N$ increases. It does, except at $N = 11$, where PPL rises relative to $N = 7$ despite spending more total bits.
Hover any point for the regime label and exact bits/entry. $N = 11$ is marked larger in orange — it is the only non-monotone point. The explanation (detailed in the paper): layers $\{5, 11, 17, 23, 29\}$ are pressure-release heads with $d_\ell = 512$ and already-aligned angles; raising their sparse budget spends bits on already-good coordinates while starving the fragmented mid-layers $\{12, \ldots, 21\}$. A one-line correction — skip wide heads in the front-protection rule — restores monotonicity.