Cliff Tokens

Overview

Existing analyses often localize a failure only after it has already emerged in a reasoning trace. Cliff tokens target the trigger that precedes it: the token at which the trace begins to shift toward failure.

We formalize cliff tokens with a z-test-based adaptive threshold that separates statistically significant reasoning failures from sampling noise, and show that these tokens act as failure triggers.
We introduce a cliff taxonomy of deterministic, uncertain, and sampled-off cliffs, and demonstrate that each category has distinct probabilistic characteristics.
We validate single-token supervision at cliff positions: Cliff-DPO post-training improves reasoning performance, with effectiveness varying across cliff types.

What Is a Cliff Token?

Token-wise potential is the probability that a reasoning process reaches the correct answer, conditioned on the partial trace generated up to a token position. Empirically, we estimate it by rolling out continuations from each prefix and measuring the success rate.
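The rollout-based estimate described above can be sketched as a plain Monte Carlo success rate. This is a minimal illustration, not the authors' implementation: `rollout_is_correct` is a hypothetical callback that samples one continuation from the model for a given prefix and reports whether it reaches the correct answer, and `n_rollouts` is a free parameter.

```python
from typing import Callable, List, Sequence

# Hypothetical callback (assumption): samples ONE continuation of `prefix`
# from the model and returns True if the final answer is correct.
RolloutFn = Callable[[Sequence[str]], bool]


def estimate_potential(prefix: Sequence[str],
                       rollout_is_correct: RolloutFn,
                       n_rollouts: int = 64) -> float:
    """Monte Carlo estimate of token-wise potential for one prefix."""
    successes = sum(bool(rollout_is_correct(prefix)) for _ in range(n_rollouts))
    return successes / n_rollouts


def potential_curve(trace: Sequence[str],
                    rollout_is_correct: RolloutFn,
                    n_rollouts: int = 64) -> List[float]:
    """Potential after each token: entry t conditions on trace[:t + 1]."""
    return [
        estimate_potential(trace[: t + 1], rollout_is_correct, n_rollouts)
        for t in range(len(trace))
    ]
```

The cost grows as the trace length times the number of rollouts, which is why a token-level curve is much more expensive than a segment-level one.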

A cliff token is a token at which token-wise potential drops significantly relative to the preceding prefix. Before this token, the trace may remain recoverable; once the token is fixed, continuations are more likely to end in failure.

To distinguish statistically significant shifts from rollout noise, the detection criterion uses an adaptive threshold based on a one-sided two-proportion z-test. We use N = 64 rollouts per token position, extending potential-based analysis from coarse trace segments to token-level resolution.
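A one-sided two-proportion z-test on consecutive rollout success counts could look like the sketch below. It uses the textbook pooled-variance form with equal sample sizes; the significance level `alpha = 0.05` and the function names are assumptions for illustration, and the exact formulation in the source may differ.

```python
from math import sqrt
from statistics import NormalDist


def potential_drop_z(prev_successes: int, curr_successes: int, n: int = 64) -> float:
    """z statistic for H1: potential before the token > potential after it.

    Both prefixes are assumed to be evaluated with the same number of rollouts n.
    """
    p_prev = prev_successes / n
    p_curr = curr_successes / n
    pooled = (prev_successes + curr_successes) / (2 * n)
    std_err = sqrt(pooled * (1.0 - pooled) * (2.0 / n))
    if std_err == 0.0:
        # Both counts are 0 or both are n, so there is no drop to test.
        return 0.0
    return (p_prev - p_curr) / std_err


def is_cliff(prev_successes: int, curr_successes: int,
             n: int = 64, alpha: float = 0.05) -> bool:
    """Flag a token when the drop is significant at level alpha (assumed value)."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha)  # one-sided critical value
    return potential_drop_z(prev_successes, curr_successes, n) > z_crit
```

In this sketch the smallest drop that clears the test depends on the pooled success rate, which is one way a threshold can adapt to the current potential level rather than using a fixed cutoff.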

RQ1: Do Cliff Tokens Trigger Reasoning Failures?

Cliff tokens occur more often in incorrect traces, and deleting the first cliff token can restore reasoning. Across seven instruction-tuned models on GSM1K, MATH500, and AIME 2025, incorrect traces are more likely to contain cliff tokens and have higher average cliff-token counts in most model settings.

Figure: Cliff-token occurrence statistics across models. **Cliff-token occurrence.** Proportion of traces containing at least one cliff token and average cliff-token counts per trace, aggregated across the three mathematical reasoning benchmarks.

Figure: Pass@k comparison between Cliff-del and Cliff-keep on incorrect traces. **Cliff-del vs. Cliff-keep.** **Cliff-del reaches pass@64 = 1.0** across evaluated panels, while Cliff-keep remains between **0.71 and 1.00** in panels with cliff tokens. The gap indicates that the first cliff token acts as a failure trigger.
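For readers unfamiliar with pass@k, the sketch below implements the commonly used unbiased estimator, 1 - C(n - c, k) / C(n, k), for n sampled continuations of which c are correct. Whether the source uses this exact estimator is an assumption; the function name is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimated probability that at least one of k samples is correct,
    given c correct samples out of n drawn (requires 1 <= k <= n)."""
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 whenever fewer than k incorrect samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Under this estimator, with k equal to n, the value for a single trace is 1.0 as soon as any one of the n continuations is correct and 0.0 otherwise, so an average of 1.0 means every evaluated trace had at least one successful continuation.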

RQ2: What Probabilistic Patterns Characterize Cliff Tokens?

Cliff tokens exhibit distinct probabilistic structure. Token entropy and token greediness separate them into three failure modes: confident bias, competitive uncertainty, and stochastic sampling noise.

Deterministic cliff

A greedy token with low entropy. The model samples the cliff token with near-absolute certainty.

Uncertain cliff

A greedy token with high entropy. The greedy cliff token is sampled despite high uncertainty.

Sampled-off cliff

A non-greedy token with high entropy. The non-greedy cliff token is sampled stochastically.
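The three categories above can be expressed as a small decision rule over the next-token distribution at the cliff position. This is a sketch under stated assumptions: the entropy cutoff `entropy_threshold` (in nats) is a hypothetical value, not one taken from the source, and the combination of a non-greedy token with low entropy is not covered by the taxonomy as described, so the sketch leaves it unclassified.

```python
from math import log
from typing import Optional, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of a next-token distribution."""
    return -sum(p * log(p) for p in probs if p > 0.0)


def classify_cliff(probs: Sequence[float],
                   sampled_index: int,
                   entropy_threshold: float = 1.0) -> Optional[str]:
    """Label a cliff token by greediness and entropy.

    entropy_threshold is an assumed cutoff chosen only for illustration.
    """
    greedy_index = max(range(len(probs)), key=probs.__getitem__)
    is_greedy = sampled_index == greedy_index
    high_entropy = token_entropy(probs) >= entropy_threshold

    if is_greedy and not high_entropy:
        return "deterministic"  # confident bias
    if is_greedy and high_entropy:
        return "uncertain"      # competitive uncertainty
    if high_entropy:
        return "sampled-off"    # stochastic sampling noise
    return None                 # non-greedy, low entropy: not in the taxonomy
```

For example, `classify_cliff([0.97, 0.01, 0.01, 0.01], 0)` falls in the deterministic category because the sampled token is the argmax of a sharply peaked distribution.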

RQ3: How Does the Cliff Taxonomy Vary Across Families and Scales?

The cliff taxonomy changes with model family and scale. Deterministic cliffs are largely scale-invariant, uncertain cliffs reflect model-specific gaps, and sampled-off cliffs are asymmetric across scales.

Figure: Cross-scale transfer of cliff probability mass between Qwen3-0.6B and Qwen3-8B. **Cross-scale transfer of cliff probability mass.** Deterministic cliffs stay near zero shift across Qwen3-0.6B and Qwen3-8B, uncertain cliffs lose probability mass in both directions, and sampled-off cliffs show an asymmetric shift across scale directions.

Cliff-DPO

Cliff positions can also provide targeted supervision. Cliff-DPO applies preference optimization at the token position where the reasoning trace diverges into failure.
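To make the idea of single-token preference supervision concrete, the sketch below evaluates the standard DPO objective on one token pair that shares the same prefix. It is not the authors' Cliff-DPO recipe: treating the cliff token as the rejected token and some alternative token at the same position as the chosen one is an assumption, as are the function name and the value of `beta`.

```python
import math


def single_token_dpo_loss(policy_logp_chosen: float,
                          policy_logp_rejected: float,
                          ref_logp_chosen: float,
                          ref_logp_rejected: float,
                          beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) token pair.

    All inputs are log-probabilities of a single token given the shared
    prefix, under the trained policy and the frozen reference model.
    Assumption: the rejected token is the cliff token.
    """
    margin = beta * (
        (policy_logp_chosen - ref_logp_chosen)
        - (policy_logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    if margin >= 0.0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference model the margin is zero and the loss equals log 2; the loss falls as the policy shifts probability from the rejected token toward the chosen one.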

Citation

@article{ko2026clifftoken,
  title={Cliff Tokens: Identifying Single-Token Failure Triggers in LLM Mathematical Reasoning},
  author={Ko, Jaeyong and Kang, Pilsung and Lee, Yukyung},
  journal={arXiv preprint arXiv:2606.25524},
  year={2026},
  eprint={2606.25524},
  archivePrefix={arXiv}
}