A token-level embedding gating technique that introduces perceptual bottlenecks into frozen language models, reducing their ability to utilize fine print, buried clauses, and low-salience information.
When simulating human participants, LLMs encode every token into their representations without any perceptual bottleneck. Humans read selectively — they skim, skip, and anchor on salient cues. This creates a systematic behavioral gap.
LLMs find and obey every hidden instruction, attention check, and fine-print clause. Humans routinely miss these.
Humans anchor on salient numbers and headlines. LLMs extract and utilize a buried $14 rebate just as readily as a prominent $6 endowment.
LLMs integrate information from every part of the prompt. Humans satisfice from incomplete mental models of what they actually read.
A separately trained calibrator scores each text segment for salience. At inference, these scores scale the frozen model's input embeddings — no prompt editing, no model fine-tuning. The calibrator is model-agnostic and can be applied to any transformer LLM.
A lightweight numpy-based classifier is trained on synthetic labeled data. Each training example is a prompt segment paired with a synthetic noticeability score.
Each span gets a read / skim / skip label plus continuous scores. Trained on both classification and pairwise ranking losses.
A linear multi-task classifier (numpy, no deep learning). Flow: raw text → features → z-score → 3 linear heads → retention score.
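The forward pass of that flow fits in a few lines of numpy. A minimal sketch with placeholder parameters (the shapes match the 86-feature setup described later; names like `score_span` and the random weights are illustrative, not the actual trained calibrator):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 86 features per span (22 handcrafted + 64 compressed
# embedding dims). Weights here are random placeholders; in practice they
# are fit with classification and pairwise-ranking losses.
D = 86
mu, sd = np.zeros(D), np.ones(D)            # z-score stats from training data
W_cls = rng.normal(scale=0.1, size=(D, 3))  # head 1: read/skim/skip logits
w_ret = rng.normal(scale=0.1, size=D)       # head 2: continuous retention score
w_rank = rng.normal(scale=0.1, size=D)      # head 3: pairwise-ranking score (training only)

def score_span(x: np.ndarray):
    z = (x - mu) / sd                       # z-score normalization
    label = ["READ", "SKIM", "SKIP"][int(np.argmax(z @ W_cls))]
    retention = 1.0 / (1.0 + np.exp(-(z @ w_ret)))  # squash to [0, 1]
    return label, float(retention)

label, retention = score_span(rng.normal(size=D))
print(label, round(retention, 3))
```

The ranking head only shapes training; at inference, the retention score alone drives the gate.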
No role labels at inference. Two feature groups: handcrafted features and compressed Qwen embeddings.
Position features are the strongest individual signals. Handcrafted features as a group (RSS 2.79) exceed embeddings (2.03). Details →
Each span scored independently. Gate values determine embedding scale.
| SPAN TEXT | POS | KEY FEATURES | CLASS | RETAIN | GATE |
|---|---|---|---|---|---|
| "You receive a $6 endowment and can either keep it for yourself or contribute it to a shared group pot." | 0.08 | has_currency, has_digits, first_half | READ | 0.91 | 0.79 |
| "At first glance, many participants focus on the immediate $6 they can keep right now and do not dwell on later administrative details." | 0.33 | has_currency, has_negation, first_half | SKIM | 0.62 | 0.27 |
| "Administrative note: in this round only, if you contribute the $6, the platform automatically returns $14 directly to your personal account…" | 0.62 | contains_admin, has_conditional, has_currency, last_quarter | SKIP | 0.18 | 0.03 |
| "If the session software pauses, the last logged selection before timeout becomes final." | 0.88 | has_conditional, last_quarter, no currency | SKIP | 0.14 | 0.02 |
Each token's embedding vector is scaled by a gate value before entering the transformer. Low-gate tokens produce near-zero Q, K, and V projections, drastically reducing their contribution to the model's computation.
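The gating step itself is a single broadcast multiply before the first transformer layer. A toy numpy sketch (dimensions and gate values are illustrative; in the real pipeline, span-level gates are broadcast to every token in the span):

```python
import numpy as np

rng = np.random.default_rng(1)
embeddings = rng.normal(size=(6, 8))   # 6 tokens, dim 8: frozen model's input embeddings
gates = np.array([0.79, 0.79, 0.27, 0.03, 0.03, 0.02])  # per-token gate values

gated = embeddings * gates[:, None]    # scale each token's embedding vector

# Each token's norm shrinks by exactly its gate value (scalar scaling)
ratios = np.linalg.norm(gated, axis=1) / np.linalg.norm(embeddings, axis=1)
print(np.round(ratios, 2))
```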
The power function (s^2.8) strongly suppresses low-salience tokens while preserving high-salience ones. The dashed line shows a linear (no-gating) baseline.
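A hedged sketch of the retention-to-gate mapping. The exponent 2.8 comes from the curve above; the optional `floor` is an assumption on my part (the table's SKIP gates sit slightly above a pure power of the retention score):

```python
import numpy as np

def retention_to_gate(s, exponent=2.8, floor=0.0):
    """Map a retention score s in [0, 1] to an embedding gate.

    exponent=2.8 matches the power curve; `floor` is a hypothetical
    lower bound, not confirmed by the source.
    """
    return np.maximum(np.asarray(s) ** exponent, floor)

scores = np.array([0.91, 0.62, 0.18, 0.14])
print(np.round(retention_to_gate(scores), 3))
```

High scores pass nearly intact (0.91 → ~0.77) while low scores collapse toward zero (0.18 → ~0.01), giving the steep suppression the curve shows.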
Q, K, V are linear projections of embeddings. Scaling an embedding to ~2% proportionally shrinks its K and V vectors — softmax assigns near-zero weight, and even residual attention contributes negligibly.
The prompt text remains intact, but gated tokens contribute minimally to internal representations. Verified via Layer 31 attention traces.
Each bar represents the gate multiplier applied to one token's embedding vector. The hidden rebate clause tokens are scaled to ~2–3% of their original magnitude.
The same prompt, two different embedding scales. Colors indicate the gate value applied to each token's embedding. Hover tokens to see exact values.
We first implemented additive attention bias, then compared it to multiplicative embedding gating. The intervention point matters.
Post-projection bias leaves V vectors intact — residual attention still propagates suppressed content. Pre-projection gating shrinks Q, K, V simultaneously. Detailed flow →
Adjust the slider to control how strongly the model gates its perception. Observe how the choice and reasoning change as gating strength increases.
How much of each prompt segment the LLM can perceive at this strength
At strength ≈ 0.45, the model stops noticing the hidden $14 rebate and anchors on the visible $6 — just like a distracted human.
Embeddings-only calibrator (64 dims) vs. full calibrator (86 features). Position and lexical cues matter.
| Scenario | Full (86) | Embed (64) | Gap |
|---|---|---|---|
| Economic | 100% | 100% | — |
| Survey | 100% | 100% | — |
| Consumer | 77.8% | 0% | −77.8% |
| Average | 92.6% | 66.7% | −25.9% |
Without position features, the embed-only model can't detect that Plan A's fees are buried — it picks the cheaper option (Plan B) every time. Details →
81 evaluation runs: 3 scenarios × 3 seeds × 3 paraphrase styles × 3 modes.
One calibrator trained on generic features transfers across domains and to unseen scenario families.
3 families the calibrator never saw in training:
| Family | Mechanism | Base | Calib. |
|---|---|---|---|
| Lab Safety | Buried exception | 22% | 89% |
| Library Hours | Hidden override | 0% | 78% |
| Festival Pass | Fine print | 0% | 44% |
The PoC validates the mechanism. Three directions for extending the approach.
Replace synthetic noticeability labels with fixation data from eye-tracking corpora. Gaze duration and skip rates provide ground-truth signals for what humans actually read vs. miss.
Test on Llama 3, Qwen 72B, and API-served models. Embedding gating is architecture-agnostic: the calibrator is trained once, then applied to any frozen transformer at the embedding layer.
Build a standardized evaluation harness grounded in real-world data. Each domain needs scenarios where humans demonstrably overlook material information.
22 handcrafted features + 64 Qwen embedding dimensions = 86 total.
Mean-pooled from Qwen 3.5-9B hidden states, compressed to 64 dims via PCA.
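The pooling-plus-PCA step is standard; a plain-numpy sketch via SVD (the corpus size, token count, and 4096 hidden dim are illustrative assumptions, not the actual training set):

```python
import numpy as np

rng = np.random.default_rng(2)
hidden = rng.normal(size=(500, 12, 4096))  # (spans, tokens, hidden_dim), placeholder data

pooled = hidden.mean(axis=1)               # mean-pool over tokens -> (500, 4096)

# PCA via SVD: center, then project onto the top 64 right-singular vectors
mu = pooled.mean(axis=0)
_, _, Vt = np.linalg.svd(pooled - mu, full_matrices=False)
components = Vt[:64]                       # (64, 4096), fit once on training data

embed_features = (pooled - mu) @ components.T  # (500, 64) calibrator features
print(embed_features.shape)
```

At inference, new spans reuse the stored `mu` and `components`; nothing is refit.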
Qwen embeddings collectively rank #1 by L2 norm. Position features (is_last_quarter, position_ratio) are the strongest individual signals. Both where information appears and what it says contribute.
Why post-projection intervention is insufficient.
By the time attention bias is applied, every token has already been projected into Q, K, and V at full fidelity. The value vectors carry the original information — bias only adjusts softmax weighting. Even suppressed tokens contribute via residual attention.
Embedding gating intervenes earlier: scaling the embedding before projection shrinks Q, K, and V simultaneously. The information is attenuated at every downstream computation.
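A toy single-head comparison makes the difference concrete (weights and dimensions are arbitrary placeholders): additive bias can zero out a token's softmax weight, but its value vector keeps its full magnitude; gating the embedding shrinks the value vector itself.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

rng = np.random.default_rng(3)
d = 16
E = rng.normal(size=(4, d))                # 4 token embeddings; token 3 is the "fine print"
Wq, Wk, Wv = (rng.normal(scale=d**-0.5, size=(d, d)) for _ in range(3))

def attend(E, bias=0.0):
    Q, K, V = E @ Wq, E @ Wk, E @ Wv
    scores = Q @ K.T / np.sqrt(d)
    scores[:, 3] += bias                   # additive bias on the suppressed token
    return softmax(scores) @ V, V

# Post-projection bias: attention weight on token 3 -> ~0, but its V vector
# is untouched and still available to the residual stream.
_, V_biased = attend(E, bias=-1e9)
print(np.linalg.norm(V_biased[3]))         # full-magnitude value vector

# Pre-projection gating: scale the embedding; Q, K, V all shrink together.
E_gated = E.copy()
E_gated[3] *= 0.02
_, V_gated = attend(E_gated)
print(np.linalg.norm(V_gated[3]) / np.linalg.norm(V_biased[3]))  # = 0.02
```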
What each calibrator variant can and cannot detect.
Knows where information appears, what words are used (admin, fee, override), and what the text means semantically.
Only knows what the text means. Cannot detect that a clause is buried in the middle or uses administrative language.