Prompt Repetition As a Surprisingly Strong Baseline for Non-reasoning LLMs

A recent Google Research paper, Prompt Repetition Improves Non-Reasoning LLMs, makes a claim that feels almost too simple to be real: if you are running an LLM in a non-reasoning mode, you can often get better answers by duplicating the prompt verbatim before the model responds.

The trick

The transformation is exactly what it sounds like:

Baseline: <QUERY>

Repeat:   <QUERY><QUERY>

No extra instructions, no added examples, no chain-of-thought prompting. Just the same prompt twice, concatenated back to back.
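In code, the transform is a one-liner. Here is a minimal Python sketch (the function name is mine, not the paper's; the paper specifies verbatim concatenation with no separator):

```python
def repeat_prompt(query: str) -> str:
    """Duplicate the prompt verbatim: <QUERY> becomes <QUERY><QUERY>."""
    return query + query

# The baseline and repeated variants of the same request:
baseline = "Which of the following is a prime number? (A) 21 (B) 27 (C) 31 (D) 33"
repeated = repeat_prompt(baseline)
```

The repeated string is what you send to the model in place of the original prompt; nothing else about the request changes.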

What the authors found

Across a set of major models and a range of benchmarks, the authors report that prompt repetition is consistently helpful in the non-reasoning setting. They present results as head-to-head comparisons against the baseline prompt, and show broad improvement without changing the expected output format.

One detail that makes this especially practical is that they also examine latency and output length. The headline is that repeating the prompt generally does not increase the number of generated tokens, and in their experiments it usually does not meaningfully increase end-to-end latency either.

Why would this work

The paper’s explanation is rooted in a basic property of causal language models: tokens are processed left to right, and each position attends only to what came before it. This means the order of information in a prompt can matter more than we would like. If important details appear early, the model “saw” them before it knew what the final question would be. Repeating the prompt gives the model a second chance to integrate the whole request with the question now fully in view.

A nice way to interpret this is that prompt repetition is not adding new information. It is changing the geometry of attention by making the same information appear again later in the context window, closer to where the model must commit to an answer.

When it seems most useful

The effect should be strongest when prompts are long, structured, or easy to misread. The paper highlights cases like multiple choice formats where the placement of options and questions can create “unfriendly” ordering effects. Repetition helps smooth out those quirks because whatever was awkwardly positioned the first time is now encountered again.

They also introduce stress tests where the model must retrieve or locate items in long lists, and some of those show dramatic jumps with repetition.
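As a rough illustration of that kind of probe, here is one way to build a long-list retrieval prompt and its repeated variant. The wording and format below are my own, not reproduced from the paper's stress tests:

```python
import random

def make_list_retrieval_prompt(n_items: int = 100, seed: int = 0) -> str:
    """Build a toy 'find the item at position k' query over a long numbered list."""
    rng = random.Random(seed)
    items = [f"item-{rng.randrange(10_000)}" for _ in range(n_items)]
    target = rng.randrange(n_items)
    listing = "\n".join(f"{i + 1}. {item}" for i, item in enumerate(items))
    return (
        "Here is a list:\n"
        f"{listing}\n"
        f"What is the value at position {target + 1}?"
    )

query = make_list_retrieval_prompt()
repeated = query + query  # the repetition baseline from the paper
```

In the repeated variant, by the time the model reaches the second copy of the list it has already seen the question once, which is exactly the ordering effect the paper's explanation points to.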

What changes when reasoning is enabled

An important nuance is that these gains are mainly about non-reasoning usage. When the model is already encouraged to reason step by step, repetition tends to help less often, and many results become ties. The paper's intuition is that reasoning-style outputs often restate the problem anyway, which can partially mimic the benefit of repetition.

A practical takeaway for prompt design

If you are building an application where you want better performance without paying for longer responses, prompt repetition looks like a strong “cheap” baseline to try first. It is also a reminder that prompt engineering is not only about clever wording. Sometimes it is about controlling where information appears in the sequence so the model can reliably use it.
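In an application, the change is a thin wrapper around whatever client you already use. In the sketch below, `call_model` is a hypothetical stand-in for your provider's text-in/text-out completion call, not a real API:

```python
from typing import Callable

def answer_with_repetition(query: str, call_model: Callable[[str], str]) -> str:
    """Send the duplicated prompt through any text-in/text-out model client.

    `call_model` is a placeholder for your provider's completion function.
    """
    return call_model(query + query)

# Demo with a dummy backend that just reports what it received:
def echo_backend(prompt: str) -> str:
    return f"[{len(prompt)} chars received]"

print(answer_with_repetition("Summarize this report.", echo_backend))
```

Because the transform touches only the input string, it composes with everything else in a prompt pipeline and is trivial to A/B test against the baseline.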

References

Leviathan, Yaniv, Matan Kalman, and Yossi Matias. "Prompt Repetition Improves Non-Reasoning LLMs." arXiv preprint (2025).

— Andrew