Here is a question that sounds trivial until you try to answer it precisely: if you write things down, does it help to organize them?

Everyone's intuition says yes. A library with a catalog is better than a pile of books. A filing cabinet with labels is better than a box of loose papers. But how much better? And does the advantage grow or shrink as the collection gets larger?

The answer, it turns out, is that organized retrieval is exponentially faster than unorganized retrieval. Not a little faster. Not twice as fast. Exponentially faster — and the advantage grows without bound as the collection grows. We proved this formally and tested it experimentally on language models.

The setup

Imagine an agent with a fixed-size context window — it can hold B items at a time. It has access to an external store containing N items. It needs to find a specific item.

If the store is unstructured (a flat list), the agent must scan it linearly: load a page, check whether the item is there, load the next page. Worst-case cost: N/B page loads; the expected cost is about half that. Either way, for any fixed page size this is Ω(N).

If the store is indexed (organized into a tree with branching factor B), the agent navigates from root to leaf: load a node, pick the right branch, load the next node. Cost: log_B(N) page loads. This is O(log N).

The ratio between these two is on the order of N / log N (up to constants depending on B) — and it grows without bound. For a million items, it's roughly 50,000×. For a billion, roughly 33 million×.
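The two cost curves are easy to see in a toy model. The sketch below (my own illustration, not from the paper) counts page loads under the assumptions above — a flat scan of N items in pages of B, versus a descent through a tree with branching factor B — using an assumed B of 32. The exact ratios differ from the headline N / log N figures because B appears in both terms here, but the shape of the separation is the same.

```python
import math

def linear_scan_pages(n: int, b: int) -> float:
    """Expected pages loaded scanning a flat list of n items, b per page.
    On average the target sits halfway through the list."""
    return n / (2 * b)

def indexed_pages(n: int, b: int) -> float:
    """Pages loaded descending a tree with branching factor b over n items:
    one node per level, log_b(n) levels."""
    return math.log(n, b)

for n in (10**6, 10**9):
    flat = linear_scan_pages(n, 32)
    tree = indexed_pages(n, 32)
    print(f"N={n:>13,}: flat ~{flat:>12,.0f} loads, "
          f"tree ~{tree:.1f} loads, ratio ~{flat / tree:,.0f}x")
```

Note that the tree cost barely moves between a million and a billion items (about 4 versus 6 page loads), while the flat cost grows a thousandfold.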

This is the Library Theorem. The name is deliberate: it's the reason libraries were invented.

It compounds

The single-retrieval result is striking but not the whole story. Real reasoning requires multiple retrievals. An agent performing T reasoning steps, each requiring a lookup, pays the retrieval cost T times.

With an index: O(T log N). Without: Θ(T²) — because as the agent produces intermediate results and stores them, the unindexed store grows, and each subsequent search takes longer. The gap between organized and unorganized reasoning is quadratic in the number of reasoning steps.

A thousand-step reasoning chain with branching factor 32? The organized agent is roughly 100× more efficient. Ten thousand steps? 1,000×. The advantage never stops growing.
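The compounding effect can be simulated directly. In the sketch below (my own illustration, with an assumed branching factor of 32 and an initially empty store), each reasoning step first stores one intermediate result and then pays the search cost over the now-larger store — so the unindexed total grows quadratically in T while the indexed total stays near-linear. The concrete ratios depend on B; what matters is that the gap widens as T grows.

```python
import math

def total_page_loads(t: int, b: int, indexed: bool) -> float:
    """Total page loads over t reasoning steps. Each step appends one
    intermediate result to the store, then searches it, so the store
    grows as the chain proceeds."""
    loads, n = 0.0, 0
    for _ in range(t):
        n += 1  # store the new intermediate result
        loads += math.log(n, b) if indexed else n / (2 * b)
    return loads

for t in (1_000, 10_000):
    flat = total_page_loads(t, 32, indexed=False)  # sums to ~t^2 / (4b)
    tree = total_page_loads(t, 32, indexed=True)   # sums to ~t * log_b(t)
    print(f"T={t:>6,}: flat ~{flat:>10,.0f}, tree ~{tree:>7,.0f}, "
          f"ratio ~{flat / tree:.0f}x")
```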

What the experiments showed

We tested this on GPT-4o-mini and GPT-5.4 across three types of content: random hashes (no prior knowledge possible), numeric sequences (some pattern), and encyclopedia entries (rich prior knowledge).

Two predictions confirmed cleanly. First, there's an in-context ceiling: once the collection exceeds the context window, in-context retrieval degrades sharply. The window is a hard wall, not a soft one. Second, indexed retrieval scales logarithmically — the empirical cost curves match the theoretical O(log N) prediction.

But the third finding was the most interesting: familiar content breaks the protocol. When the model encountered encyclopedia entries about well-known topics, it sometimes bypassed the retrieval procedure entirely and answered from parametric memory — its training data. The model knew the answer already and skipped the index.

This isn't a bug. It's a prediction of the theory: parametric memory is a form of indexing that happened during training. When it works, it's even faster than runtime indexing. When it doesn't — when the content is novel, updated, or private — external indexed retrieval is the only option that scales.

The design principle

The results suggest a division of labor. Use the model's parametric knowledge for what it knows well — common facts, linguistic structure, reasoning patterns. Use indexed external memory for everything else — private data, recent events, specialized domains, intermediate reasoning results.

And critically: use semantic models to build the index, but deterministic algorithms to traverse it. The model is good at understanding what something means. It is bad at reliably following a 47-step lookup procedure. Build the tree with the model's judgment. Walk it with code.
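One way to picture that division of labor — a minimal sketch, not the paper's implementation. The `summarize` function below is a hypothetical stand-in for a model call that assigns a routing key to a group of items (here it trivially takes the first key); everything downstream of it, including the lookup, is plain deterministic code with no model in the loop.

```python
from bisect import bisect_right

def summarize(items):
    """Hypothetical stand-in for a model call that chooses a routing key
    for a group of items. Here: just the first item's key."""
    return items[0][0]

def build_index(records, b=4):
    """Build a one-level index over sorted (key, value) records.
    The semantic judgment (routing keys) is delegated to summarize();
    the structure itself is built by ordinary code."""
    records = sorted(records)
    pages = [records[i:i + b] for i in range(0, len(records), b)]
    keys = [summarize(page) for page in pages]
    return keys, pages

def lookup(index, key):
    """Deterministic traversal: binary search on routing keys,
    then a scan of one page. No model call anywhere."""
    keys, pages = index
    page = pages[bisect_right(keys, key) - 1]
    for k, v in page:
        if k == key:
            return v
    return None

idx = build_index([(f"item{i:03d}", i) for i in range(100)])
print(lookup(idx, "item042"))  # -> 42
```

The point of the split: a flaky model call at build time produces, at worst, a suboptimal tree; a flaky model call at traversal time produces a wrong answer.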

Intelligence scales through organization, not parameters. The library is not a metaphor. It is the mechanism.

Mechanically verified in Lean 4 (4 modules, zero `sorry` placeholders). Experimental validation on GPT-4o-mini and GPT-5.4 across three content types. Full details in "The Library Theorem: Exponential Separation Between Indexed and Sequential Retrieval for Bounded-Capacity Reasoning" (Mainen 2026).