What if AI Gets Faster Instead of Smarter?

One of my favorite things about working at OutRival is the quality of our lunch debates. They have a way of starting with a technical detail and ending somewhere much stranger.

This week, the spark was Taalas, a Toronto-based startup that recently showcased a Llama-8B model running on specialized silicon at more than 17,000 tokens per second.

There is something remarkable about the feeling of instantaneous intelligence. The demo matters not only because it is fast, but because it hints at a new timeline. If an 8B model can run this quickly today, it seems only a matter of time before frontier-level models run at similar speeds. Imagine a model like Fable 5 responding as you finish asking.

And 17,000 tokens per second is likely just the beginning.

What happens at 100,000 tokens per second? What happens at 1 million?

The debate we kept circling was simple: would the world get more value from a frontier model running at impossible speed, or from a frontier model that is 100 times smarter than today's best systems?

This post is a thought experiment about the first path.

Escaping the scarcity of thought

The major AI labs have spent the last several years chasing capability. The scaling laws are familiar by now: more data + more compute + more parameters = better performance. The assumption is that the main thing we want from artificial intelligence is a higher-quality answer.

That assumption is reasonable, but perhaps incomplete.

Today's best models are powerful, but they are slow. They are expensive enough that we still need to treat each prompt as a meaningful unit of computation. Even when the interface feels conversational, there is a scarcity behind it. You ask a question, wait, receive an answer, judge it, and maybe ask again.

This shapes how we think about intelligence. We imagine intelligence as the ability to produce one excellent thought.

But what if that is only true because thought is still scarce?

Imagine freezing today's frontier intelligence and scaling only inference speed to 1 million tokens per second. At that speed, the model is no longer producing a single answer in human time.

It can generate thousands of strategies, test competing framings, explore entire solution spaces, simulate objections, draft alternatives, rewrite itself, and search for weaknesses before a human would finish reading the first paragraph of this post.

The question changes from "Can the model produce the best answer?"

To "What happens when the model can produce nearly every plausible answer?"

The Borges problem

Historically, humans have defined intelligence through the struggle of thought. Intelligence is the labor required to cross the dark space between a problem and its solution. We respect the process because the process is costly. It takes time, energy, taste, memory, attention, and nerve.

Ultrafast inference collapses that friction. If a system can generate the full spectrum of possible responses in a few seconds, thinking stops looking like a journey. AI no longer acts like a single mind moving carefully through an idea; it behaves more like weather. The machine manifests possibility.

This is the Borges problem. In The Library of Babel, every possible book already exists: every truth, lie, discovery, biography, and contradiction. The library contains everything, which means the act of writing becomes almost meaningless. The real problem is finding.

That is the first strange implication of ultrafast intelligence: generation itself loses status.

When a machine can generate a million thoughts in an instant, the mere creation of a thought is no longer the scarce act. Generation is physics, and the hard part becomes contact with the world.

A thought by itself is cheap. A thought that survives pressure is something else.

In this world, intelligence is less about the private experience of having an idea and more about the rate at which ideas can be exposed to reality.

Intuition vs brute force

The second implication is more provocative: ultrafast intelligence weakens the importance of code as we know it.

Right now, we still treat programming languages as central. We ask models to write Python, TypeScript, or SQL because code is the bridge between intention and execution. But this may be a temporary phase.

Our obsession with making AI smarter is, in part, an attempt to make it better at imitating human intuition. We want the model to have the "aha" moment. We want it to write the elegant script on the first try, to reason its way toward the right abstraction.

In his poem The Golem, Borges writes about a creature made as a likeness of a man, one that can imitate human behavior but lacks the soul behind it. That is a useful contrast for ultrafast intelligence. For most of the AI era, progress has meant making models more like a deliberate human thinker. But speed suggests another path: it replaces intuition with brute force.

At 1 million tokens per second, the system becomes more alien rather than more human, and perhaps that is more useful.

A model that can generate, run, evaluate, and revise a million candidate programs in seconds does not need to write the perfect program on the first try.
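A toy sketch of that generate, run, evaluate loop, written in Python to make the idea concrete. Everything here is a hypothetical illustration: the input/output examples, the three-operator grammar, and the function names `candidates`, `passes`, and `search` are assumptions invented for this post, not anything Taalas or any real model runs. The search has no intuition about which program to try first; it simply enumerates tiny arithmetic programs and keeps the first one that survives every test.

```python
import itertools
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical task: find a program f such that f(x) == y for every example.
EXAMPLES = [(0, 1), (1, 3), (2, 5), (5, 11)]

Program = Tuple[str, Callable[[int], int]]  # (readable source, executable form)

def constant(c: int) -> Callable[[int], int]:
    return lambda x: c

def combine(lf, rf, fn) -> Callable[[int], int]:
    # Build a new program by applying an operator to two smaller programs.
    return lambda x: fn(lf(x), rf(x))

# A made-up, deliberately tiny grammar: leaves plus three binary operators.
LEAVES: list = [("x", lambda x: x)] + [(str(c), constant(c)) for c in range(1, 4)]
OPS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "-": lambda a, b: a - b,
}

def candidates(depth: int) -> Iterator[Program]:
    """Generate step: every expression up to the given nesting depth."""
    yield from LEAVES
    if depth == 0:
        return
    smaller = list(candidates(depth - 1))
    for (ls, lf), (rs, rf), (op, fn) in itertools.product(smaller, smaller, OPS.items()):
        yield f"({ls} {op} {rs})", combine(lf, rf, fn)

def passes(program: Callable[[int], int]) -> bool:
    """Run and evaluate step: the candidate must match every example."""
    return all(program(x) == y for x, y in EXAMPLES)

def search(max_depth: int) -> Tuple[Optional[str], int]:
    """Brute force: try candidates in order until one survives the tests."""
    tried = 0
    for depth in range(max_depth + 1):
        for source, program in candidates(depth):
            tried += 1
            if passes(program):
                return source, tried
    return None, tried

if __name__ == "__main__":
    source, tried = search(2)
    print(f"found {source} after {tried} candidates")
```

The search knows nothing about arithmetic and just exhausts the space in order. That only works when each evaluation is cheap, and the space explodes with depth: allowing depth 3 in this toy grammar would already mean nearly 200 million candidates, which is exactly the regime where raw inference speed starts to matter more than cleverness.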

The intermediate artifact we call “code” starts to disappear into the machine.

Action as information

The slow, 100x smarter AI is an oracle. You bring it a grand question and wait for a pristine answer. It favors institutions that can afford patience, compute, control, and risk management.

Ultrafast intelligence rewards builders because action produces information.

A plan contains assumptions. An action collides with the world and returns evidence. The test fails. The latency spikes. The prototype feels wrong in the hand. None of that information exists while the idea remains untouched in the abstract.

This is why speed matters. Faster action means faster information. A model that can generate and execute thousands of attempts increases the surface area of contact between the idea and reality.

In the oracle timeline, the winner is the person with access to the smartest answer. In the ultrafast timeline, the winner is the person who can move the fastest.

If trying was free

Maybe the most interesting version of the future is not a superintelligent AI answering from above. Maybe it is a merely “good enough” AI made so fast and cheap that intelligence becomes a ubiquitous resource we can spend recklessly.

A world where AI becomes an engine of brute force reality testing.

If trying becomes free, does intelligence still mean having the best answer?