What would the Vim of LLM tooling look like?

The instigating thought, "What would the Vim of LLM tooling look like?", led to this rambling post, which strays from the initial question.

Everywhere, companies scramble to create LLM-powered developer tools. The results are impressive (and sometimes useful), but there is a blind spot in their feature set. Most LLM tools feel heavy. They are ambitious. Many of them are "agentic", which is just a fancy way of saying they take longer to run. In practice, "agentic" tools call LLMs in a loop. The reason for this is twofold: multiple calls to the LLM allow the system to solve more complicated problems, and calling in a loop gives the LLM space to correct some of its errors. (The solution to an LLM is always another LLM.) All this leads to developer tools that disrupt that holy thing, the flow state.
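
For the unfamiliar, the loop looks something like this. A minimal sketch in Python, not any particular tool's implementation; call_llm and run_tool are hypothetical stand-ins for a provider API and for tool execution:

    from dataclasses import dataclass

    @dataclass
    class Action:
        done: bool
        content: str  # final answer, or a description of a tool step to run

    def call_llm(history: list[str]) -> Action:
        """Hypothetical model call: given the transcript, propose the next step."""
        raise NotImplementedError  # stand-in for a real provider API

    def run_tool(action: Action) -> str:
        """Hypothetical tool step: run the tests, apply an edit, etc."""
        raise NotImplementedError

    def agentic_loop(task: str, max_steps: int = 10) -> str:
        history = [task]
        for _ in range(max_steps):
            action = call_llm(history)        # one more LLM call per iteration
            if action.done:
                return action.content         # the model believes it is finished
            history.append(run_tool(action))  # feed results (and errors) back in
        raise RuntimeError("agent did not converge")

Every pass through the loop is another model call, which is where the latency piles up.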

I am interested in tools that encourage flow state. To do that they must be useful, dependable, and fast. Enough ink has been spilled on the first two, so I will focus on speed.

Speed

There are three factors that affect LLM developer tool speed:

  1. The time it takes a human to instruct the LLM
  2. The time the LLM takes to run
  3. The time it takes a human to "process" the results and move on to the next step

LLM instruction feels the same in almost every tool. Open up Cursor, Claude Code, or Aider and you'll explain the change you want to make in a text box. Too much typing, in my opinion. There are exceptions: autocomplete requires no manual instruction (hence the "auto"). And there are some useful features for adding context, like @-ing files and folders in the chat interface.

As for the time it takes an LLM to run? This is the easy part. Switch to a faster provider or a faster model. Try Cerebras, which boasts 2100 tokens per second. Or try a lighter model like Gemini Flash with 300 tokens per second. But we aren't building tools that use these models. Instead the trend is to always use the frontier models regardless of latency.
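
To make the gap concrete, a back-of-envelope calculation. The Cerebras and Gemini Flash throughputs are the figures quoted above; the response size and the frontier-model speed are my assumptions:

    # Rough generation latency for one response, ignoring network overhead
    # and time-to-first-token.
    SPEEDS_TOKENS_PER_SEC = {
        "Cerebras": 2100,              # quoted above
        "Gemini Flash": 300,           # quoted above
        "typical frontier model": 50,  # assumption, for comparison
    }

    RESPONSE_TOKENS = 500  # assumed size of a typical code-edit response

    for name, tps in SPEEDS_TOKENS_PER_SEC.items():
        print(f"{name:>22}: {RESPONSE_TOKENS / tps:5.2f}s")

At these speeds the same 500-token response takes about a quarter of a second on Cerebras and ten seconds on a frontier model. That order of magnitude is the difference between staying in flow and waiting.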

For the final factor, the amount of time the developer spends "processing" the LLM output varies with context. The distinguishing feature of "Vibe Coding" is the developer's conscious decision not to read the LLM output. Sometimes that works, but often it doesn't. When it doesn't, the "processing" step can come to dominate development. This is different from traditional tools, like IDE refactors, which require almost no "processing".

What's the alternative?

Shrink the scope of LLM tools so they are dependable and fast. To give you an idea, I wish I had...

Why are we here?

Structural forces discourage this sort of developer tool. AI's raison d'ĂȘtre is to replace human workers with automation. (You didn't think all that money was invested for love of AGI, did you?) As a result, companies selling AI developer tools benefit more from larger visions. There's more money in rethinking the industry than in optimizing existing workflows.