What would the Vim of LLM tooling look like?

The instigating thought, "What would the Vim of LLM tooling look like?", led to this rambling post, which strays from the initial question.

Everywhere, companies scramble to create LLM-powered developer tools. The results are impressive (and sometimes useful), but there is a blind spot in their feature set. Most LLM tools feel heavy. They are ambitious. Many of them are "agentic", which is just a fancy way of saying they take longer to run. In practice, "agentic" tools call LLMs in a loop. The reason for this is twofold: multiple calls to the LLM allow the system to solve more complicated problems, and calling in a loop gives the LLM space to correct some of its errors. (The solution to an LLM is always another LLM.) All this leads to developer tools that disrupt that holy thing, the flow state.
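
For the unfamiliar, the loop looks something like this. A minimal sketch in Python, not any particular tool's implementation; call_llm and run_tool are hypothetical stand-ins for a provider API and for tool execution:

    from dataclasses import dataclass

    @dataclass
    class Action:
        done: bool
        content: str  # final answer, or a description of a tool step to run

    def call_llm(history: list[str]) -> Action:
        """Hypothetical model call: given the transcript, propose the next step."""
        raise NotImplementedError  # stand-in for a real provider API

    def run_tool(action: Action) -> str:
        """Hypothetical tool step: run the tests, apply an edit, etc."""
        raise NotImplementedError

    def agentic_loop(task: str, max_steps: int = 10) -> str:
        history = [task]
        for _ in range(max_steps):
            action = call_llm(history)        # one more LLM call per iteration
            if action.done:
                return action.content         # the model believes it is finished
            history.append(run_tool(action))  # feed results (and errors) back in
        raise RuntimeError("agent did not converge")

Every pass through the loop is another model call, which is where the latency piles up.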

I am interested in tools that encourage flow state. To do that they must be useful, dependable, and fast. Enough ink has been spilled on the first two, so I will focus on speed.

Speed

There are three factors that affect LLM developer tool speed:

  1. The time it takes a human to instruct the LLM
  2. The time the LLM takes to run
  3. The time it takes a human to "process" the results and move on to the next step

LLM instruction feels the same in almost every tool. Open up Cursor, Claude Code, or Aider and you'll explain the change you want to make in a text box. Too much typing, in my opinion. There are exceptions: autocomplete requires no manual instruction (hence the "auto"). And there are some useful features for adding context, like @-ing files and folders in the chat interface.

As for the time it takes an LLM to run? This is the easy part. Switch to a faster provider or a faster model. Try Cerebras, which boasts 2100 tokens per second. Or try a lighter model like Gemini Flash with 300 tokens per second. But we aren't building tools that use these models. Instead the trend is to always use the frontier models regardless of latency.
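
To make the gap concrete, a back-of-envelope calculation. The Cerebras and Gemini Flash throughputs are the figures quoted above; the response size and the frontier-model speed are my assumptions:

    # Rough generation latency for one response, ignoring network overhead
    # and time-to-first-token.
    SPEEDS_TOKENS_PER_SEC = {
        "Cerebras": 2100,              # quoted above
        "Gemini Flash": 300,           # quoted above
        "typical frontier model": 50,  # assumption, for comparison
    }

    RESPONSE_TOKENS = 500  # assumed size of a typical code-edit response

    for name, tps in SPEEDS_TOKENS_PER_SEC.items():
        print(f"{name:>22}: {RESPONSE_TOKENS / tps:5.2f}s")

At these speeds the same 500-token response takes about a quarter of a second on Cerebras and ten seconds on a frontier model. That order of magnitude is the difference between staying in flow and waiting.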

For the final factor, the amount of time the developer spends "processing" the LLM output varies with context. The distinguishing feature of "Vibe Coding" is the developer's conscious decision not to read the LLM output. Sometimes that works, but often it doesn't. When it doesn't, the "processing" step can come to dominate development. This is different from traditional tools, like IDE refactors, which require almost no "processing".

What's the alternative?

Shrink the scope of LLM tools so they are dependable and fast. To give you an idea, I wish I had...

Why are we here?

Structural forces discourage this sort of developer tool. AI's raison d'ĂȘtre is to replace human workers with automation. (You didn't think all that money was invested for love of AGI, did you?) As a result, companies selling AI developer tools benefit more from larger visions. There's more money in rethinking the industry than in optimizing existing workflows.