Everyone's Optimizing Prompts. Builders Are Training Models.

Better benchmarks, higher costs: the AI provider playbook exposed

May 29, 2026

When the first vehicles hit the road, many people scoffed and derided them as loud, dirty, and unreliable. Meanwhile, these same people were walking through ankle-deep piles of horse poop because that was the means of transportation back then. Nice.

I feel like we’re in that era now. Most people are flaming each other over em dashes or AI-assisted writing. They’ll get into long note wars to defend their position. Meanwhile, the actual technology and the problem-solving methods behind it are quietly and drastically shifting due to rising token costs.

AI Evolution and the Builder’s Dilemma

This is going to ruffle some feathers. Most of you have fallen for the sleight of hand the AI providers have pulled. The message was clear: your jobs are going to be lost to AI, and if you want to perform in the future, you’ll pay us for use of our ‘intelligence’. Except that is not truly the case. In fact, it can’t be the case. A massive percentage of the world still has no reliable access to these AI solutions.

As demand and access ramp up globally, the infrastructure can’t keep up. This isn’t conjecture—it’s a well-known fact. One Dutch company supplies the machines that can produce the highest-end AI chips. A few chip manufacturers have those machines. A few companies buy those chips and bundle them into their own products and sell the final hardware. And don’t get me started on the energy, resource, and frankly political constraints on datacenters.

The AI Provider Head Fake

Anthropic and OpenAI have done an incredible job getting people to focus on whatever it is they want people to focus on. A new model version comes out, and people have prompt kits, videos, articles, and evals coming out their ears in minutes. The fanfare and content on every new AI model release remind me of tweenage girls when Justin Bieber started his career. I don’t blame anyone, but this messaging is exactly what the AI providers want: the praise, the benchmarks, the press, the articles, the content.

All the while, they gloss over the fact that they are raising money now because, internally at these big companies, people are already murmuring that the economic situation is going to get tight. Raise now, increase your valuation for internal shareholders, and lock it in while the market settles. And it’s not just $1 billion raised; it’s tens of billions.

But what if it’s more than just savvy strategy to fund an opportunistic round as economic uncertainty begins?

AI Providers Can Already See the Future

With great power comes great responsibility. From Haiku to Opus 4.8, or from GPT-4 to 5.5, the token cost difference is staggering. Running the same workflow on Opus 4.8 instead of Haiku costs 5× as much: $5.00 vs. $1.00 per million input tokens, and $25.00 vs. $5.00 per million output tokens. And the AI providers are not done yet. Since everyone is focusing on aggregate benchmarks and improvement, these models will get more ‘capable’ and more expensive. These next-gen models are going to make the AI infrastructure bottleneck and token cost even worse, and the providers are doing everything they can to keep you focused on the ‘next big thing’ they are working on. Meanwhile, token cost and usage increase with every new model, and many people don’t even notice.
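To make the 5× figure concrete, here is a minimal sketch of the arithmetic. The per-million prices are the ones quoted in this post; the monthly token volumes and the names `Price` and `workflow_cost` are made up for illustration, so swap in your own numbers and check current price lists before relying on any of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Price in USD per million tokens."""
    input_per_m: float
    output_per_m: float


# Prices as quoted in this post (assumption: they still match the provider's price list).
HAIKU = Price(input_per_m=1.00, output_per_m=5.00)
OPUS = Price(input_per_m=5.00, output_per_m=25.00)


def workflow_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the given per-million-token prices."""
    return (input_tokens * price.input_per_m
            + output_tokens * price.output_per_m) / 1_000_000


# Hypothetical monthly volume for one workflow.
INPUT_TOKENS = 200_000_000
OUTPUT_TOKENS = 40_000_000

haiku_cost = workflow_cost(HAIKU, INPUT_TOKENS, OUTPUT_TOKENS)
opus_cost = workflow_cost(OPUS, INPUT_TOKENS, OUTPUT_TOKENS)
ratio = opus_cost / haiku_cost

print(f"Haiku: ${haiku_cost:,.2f}  Opus: ${opus_cost:,.2f}  ratio: {ratio:.1f}x")
```

Because both the input and output prices differ by the same factor, the ratio stays at 5× no matter how the workload splits between input and output tokens.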

If you are using the latest AI model for everything, you can almost certainly move a good chunk of that work to lower-level models like Haiku. If the task is constrained, repeatable, deterministic, and high-volume, a lower-level model is almost always worth it.
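One way to turn that rule of thumb into something you can run against a list of workflows is a tiny router. This is only a sketch: the `Task` fields mirror the four criteria above, while the volume threshold, the tier labels, and the example tasks are hypothetical placeholders you would tune for your own setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    constrained: bool    # narrow, well-specified inputs and outputs
    repeatable: bool     # the same shape of request, over and over
    deterministic: bool  # one right answer, checkable by rule
    daily_calls: int


# Hypothetical cutoff for what counts as "high-volume".
HIGH_VOLUME_THRESHOLD = 1_000


def pick_tier(task: Task) -> str:
    """Return "small" only when all four criteria hold, otherwise "large"."""
    high_volume = task.daily_calls >= HIGH_VOLUME_THRESHOLD
    if task.constrained and task.repeatable and task.deterministic and high_volume:
        return "small"
    return "large"


tasks = [
    Task("invoice field extraction", True, True, True, 50_000),
    Task("open-ended strategy memo", False, False, False, 20),
    Task("weekly report formatting", True, True, True, 5),
]

for task in tasks:
    print(f"{task.name}: {pick_tier(task)}")
```

The all-four-criteria check is deliberately conservative. Low-volume tasks fall through to the large tier here only because the savings are small, not because a small model could not handle them.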

Out of the Stone Ages and into AI Builder Prosperity and Resurgence

The builder of the future will be the one who can identify what model, at what size, at what budget, suits a particular workflow best. Downgrading the majority of a client’s workflows to Haiku from Opus or even Sonnet and saving them a ton of money is good. Finding an optimized and minimized micro model for their workflows is even better.

Micro models are smaller models with smaller parameter sets trained on fewer things. I’m not sure what the official definition will end up being, but for me it’s really anything ~2 billion parameters or less. These micro models can be trained on and specialize in one particular thing and nothing else. The cost if you self-host? Compute, electricity, wear and tear on the hardware. Cost if you use hosted micro models? Pennies per million tokens in/out. Gemma 2B, for example, is $0.04 per million tokens. Just by finding pre-existing micro models and applying them to your workflows, you’ve saved a massive amount.

AI Builders: The Next Generation

The mega models will always have their place, but it will be only one part of a larger pie. True grassroots builders on shoestring budgets will learn how to fine-tune micro models for their own unique problem set. They’ll use QLoRA or LoRA to pick out and adjust what they want without having to retrain the entire parameter set. The cost for this is usually <$100, and now you’ve got the option to self-host. The massive model is swapped out, and you are saving hundreds or thousands a month.
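Why is LoRA so cheap? Instead of updating a full weight matrix of shape d_out × d_in, it freezes that matrix and trains two thin matrices of rank r, which adds r × (d_in + d_out) trainable parameters per adapted matrix. QLoRA does the same thing on top of a base model quantized to 4-bit, which cuts memory further. The sketch below just does that arithmetic. The hidden size, layer count, number of adapted matrices, and rank are illustrative assumptions, not the specs of any particular model.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds for one d_out x d_in weight matrix."""
    # A has shape (rank, d_in) and B has shape (d_out, rank); the base matrix stays frozen.
    return rank * (d_in + d_out)


# Illustrative, hypothetical configuration for a ~2B-parameter model.
TOTAL_PARAMS = 2_000_000_000
HIDDEN_SIZE = 2048
NUM_LAYERS = 18
ADAPTED_MATRICES_PER_LAYER = 2  # e.g. two attention projections per layer
RANK = 8

per_matrix = lora_trainable_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
trainable = per_matrix * ADAPTED_MATRICES_PER_LAYER * NUM_LAYERS
fraction = trainable / TOTAL_PARAMS

print(f"Trainable LoRA parameters: {trainable:,}")
print(f"Share of the full model:   {fraction:.4%}")
```

Under these assumptions you train roughly a million parameters instead of two billion, which is why a fine-tune like this can fit on modest hardware.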

The kicker? On that workflow, your optimized micro model can perform much better than the mega model it replaces. It is literally purpose-tuned for your workflow’s needs.

My Own Future

I’m looking at my own dependence on the mega models and realizing it would be a huge pain in the butt for me to get everything ported over to a new model if massive price increases or failures occurred. So I’m looking at my workflows to actively determine which of them are the best to experiment with, and then I want to get a micro model of my own optimized. From there, run the mega model and micro model workflows in parallel. See how it goes. Many real-world use cases have emerged (especially at the edge) where micro models are vastly superior per task. I want to replicate those results for myself. Stay tuned.
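Here is roughly what I have in mind for the parallel run: feed the same inputs to both models, log where they disagree, and review those cases by hand. This is a minimal sketch, not a finished harness. `fake_mega` and `fake_micro` are stand-ins for real model calls, and exact-match after normalization is an assumed comparison that only makes sense for short, label-like outputs.

```python
from typing import Callable, Iterable, List, Tuple

Model = Callable[[str], str]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())


def shadow_compare(
    inputs: Iterable[str], mega: Model, micro: Model
) -> Tuple[float, List[Tuple[str, str, str]]]:
    """Run both models on every input; return the agreement rate and the mismatches."""
    total = 0
    disagreements: List[Tuple[str, str, str]] = []
    for item in inputs:
        total += 1
        mega_out = mega(item)
        micro_out = micro(item)
        if normalize(mega_out) != normalize(micro_out):
            disagreements.append((item, mega_out, micro_out))
    agreement = (total - len(disagreements)) / total if total else 0.0
    return agreement, disagreements


# Stand-in models for illustration only; replace with real API or local calls.
def fake_mega(ticket: str) -> str:
    return "Billing" if "invoice" in ticket.lower() else "Support"


def fake_micro(ticket: str) -> str:
    text = ticket.lower()
    return "billing" if "invoice" in text or "refund" in text else "support"


tickets = [
    "Where is my invoice?",
    "App crashes on login",
    "I want a refund",
    "Reset my password",
]

agreement, disagreements = shadow_compare(tickets, fake_mega, fake_micro)
print(f"Agreement: {agreement:.0%}")
for item, mega_out, micro_out in disagreements:
    print(f"  {item!r}: mega={mega_out!r} micro={micro_out!r}")
```

A disagreement does not tell you which model is right. In the stand-in data the micro model arguably gets the refund ticket right, which is exactly the kind of case worth reviewing by hand.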
