# The end of candy‑shop compute: what SMBs should do now

Google’s announcement that Gemini 3.5 Flash costs roughly three times as much as the model it replaced is a clear signal: the era of near‑free, always‑best AI compute is ending. That doesn’t mean AI is broken. It means the economics of running cutting‑edge models are catching up with reality, and small and medium businesses (SMBs) need practical plans to adapt.

I’ve watched this cycle play out: teams chase the fanciest model because it’s on promo or because they want the best output. That’s reasonable for experiments. At scale, it gets expensive fast. One café owner I worked with automated customer replies using a top‑tier model during a promotion. After a month at ten thousand messages, the bill looked less like an experiment and more like an open cheque.

If you run an SMB, your AI choices should be business decisions tied to outcomes, not hype. Here’s a pragmatic approach you can implement this week.

## 1) Start with an audit

Before you change models, know your current usage. Log how many calls you make, what tasks those calls serve, and the error or rework rate. Track tokens, latency and failure modes the same way you track sales and margins. Without this visibility you can’t negotiate or optimise.

What to capture:

- Number of model calls per workflow
- Average tokens per call
- Latency and error rates
- Business impact of each workflow (time saved, revenue affected)
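A minimal Python sketch of the logging side of the audit, assuming every model call goes through one wrapper. `UsageLog`, `WorkflowStats` and `fake_model` are hypothetical names, and the word counts in `fake_model` only stand in for real token counts, which you would read from your provider’s usage data instead. Business impact can’t be measured by a wrapper, so that column still comes from your own records.

```python
import time
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, Tuple

# Hypothetical signature: a model call returns (text, input_tokens, output_tokens).
ModelFn = Callable[[str], Tuple[str, int, int]]


@dataclass
class WorkflowStats:
    calls: int = 0
    errors: int = 0
    input_tokens: int = 0
    output_tokens: int = 0
    total_latency_s: float = 0.0

    @property
    def avg_tokens_per_call(self) -> float:
        if self.calls == 0:
            return 0.0
        return (self.input_tokens + self.output_tokens) / self.calls

    @property
    def avg_latency_s(self) -> float:
        return self.total_latency_s / self.calls if self.calls else 0.0

    @property
    def error_rate(self) -> float:
        return self.errors / self.calls if self.calls else 0.0


class UsageLog:
    """Aggregates call counts, tokens, latency and errors per workflow."""

    def __init__(self) -> None:
        self.stats: DefaultDict[str, WorkflowStats] = defaultdict(WorkflowStats)

    def record(self, workflow: str, model_fn: ModelFn, prompt: str) -> str:
        s = self.stats[workflow]
        s.calls += 1
        start = time.perf_counter()
        try:
            text, in_tok, out_tok = model_fn(prompt)
        except Exception:
            s.errors += 1  # count the failure, then let the caller handle it
            raise
        finally:
            s.total_latency_s += time.perf_counter() - start
        s.input_tokens += in_tok
        s.output_tokens += out_tok
        return text


def fake_model(prompt: str) -> Tuple[str, int, int]:
    """Stand-in for a real API call; word counts approximate tokens."""
    reply = "Thanks for your message!"
    return reply, len(prompt.split()), len(reply.split())
```

Exporting `log.stats` to a spreadsheet once a week, with an extra column for the outcome each workflow serves, is usually enough to start the conversation about where the money goes.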

## 2) Tier your stack

Not every task needs the top model. Define tiers: cheap models for bulk processing, mid‑range models for quality‑sensitive tasks, and top‑tier models only for high‑value exceptions (legal summaries, high‑stakes customer interactions, nuanced editing).

Mixing models saves money without sacrificing necessary quality. For example, use a cheaper model to generate a first draft, run rule‑based checks and human review on it, and call the expensive model for a final polish only when the draft falls short.
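One way to express tiering in code, as a sketch: a lookup table sets the default tier per task, and a draft‑then‑escalate helper only pays for the expensive model when cheap rule checks fail. The task names, banned words, character limit and the `cheap`/`expensive` callables are all hypothetical placeholders. Human review is left out here; it would sit between the rule check and the escalation.

```python
from typing import Callable, Tuple

ModelFn = Callable[[str], str]  # hypothetical: takes a prompt, returns text

# Hypothetical mapping of tasks to tiers; tune it from your audit data.
TASK_TIERS = {
    "classify_ticket": "cheap",
    "faq_reply": "cheap",
    "product_copy": "mid",
    "legal_summary": "top",
}


def pick_tier(task: str, high_stakes: bool = False) -> str:
    """Return the tier for a task; high-stakes calls always get the top tier."""
    if high_stakes:
        return "top"
    return TASK_TIERS.get(task, "mid")  # unknown tasks default to mid


def passes_rules(draft: str, max_chars: int = 500,
                 banned: Tuple[str, ...] = ("guarantee", "lawsuit")) -> bool:
    """Cheap, deterministic checks run before paying for a better model."""
    text = draft.lower()
    return 0 < len(draft) <= max_chars and not any(w in text for w in banned)


def draft_then_polish(prompt: str, cheap: ModelFn,
                      expensive: ModelFn) -> Tuple[str, str]:
    """Draft with the cheap model; escalate only if the draft fails the rules."""
    draft = cheap(prompt)
    if passes_rules(draft):
        return draft, "cheap"
    return expensive(f"Improve this draft:\n{draft}"), "top"
```

The second value returned by `draft_then_polish` records which tier produced the answer, so the escalation rate can feed straight back into the audit.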

## 3) Cache aggressively

If prompts are identical or highly similar, avoid repeated calls. Cache outputs for standard prompts and use local lookups before invoking a model. Caching can slash costs for repeatable flows like templated replies, FAQs and standard data transforms.
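A minimal exact‑match cache, assuming your model call is a plain function from prompt to text (`PromptCache` is a hypothetical name). It normalises case and whitespace before hashing, so trivially different copies of the same prompt share one entry. Genuinely “highly similar” prompts would need semantic caching, for example by embedding similarity, which this sketch does not attempt. The 24‑hour time‑to‑live is an arbitrary assumption: set it so cached answers expire before facts like prices or opening hours go stale.

```python
import hashlib
import time
from typing import Callable, Dict, Tuple


class PromptCache:
    """Exact-match cache keyed on a normalised prompt, with a time-to-live."""

    def __init__(self, model_fn: Callable[[str], str], ttl_s: float = 24 * 3600):
        self.model_fn = model_fn
        self.ttl_s = ttl_s
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (stored_at, answer)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Lowercase and collapse whitespace so cosmetic differences share a key.
        normalised = " ".join(prompt.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def get(self, prompt: str) -> str:
        key = self._key(prompt)
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl_s:
            self.hits += 1
            return entry[1]  # served locally, no model call
        self.misses += 1
        answer = self.model_fn(prompt)
        self._store[key] = (now, answer)
        return answer
```

The `hits` and `misses` counters make the saving visible: every hit is a call you didn’t pay for.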

## 4) Use human‑in‑the‑loop for edge cases

When volume is low and risk is high, a human reviewer is often better value than an expensive model call, because a single costly mistake outweighs the reviewer’s time. Route edge cases and high‑risk outputs to human reviewers, and automate the rest. This reduces spend while maintaining quality and compliance.
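A sketch of the routing rule, assuming each draft reply comes with a confidence score from your own checks or a classifier. The `confidence` field, the risk terms and the 0.8 threshold are hypothetical. Anything that mentions a risky topic or falls below the threshold goes to a review queue instead of straight to the customer.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Draft:
    text: str
    confidence: float  # hypothetical 0-1 score from your own checks or a classifier


# Hypothetical topics that always get a human, whatever the confidence score.
RISK_TERMS = ("refund", "allergy", "complaint", "legal", "chargeback")


def needs_human(message: str, draft: Draft, min_confidence: float = 0.8) -> bool:
    text = message.lower()
    if any(term in text for term in RISK_TERMS):
        return True
    return draft.confidence < min_confidence


def dispatch(message: str, draft: Draft,
             review_queue: List[Tuple[str, Draft]]) -> Optional[str]:
    """Return the reply to send automatically, or None if queued for a person."""
    if needs_human(message, draft):
        review_queue.append((message, draft))
        return None
    return draft.text
```

Keyword matching is deliberately crude; it errs towards sending more to humans, which is the safer failure mode for high‑risk topics.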

## 5) Measure ROI, don’t worship benchmarks

Higher cost sometimes buys value: faster responses, better accuracy, fewer downstream errors and less human rework. Measure whether improved model performance reduces labour or error costs enough to justify the price. Don’t reflexively buy top models — evaluate cost vs business impact.
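The ROI comparison is simple arithmetic: monthly cost equals compute spend plus the labour spent fixing bad outputs. The sketch below uses made‑up prices, rework rates and wages purely for illustration (`ModelOption` and `monthly_cost` are hypothetical names); plug in your audit numbers and your provider’s actual pricing.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ModelOption:
    name: str
    price_per_1k_tokens: float  # hypothetical blended price, not a real quote
    rework_rate: float          # share of outputs a person has to fix


def monthly_cost(opt: ModelOption, calls: int, tokens_per_call: int,
                 rework_minutes: float, hourly_wage: float) -> Tuple[float, float]:
    """Return (compute cost, rework labour cost) for one month."""
    compute = calls * tokens_per_call / 1000 * opt.price_per_1k_tokens
    labour = calls * opt.rework_rate * rework_minutes / 60 * hourly_wage
    return compute, labour


# Illustrative numbers only: the premium option is priced at 3x the cheap one.
cheap = ModelOption("cheap", price_per_1k_tokens=0.0005, rework_rate=0.08)
premium = ModelOption("premium", price_per_1k_tokens=0.0015, rework_rate=0.05)

for opt in (cheap, premium):
    compute, labour = monthly_cost(opt, calls=10_000, tokens_per_call=500,
                                   rework_minutes=5, hourly_wage=20)
    print(f"{opt.name}: compute ${compute:.2f} + labour ${labour:.2f} "
          f"= ${compute + labour:.2f}")
```

With these invented numbers, the model that costs three times as much comes out cheaper overall (about $841 against $1,336 a month), because rework labour dwarfs compute. If the pricier model didn’t lower the rework rate, the cheap one would win, which is exactly the comparison to run before upgrading.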

## 6) Fix data and workflows before automating

Automation amplifies what you feed it. Clean data, well‑defined prompts and solid workflows reduce token waste and repeated calls. Garbage in means expensive garbage out — and a lot of unnecessary compute spend.
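A sketch of “clean data, standardised prompts” for a customer‑reply workflow like the café example: strip quoted email history and signatures before they reach the model, then fill one fixed template. The template wording, the 60‑word limit and the function names are hypothetical; the point is that every call sends the same compact structure instead of whatever text happened to arrive.

```python
import re
from string import Template

# Hypothetical standard template: same structure for every call.
REPLY_PROMPT = Template(
    "You answer customer messages for $business. "
    "Reply in at most $max_words words.\n"
    "Customer message: $message"
)


def clean_message(raw: str) -> str:
    """Drop quoted history and signatures, then collapse whitespace."""
    kept = []
    for line in raw.splitlines():
        if line.rstrip() == "--":          # conventional "-- " signature separator
            break
        if line.lstrip().startswith(">"):  # quoted earlier messages
            continue
        kept.append(line)
    return re.sub(r"\s+", " ", " ".join(kept)).strip()


def build_prompt(raw: str, business: str, max_words: int = 60) -> str:
    return REPLY_PROMPT.substitute(
        business=business, max_words=max_words, message=clean_message(raw)
    )
```

Every quoted line you drop is input you no longer pay tokens for, on every call.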

## 7) Negotiate wisely

Providers often offer committed‑use discounts or enterprise plans. You’ll get better terms when you can show predictable usage patterns. Do the audit first, then talk to vendors — not the other way around.
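Before talking to vendors, it helps to reduce your logs to a few numbers that show how predictable your usage is. This sketch computes the monthly mean, the lowest month and the coefficient of variation (standard deviation divided by mean; lower means steadier). The function name is hypothetical, and proposing a commitment near your lowest month is a cautious heuristic so you don’t pay for unused capacity, not a rule from any provider.

```python
import statistics
from typing import Dict, List


def usage_summary(monthly_tokens: List[int]) -> Dict[str, float]:
    """Summarise monthly token totals for a vendor conversation."""
    if len(monthly_tokens) < 2:
        raise ValueError("need at least two months of data")
    mean = statistics.mean(monthly_tokens)
    if mean == 0:
        raise ValueError("no usage recorded")
    spread = statistics.pstdev(monthly_tokens)
    return {
        "mean": float(mean),
        "lowest_month": float(min(monthly_tokens)),
        "cv": spread / mean,  # coefficient of variation: lower = steadier usage
    }


# Hypothetical monthly token totals from your own logs.
print(usage_summary([4_200_000, 3_900_000, 4_500_000, 4_100_000]))
```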

## Practical checklist for this week

1. Enable logging for model calls and token usage.
2. Map each model call to a business outcome and a cost per outcome.
3. Identify high‑volume, low‑value calls to move to cheaper models or local rules.
4. Implement caching for identical prompts.
5. Route edge cases to humans and only use expensive models for genuinely high‑value exceptions.
6. Clean your data and standardise prompts.

## Final thought

Yes, a threefold price increase stings, and the headlines make it feel bigger than it is. But treating every price hike as a trigger to panic‑buy the next model is a business mistake. Instead, use it as a nudge to get smarter about when to spend compute and when to spend a human hour reworking a process.

Do that, and you’ll be the one smiling when your AI bill arrives — not the one hiding behind a locked coffee machine.

Source: [Google’s Gemini 3.5 Flash costs 3x the model it replaced, and the era of cheap AI is ending](https://www.xda-developers.com/google-gemini-3-5-flash-costs-3x-model-replaced-cheap-ai-ending/)

Ready to put this into action?

Book a free 15-minute discovery call and we’ll give you honest, tailored advice for your business.
