# Why shared KV caches matter — and why they can go wrong

My upfront take: sharing KV caches between agents is a sensible, practical way to cut needless compute and cloud bills — but only if you fix the fundamentals first. If your documents are messy, access controls are loose or model versions are inconsistent, a shared cache will amplify those problems instead of improving things.

I read the arXiv note “Can I Buy Your KV Cache?” and it points out an embarrassingly modern inefficiency: we’ve built legions of agents that each re-read (and re-prefill) the same document. Prefill, the forward pass that turns a prompt’s tokens into the attention keys and values the model reuses while generating, is costly. It’s like sending ten people to photocopy the same page and charging each for paper and toner.

A real client example helps. A bookkeeping shop in Melbourne ran several assistants that repeatedly accessed the same invoices and client notes. Each interaction recomputed the prefill step, and their cloud bill ballooned. Moving to a shared KV cache inside their trusted environment cut a large share of per-query compute almost overnight. It wasn’t magic; it was sensible engineering, measurement, and a willingness to fix the basics.

# The risks you must consider

A shared KV cache isn’t a silver bullet. Consider these real trade-offs:

– Privacy and access control: who is allowed to hit the cache and see results? Without strict scoping, you risk leaking snippets across contexts.
– Staleness and invalidation: how do you detect updates and remove stale entries? Bad TTLs or missing invalidation logic cause incorrect answers.
– Model and tokenizer mismatch: a cache filled by one model or tokenizer is generally unusable with another. At best you get no hit at all; at worst the model is handed keys and values it cannot interpret correctly.
– Cache poisoning and integrity: untrusted writes or poorly scoped caches can inject incorrect or malicious entries.
– Storage vs compute economics: cached keys and values cost storage. For rarely used documents it can be cheaper to recompute than to persist entries indefinitely.
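
The storage-versus-compute trade-off in the last point can be made concrete with a rough break-even estimate. The sketch below is illustrative only: the function names, the cost parameters, and any prices you plug in are assumptions for this example, not figures from the arXiv note or from any cloud provider.

```python
def break_even_hits_per_day(
    prefill_cost: float,
    storage_cost_per_day: float,
    load_cost_per_hit: float = 0.0,
) -> float:
    """Daily cache hits needed before storing an entry beats recomputing it.

    All arguments are hypothetical estimates in the same currency unit:
    prefill_cost          cost of recomputing the prefill for one query
    storage_cost_per_day  cost of keeping the cached entry for one day
    load_cost_per_hit     cost of fetching the entry on each hit
    """
    saving_per_hit = prefill_cost - load_cost_per_hit
    if saving_per_hit <= 0:
        # Loading costs as much as recomputing, so caching never pays off.
        return float("inf")
    return storage_cost_per_day / saving_per_hit


def caching_pays_off(
    prefill_cost: float,
    storage_cost_per_day: float,
    expected_hits_per_day: float,
    load_cost_per_hit: float = 0.0,
) -> bool:
    """True if the expected daily hits exceed the break-even point."""
    threshold = break_even_hits_per_day(
        prefill_cost, storage_cost_per_day, load_cost_per_hit
    )
    return expected_hits_per_day > threshold
```

If the break-even figure for a document is well above the traffic it actually gets, that document is a poor candidate for caching.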

# Practical, minimal-risk steps to implement shared KV caching

Before you introduce a shared cache, do the fundamentals well:

1. Clean your source documents
– Normalise formatting, remove duplicates, and canonicalise templates (invoices, contracts, specs). Clean inputs mean higher cache hit rates and fewer surprises.

2. Lock down access and trust boundaries
– Keep caches inside a single application, client tenant, or team. Don’t start with a global cache across multiple customers.

3. Detect identical content with stable hashing
– Hash content to identify identical inputs. A stable content hash is the simplest, highest-confidence cache key.

4. Tag entries with model & tokenizer metadata
– Store model and tokenizer versions with each cache entry. Reject hits for mismatched versions or apply compatibility rules.

5. Set sensible TTLs and invalidate on updates
– Use conservative TTLs for mutable docs. When a source document changes, invalidate related cache entries immediately.

6. Encrypt cache entries at rest and audit access
– Treat KV entries as sensitive data: encrypt them, log every request, and monitor who is getting cache hits.

7. Instrument everything
– Measure hit rate, compute saved, latency impact, and edge cases where cached responses were stale or incorrect. Metrics drive decisions.
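
Steps 1 to 5 above can be sketched together as a small in-memory cache. This is a minimal illustration under stated assumptions, not a production design: `ScopedKVCache`, `normalise`, `content_hash`, and the string `payload` standing in for real key/value tensors are all hypothetical names invented for this example. It also assumes the cached prefill was computed from the normalised text, and that the document sits at the same position in the prompt every time, because in a standard transformer the keys and values for a passage depend on the tokens that precede it.

```python
import hashlib
import time
import unicodedata
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Set, Tuple

# (tenant, model version, tokenizer version, content hash)
CacheKey = Tuple[str, str, str, str]


def normalise(text: str) -> str:
    """Canonicalise text so trivially different copies hash identically."""
    text = unicodedata.normalize("NFC", text).replace("\r\n", "\n")
    lines = [line.rstrip() for line in text.split("\n")]
    return "\n".join(lines).strip()


def content_hash(text: str) -> str:
    """Stable SHA-256 hash of the normalised content."""
    return hashlib.sha256(normalise(text).encode("utf-8")).hexdigest()


@dataclass
class Entry:
    payload: Any       # stand-in for the real KV tensors
    doc_id: str        # source document, used for invalidation
    expires_at: float  # clock value after which the entry is stale


class ScopedKVCache:
    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[CacheKey, Entry] = {}
        self._by_doc: Dict[str, Set[CacheKey]] = {}

    @staticmethod
    def make_key(tenant: str, model: str, tokenizer: str, text: str) -> CacheKey:
        # Tenant, model and tokenizer are part of the key, so a mismatch
        # on any of them is simply a miss.
        return (tenant, model, tokenizer, content_hash(text))

    def put(self, key: CacheKey, doc_id: str, payload: Any) -> None:
        self._entries[key] = Entry(payload, doc_id, self._clock() + self._ttl)
        self._by_doc.setdefault(doc_id, set()).add(key)

    def get(self, key: CacheKey) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        if self._clock() >= entry.expires_at:
            # Expired: drop the entry and report a miss.
            del self._entries[key]
            self._by_doc.get(entry.doc_id, set()).discard(key)
            return None
        return entry.payload

    def invalidate_document(self, doc_id: str) -> int:
        """Remove every entry derived from doc_id; return how many were removed."""
        keys = self._by_doc.pop(doc_id, set())
        for key in keys:
            self._entries.pop(key, None)
        return len(keys)
```

Putting the tenant and version tags inside the key, rather than checking them after lookup, means a request from the wrong tenant or model can never receive another scope's entry by accident. Encryption at rest and audit logging (step 6) are deliberately left out of this sketch.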

# Start small, measure, iterate

Don’t buy someone else’s cache marketplace or throw a global cache in front of every agent. Start with high-frequency, read-heavy documents — standard contracts, product specs, recurring invoices. Implement the steps above in a single use case. Track the economics: how much compute and cost did you save? What percentage of queries are served from cache? Where did cached results fail?

If the numbers justify expansion, grow the cache footprint incrementally and keep the same operational guardrails.
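
The questions in this section (how much was saved, what share of queries hit the cache, where cached results failed) map onto a handful of counters. The sketch below is one hypothetical way to track them; `CacheStats`, its field names, and the per-1,000-token price are assumptions for illustration, and you would substitute your own measured costs.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    hits: int = 0
    misses: int = 0
    stale_hits: int = 0            # hits later found to be stale or wrong
    prefill_tokens_saved: int = 0  # prompt tokens that did not need prefill

    def record_hit(self, prompt_tokens: int) -> None:
        self.hits += 1
        self.prefill_tokens_saved += prompt_tokens

    def record_miss(self) -> None:
        self.misses += 1

    def record_stale_hit(self) -> None:
        # Call this when a cached result is later judged incorrect.
        self.stale_hits += 1

    @property
    def hit_rate(self) -> float:
        """Fraction of lookups served from cache (0.0 if no lookups yet)."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def estimated_saving(self, cost_per_1k_prefill_tokens: float) -> float:
        """Rough money saved, given an assumed price per 1,000 prefill tokens."""
        return self.prefill_tokens_saved / 1000 * cost_per_1k_prefill_tokens
```

A rising `stale_hits` count is the signal to tighten TTLs or fix invalidation before widening the cache's scope.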

# Bottom line

The arXiv note is a helpful poke: we’re wasting cycles. For most SMBs the smart play is modest, controlled caching in places that already burn compute today. Do the basics first (clean your data, standardise model versions, lock permissions), then add a shared KV cache where it makes sense. Tinker, measure, and keep your data tidy; when you do it right, the cache becomes bookkeeping that saves money and cuts latency. Do it badly, and it’s a new source of bugs.

Source: [Can I Buy Your KV Cache?](https://arxiv.org/abs/2606.13361)
