I don't like caches

November 06, 2021

In fact I do, but not for everything. I like caching immutable resources and side-effect-free computations. I don’t like caching mutable resources and impure computations.

Let me explain why.

Cache for immutable resources

At MyCoach, when we rebuilt our storage service, we chose not to support updates.

When a file is uploaded it is assigned a unique UUID, and this UUID is part of the link to the file for its whole lifetime. If you need to update a file, you delete it, upload a new file, get a new link, and update the resource that references the file so that it points to the new link.

By not supporting updates we were able to enable long-term caching on these links, which meant we could cache all of them at the edge (i.e. the request doesn’t reach our servers; Cloudflare replies on our behalf).

This single choice of not supporting updates had a tremendous impact when we launched our VOD platform. It reduced the number of requests received by our servers by 50% and improved the user experience a lot. Nobody likes to wait a minute for a site or app to load.
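To make the idea concrete, here is a minimal sketch, assuming the storage service sets the HTTP response headers for file downloads. The base URL, the helper names, and the one-year lifetime are hypothetical and are not taken from our actual setup; the `immutable` directive of `Cache-Control` is standard HTTP.

```python
import uuid

# Hypothetical values, chosen only for illustration.
BASE_URL = "https://files.example.com"
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def new_file_link() -> str:
    """Build the link for a freshly uploaded file.

    Every upload gets its own UUID, so the content behind a given link
    never changes once the link exists.
    """
    return f"{BASE_URL}/{uuid.uuid4()}"


def immutable_cache_headers() -> dict:
    """Response headers telling browsers and the edge to keep the file.

    This is only safe because the link is never reused for other content.
    """
    return {"Cache-Control": f"public, max-age={ONE_YEAR_SECONDS}, immutable"}
```

Because an "update" is a delete followed by a new upload, the new content gets a new link, and no cached copy ever needs to be invalidated.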

Cache for pure computations

A pure computation is a computation which, given the same input, always produces the same output. That is a strong contract: we have the guarantee that the result won’t change if the input doesn’t. A pure computation can therefore be memoized: the result is stored in memory, so the computation is never performed more than once for a given input (e.g. Map: Input → Output).
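A minimal sketch of memoization, using Python's standard `functools.lru_cache` as the Input → Output map. The `average_speed` function is a hypothetical stand-in for a pure computation and is not one of our real markers.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def average_speed(distances_m: tuple, duration_s: float) -> float:
    """Pure computation: the result depends only on the arguments.

    Arguments must be hashable to serve as cache keys, hence a tuple
    of distances rather than a list.
    """
    return sum(distances_m) / duration_s


# First call computes the value, second call is served from memory.
average_speed((1200.0, 800.0), 400.0)
average_speed((1200.0, 800.0), 400.0)
```

There is nothing to invalidate here: a different input is a different key, and the same input can only ever map to the same output.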

We use this technique a lot in our MyCoach Pro product. This product has a feature named “performance BI” where we do a lot of computations to enable longitudinal follow-up of various player markers. With this technique we were able to improve the user experience by keeping response times as fast as possible.

Cache for mutable things

Let’s talk about caching for mutable things. If you cache data coming out of your database you are taking a risk: the risk of having a stale cache.

Each time a record changes in the database you have to invalidate your cache. It might seem easy, and you can set everything up properly in the first place. But what if, 6 months from now, some other developer comes after you and forgets to invalidate the cache in some scenarios? You’ll end up with a stale cache and probably unsatisfied users.

So what do you do?

Ask yourself: do I need a cache? Can I add an index to improve a slow query? Can I build the schema some other way to make things faster? Can I use a different database system for my particular use case?

We removed caches and avoided a lot of bugs just by adding indexes or using the proper database system for the task at hand.

A cache is a second source of truth, and you need to keep it in sync with your principal source of truth. At MyCoach we try to avoid having multiple sources of truth so that we don’t have to synchronise them.
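As an illustration of the index question, here is a small sketch using Python's built-in `sqlite3` module. The `players` table, its columns, and the index name are hypothetical; the point is only to show how a query plan changes once an index exists.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's plan for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the fourth column of each row.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE players (id INTEGER PRIMARY KEY, team_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO players (team_id, name) VALUES (?, ?)",
    [(i % 50, f"player-{i}") for i in range(5000)],
)

SQL = "SELECT name FROM players WHERE team_id = ?"

# Without an index, SQLite has to scan the whole table.
plan_before = query_plan(conn, SQL, (7,))

# With an index on the filtered column, it can search the index instead.
conn.execute("CREATE INDEX idx_players_team_id ON players (team_id)")
plan_after = query_plan(conn, SQL, (7,))
```

Printing `plan_before` and `plan_after` shows a full table scan turning into an index search, with no cache and therefore nothing to keep in sync.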

What if you still need to add a cache?

Let’s say you have a REST API with GET / PUT endpoints.

You might want to:

  • Check the cache on GET

    • If the data is in the cache, return it from the cache
    • If it is not, fetch it from the database, return it, and add it to the cache
  • Invalidate the cache on PUT

What if, 1 month from now, some other developer adds a DELETE endpoint and forgets to invalidate the cache on this endpoint? You guessed it: a stale cache.
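The scenario above can be sketched as follows. This is a toy model under stated assumptions: two dictionaries stand in for the database and the cache, and `PlayerApi` with its methods is a hypothetical name for the endpoint handlers.

```python
from typing import Optional


class PlayerApi:
    """Endpoint-level caching, with in-memory dicts as database and cache."""

    def __init__(self) -> None:
        self.db = {}
        self.cache = {}

    def get(self, player_id: str) -> Optional[dict]:
        # GET: serve from the cache when possible.
        if player_id in self.cache:
            return self.cache[player_id]
        # Otherwise read from the database and fill the cache.
        record = self.db.get(player_id)
        if record is not None:
            self.cache[player_id] = record
        return record

    def put(self, player_id: str, record: dict) -> None:
        # PUT: write to the database, then invalidate the cache entry.
        self.db[player_id] = dict(record)
        self.cache.pop(player_id, None)

    def delete(self, player_id: str) -> None:
        # DELETE, added later: the cache invalidation was forgotten.
        self.db.pop(player_id, None)


api = PlayerApi()
api.put("p1", {"name": "Ada"})
api.get("p1")       # fills the cache
api.delete("p1")    # removes the record from the database only
stale = api.get("p1")  # still returns the deleted record
```

The bug is a single missing line in `delete`, which is exactly why it is easy to introduce and hard to notice.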

Rely on your source of truth

We try to avoid managing the cache at the endpoint level. What we do instead is hook into the events emitted by our database and update or invalidate the cache in response to them.

Each time a record is added, updated, or removed, we update the cache to replicate the new state. This way we reduce the risk of a stale cache to a minimum.

If you want to know how to hook into your database, refer to this article: Reacting to change with Change Data Capture.
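Here is a minimal sketch of that approach. It assumes an in-process callback in place of a real change stream; the `Database` and `CacheSync` classes and the event names are hypothetical and only illustrate the shape of the idea.

```python
from typing import Callable, Optional


class Database:
    """In-memory stand-in for a database that emits change events."""

    def __init__(self) -> None:
        self._rows = {}
        self._listeners = []

    def subscribe(self, listener: Callable[[str, str, Optional[dict]], None]) -> None:
        self._listeners.append(listener)

    def _emit(self, op: str, key: str, row: Optional[dict]) -> None:
        for listener in self._listeners:
            listener(op, key, row)

    def upsert(self, key: str, row: dict) -> None:
        op = "updated" if key in self._rows else "added"
        self._rows[key] = dict(row)
        self._emit(op, key, dict(row))

    def delete(self, key: str) -> None:
        if self._rows.pop(key, None) is not None:
            self._emit("removed", key, None)


class CacheSync:
    """Keeps the cache in step with the database by reacting to its events."""

    def __init__(self, db: Database) -> None:
        self.cache = {}
        db.subscribe(self.on_change)

    def on_change(self, op: str, key: str, row: Optional[dict]) -> None:
        if op == "removed":
            self.cache.pop(key, None)
        else:
            # "added" and "updated" both replicate the new state.
            self.cache[key] = row
```

No write path touches the cache directly here, so a new endpoint cannot forget to invalidate it: whatever changes the database also triggers the cache update.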

Conclusion

Caches are here to speed up response times and improve the user experience. Caches must be used as a last resort. You must try your best to make things fast in the first place.

If you need a cache, try to use immutable resources and pure computations so that you are free of cache invalidation and stale cache issues.

If you need a cache for mutable data or computations, be very careful with cache invalidation and, as much as possible, manage your cache by relying on your source of truth.


Hi 👋
I'm Clément Agarini, a software engineer working for MyCoach. I've been building software for the past 13 years.
You can follow me on Twitter