Linked Cache Invalidation (LCI) is one instance of a more general class of solutions that might be called Cache Object Dependency Invalidation. This post assumes some familiarity with LCI and explains the impetus behind a similar, alternative solution of the same class, which I call Tagged Cache Invalidation (TCI).
Background and History
In 2010, Mark Nottingham’s 2006 Internet Draft http-link-header was accepted as RFC 5988 after 4 years and 10 revisions.
Around the same time, another brilliant engineer named Mike Kelly from the UK submitted a paper to WS-REST for a presentation detailing a use for Mark’s Link header in a cache invalidation technique he named Link Header Cache Invalidation.
Unbeknownst to Mike, Mark was already working on an implementation of a virtually identical concept.
Mike and Mark soon joined forces to co-author an Internet Draft formally specifying the new cache invalidation technique, which they named Linked Cache Invalidation and which is presently on its 4th draft.
Mark’s implementation of this technique, squid-lci, has at least one very large-scale production installation but exhibits a number of qualities that may make it unsuitable for widespread adoption:
- squid-lci requires squid 2.8, for which no stable version has yet been released, so it is not recommended for use in production.
- squid-lci is not forward compatible with squid 3, making it a poor candidate for users who wish to retain the option to upgrade squid in the future.
- squid-lci makes inappropriate use of squid’s log daemon.
Putting squid-lci through its paces in a side project I hacked together in 2011 between jobs convinced me that LCI was indeed a viable technique.
Houston, we have a problem.
At Beatport we make heavy use of HTTP caching for published content, and after a year as an application developer I received a user story for which LCI seemed like it might provide a good solution:
As a DJ, I can edit the title, description, event association, and track times of my mix after it has been published so I can fix errors in my published content.
It so happens that the day after scheduling this user story for the coming sprint, I boarded a plane for Greenville, South Carolina to attend RestFest, a barcampy unconference meeting of the minds for all things REST. The timing would turn out to be highly serendipitous.
RestFest
After a daylong hackathon, a bit of listening to presentations, a bit of presenting, a great deal of excellent southern cooking and more than a few beers with familiar like-minded folks, we were all in the zone.
During the course of a pleasant side conversation with Leonard Richardson, I remembered the problem and presented it for inspection to get a second opinion on whether LCI might be a valid solution. Leonard indicated to me that he was familiar with LCI and that it did indeed seem as though it might present a good solution to the problem.
Having no more slides to prepare as I had already given my lightning talk, I set out immediately to create a viable, production-ready implementation of Linked Cache Invalidation in Varnish (Beatport’s reverse proxy cache of choice).
I scoured the net researching VCL to determine all available cache invalidation mechanisms in Varnish 3. Purge and Ban seemed like they might do the trick.
The day was soon over so we all went out to dinner and crashed a local incubator drinkup.
Brick Walls
The next day I returned to the problem and quickly ran into a number of challenges:
- Accessing values for repeated headers in Varnish does not work well, if at all. VCL appears to assume the value of the first instance of each header and ignore the rest.
- Varnish comes pre-configured with a maximum header count limit of 64, which may be insufficient depending on whether the invalidation links are distributed across many headers. Not a showstopper, but it could present problems.
- Varnish comes pre-configured with a maximum header length of 4096 characters. This could become a problem depending on the size and number of URLs by which the application’s resources must be invalidated. At this point I am operating on the assumption that, to function correctly for our application, the implementation must comfortably support at least 200 dependencies per object without significant modification to the default Varnish settings.
- Finding and iterating over URLs in invalidates headers is non-trivial since VCL is more of a configuration DSL than a real programming language. Implementing support for LCI’s rel="invalidates" Link headers will definitely require some programming in C, a language with which I am not terribly familiar.
- Invalidating resources across subdomains would require absolute URIs which could add significantly to the length of the required Link headers.
Given these constraints, I came to the following conclusions:
- I would need to cut a few corners in order to create a viable solution to the problem at hand
- LCI’s link header format is too verbose to comfortably fit 200 dependencies into 4096 characters
- I would need 2 different headers, one for dependencies and one for invalidations
- The invalidates header value would have to be a regex to avoid coding in C
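The arithmetic behind the second conclusion can be sketched roughly as follows. This is an illustration, not a measurement of the real application: the URLs and tag names are made up, the Link syntax with an `inv-by` relation is an assumption based on the LCI draft, and 4096 is the Varnish default header length mentioned above.

```python
# Rough size comparison: LCI-style Link values vs. a comma-separated tag list.
# All URLs and tag names below are hypothetical.

MAX_HEADER_LENGTH = 4096  # Varnish default header length cited above
DEPENDENCIES = 200        # target number of dependencies per object


def lci_link_value(urls):
    """Build one Link header value with one link per URL (assumed LCI syntax)."""
    return ", ".join(f'<{url}>; rel="inv-by"' for url in urls)


def tag_list_value(tags):
    """Build a comma-separated tag list, as used in x-invalidated-by."""
    return ",".join(tags)


urls = [f"/mixes/{100000 + i}" for i in range(DEPENDENCIES)]
tags = [f"mix-{100000 + i}" for i in range(DEPENDENCIES)]

lci_length = len(lci_link_value(urls))
tag_length = len(tag_list_value(tags))

print(lci_length, lci_length <= MAX_HEADER_LENGTH)
print(tag_length, tag_length <= MAX_HEADER_LENGTH)
```

With these made-up names the Link form comes to 6198 characters and the tag list to 2199, so only the tag list fits in a single default-sized header. Absolute URIs for cross-subdomain invalidation would widen the gap further.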
Success
After a few hours of coding I had a concise, viable solution.
tci.vcl
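sub vcl_fetch {
    # After a successful write, ban every cached object whose
    # x-invalidated-by header matches the regex sent in x-invalidates.
    # Responses without an x-invalidates header are skipped.
    if (beresp.status >= 200 && beresp.status < 400
        && (req.request == "PUT" || req.request == "POST" || req.request == "DELETE")
        && beresp.http.x-invalidates) {
        ban("obj.http.x-invalidated-by ~ " + beresp.http.x-invalidates);
    }
}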
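To make the effect of that `ban()` call concrete, here is a small Python model of it. It is a sketch for illustration only, not how Varnish is implemented: the in-memory `cache` dict, its URLs and tags, and the `apply_ban` helper are all hypothetical, and Python's `re.search` stands in for the regex match Varnish performs against the stored header.

```python
import re

# Hypothetical stand-in for the cache: URL -> stored x-invalidated-by value.
cache = {
    "/mixes/1": "mix-1,track-7,track-9",
    "/tracks/7": "track-7",
    "/tracks/9": "track-9",
}


def apply_ban(cache, invalidates_regex):
    """Drop every object whose x-invalidated-by value matches the regex.

    Models `obj.http.x-invalidated-by ~ <regex>`; returns the evicted URLs.
    """
    pattern = re.compile(invalidates_regex)
    evicted = sorted(url for url, tags in cache.items() if pattern.search(tags))
    for url in evicted:
        del cache[url]
    return evicted


# A successful POST to /tracks/7 answers with `x-invalidates: track-7`.
evicted = apply_ban(cache, "track-7")
print(evicted)        # ['/mixes/1', '/tracks/7']
print(sorted(cache))  # ['/tracks/9']
```

Varnish does not evict eagerly like this. It records the ban and tests cached objects against it later, but the set of affected objects should be the same.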
The test app worked correctly and seemed quite efficient:
a.php
<?php
header('x-invalidated-by: mix-a,track-b', false);
header('cache-control: s-maxage=86400');
if ($_SERVER['REQUEST_METHOD'] == 'POST') {
    header('x-invalidates: mix-a', false);
}
?>
<h1>A is invalidated by B</h1>
b.php
<?php
header('x-invalidated-by: track-b', false);
header('cache-control: s-maxage=86400');
if ($_SERVER['REQUEST_METHOD'] == 'POST') {
    header('x-invalidates: track-b', false);
}
?>
<h1>Changes to B invalidate both B and A</h1>
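Because the `x-invalidates` value is used as a regex, a bare tag such as `track-b` would also match a longer tag like `track-bb`. That only causes extra invalidation, never stale content, but an application could avoid it by anchoring tags to the list delimiters. The helper below is a hypothetical sketch of that idea. It assumes tags are drawn from a restricted alphabet and that the tag list is written without spaces after the commas, as in the test pages above. It is checked here only against Python's regex engine, not against Varnish.

```python
import re

# Assumed tag alphabet; it contains no regex metacharacters.
TAG_RE = re.compile(r"[A-Za-z0-9_-]+")


def invalidates_value(tags):
    """Build an x-invalidates regex matching any of the tags as whole list items."""
    for tag in tags:
        if not TAG_RE.fullmatch(tag):
            raise ValueError(f"unsafe tag: {tag!r}")
    return "(^|,)(" + "|".join(tags) + ")(,|$)"


pattern = invalidates_value(["mix-a", "track-b"])
print(pattern)  # (^|,)(mix-a|track-b)(,|$)

# Whole tags match; a longer tag that merely contains one does not.
print(bool(re.search(pattern, "mix-a,track-b")))  # True
print(bool(re.search(pattern, "track-bb")))       # False
print(bool(re.search("track-b", "track-bb")))     # True: the bare tag over-matches
```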
Advantages
- It effectively solves the problem at hand
- It’s fairly efficient
- It’s extremely concise
- It handles cross sub-domain cache object invalidation with no additional overhead
Drawbacks
- It uses custom HTTP headers
- It’s only suitable for use in reverse proxy caches (equally true of LCI)
- It doesn’t solve the problem of out-of-band changes (equally true of LCI)
- Ban propagation for peered cache instances will require an out-of-band component with knowledge of all instances
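The last drawback could be addressed by a small fan-out component along these lines. This is a minimal sketch under stated assumptions, not a tested design: `propagate_ban`, the instance names, and the `send` callable are all hypothetical, and how `send` actually reaches a cache (an HTTP request, the admin interface, or something else) is deliberately left open.

```python
def propagate_ban(instances, invalidates_regex, send):
    """Send the same ban to every cache instance; return those that failed.

    `send(instance, regex)` is a caller-supplied callable returning True on
    success. Network-level errors (OSError subclasses) count as failures.
    """
    failed = []
    for instance in instances:
        try:
            ok = send(instance, invalidates_regex)
        except OSError:
            ok = False
        if not ok:
            failed.append(instance)
    return failed


# Demo with a fake transport in which cache-2 is unreachable.
def fake_send(instance, regex):
    if instance == "cache-2":
        raise ConnectionError("unreachable")
    return True


failed = propagate_ban(["cache-1", "cache-2", "cache-3"], "track-b", fake_send)
print(failed)  # ['cache-2']
```

Returning the failed instances lets the caller decide whether to retry or to flush those caches, since an instance that misses a ban may keep serving stale content until its objects expire.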
Conclusion
Thank you for taking the time to read this post.
I’m hoping to deploy Tagged Cache Invalidation to production at Beatport in the next 3-6 weeks.
If you see any problems with this technique, please speak up. I want to hear what you think.
If this technique interests you, please pass this link to co-workers and acquaintances for comment.
I’m trying to gather as much feedback as possible concerning the viability and utility of this technique.