Lazily sweeping the whole Rails page cache
One of the more convenient features in Ruby on Rails is page caching. Simply add caches_page :show to the top of a controller class, and all pages rendered by the show action are written to disk automatically. On subsequent requests, these pages will be served straight from disk without invoking Rails at all.
This works because of rewrite rules that basically tell the webserver to append .html to the request path. If the webserver can find a file using the resulting path, the webserver will send it. If not, then Rails will handle the request.
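The lookup described above can be sketched in plain Ruby. This is only an illustration of what the rewrite rules ask the webserver to do, not code that Rails or the webserver actually runs; the helper name cached_file_for and the example paths are made up for this sketch.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: mimics the lookup the rewrite rules ask the webserver to do.
# Returns the path of the cached file to serve, or nil when Rails has to handle
# the request.
def cached_file_for(request_path, public_dir)
  # The root path is stored as index.html, every other path gets .html appended.
  relative = request_path == "/" ? "/index" : request_path
  candidate = File.join(public_dir, relative + ".html")
  File.file?(candidate) ? candidate : nil
end

Dir.mktmpdir do |public_dir|
  # Pretend Rails has already cached one page.
  FileUtils.mkdir_p(File.join(public_dir, "post/show"))
  File.write(File.join(public_dir, "post/show/1.html"), "<h1>Cached post</h1>")

  p cached_file_for("/post/show/1", public_dir) # a path ending in post/show/1.html
  p cached_file_for("/post/show/2", public_dir) # nil, so Rails renders the page
end
```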
Pages are removed from the cache simply by deleting them from the public directory. Rails provides the expire_page method and sweepers to help with this.
Sweeping is hard
Suppose you are writing a blogging application and you decide to add page caching. When a post is updated, the cached page that shows the post has to be removed. You write a post sweeper for this:
class PostSweeper < ActionController::Caching::Sweeper
observe Post
def after_save(post)
expire_page(:controller => "post", :action => "show", :id => post.id)
end
end
But wait… Your blog also has a front page listing the most recent posts. The updated post might be included there, so you need to expire that page too.
def after_save(post)
expire_page(:controller => "post", :action => "show", :id => post.id)
expire_page(:controller => "post", :action => "index")
end
Then you realize you also have archive pages and category overviews…
def after_save(post)
expire_page(:controller => "post", :action => "show", :id => post.id)
expire_page(:controller => "post", :action => "index")
expire_page(:controller => "archive", :action => "show",
:year => post.published_at.year)
expire_page(:controller => "archive", :action => "show",
:year => post.published_at.year, :month => post.published_at.month)
expire_page(:controller => "archive", :action => "show",
:year => post.published_at.year, :month => post.published_at.month,
:day => post.published_at.mday)
post.categories.each do |category|
expire_page(:controller => "category", :action => "show",
:id => category.id)
end
end
Ok, but what if a post is destroyed? And what exactly should happen when a category is renamed? And…
When you have an application where a single change can invalidate a large number of pages, the sweepers can get quite complex. It’s easy to forget to expire one or more pages, leading to subtle bugs where old pages are served from a stale cache.
An obvious solution to this would be to just sweep all pages after each change. Sadly, this is not possible with page caching because Rails does not keep a list of cached pages. The files are written directly to the public directory, so there’s no way to cleanly delete them all.
Lazy sweeping
The underlying problem is that you can't be sure which files in the public directory are cached copies and which are static HTML. We've tried to solve this by moving all cached pages to a public/cache subdirectory, so that everything inside it can safely be deleted. This seems to work fine for us.
In config/environment.rb, change the page cache directory from the default by adding the following line inside the Rails::Initializer.run block.
config.action_controller.page_cache_directory = RAILS_ROOT+"/public/cache/"
Then change the rewrite rules in the webserver configuration. For lighttpd (config/lighttpd.conf) these should be changed to:
url.rewrite = (
"^/$" => "cache/index.html",
"^([^.]+)$" => "cache$1.html" )
For Apache (public/.htaccess) the first two rules probably need to be changed to:
RewriteRule ^$ cache/index.html [QSA]
RewriteRule ^([^.]+)$ cache/$1.html [QSA]
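To see which request paths these rules send to the cache directory, the two lighttpd patterns above can be tried out in plain Ruby. This is a rough sketch that only mirrors the regular expressions; the REWRITE_RULES constant and the rewrite method are names invented for the example, and the sketch does not model anything else the webserver does.

```ruby
# The two lighttpd rules from above, expressed as Ruby regexps paired with a
# lambda that builds the rewritten target.
REWRITE_RULES = [
  [%r{\A/\z},       ->(_match) { "cache/index.html" }],
  [%r{\A([^.]+)\z}, ->(match)  { "cache#{match[1]}.html" }]
]

# Returns the rewritten path, or nil when no rule matches and the request is
# passed on unchanged.
def rewrite(path)
  REWRITE_RULES.each do |pattern, target|
    match = pattern.match(path)
    return target.call(match) if match
  end
  nil
end

p rewrite("/")                 # => "cache/index.html"
p rewrite("/post/show/1")      # => "cache/post/show/1.html"
p rewrite("/images/test.png")  # => nil
```

Because the second pattern excludes periods, any path that contains one is never looked up in the cache directory.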
We use the following in app/models/site_sweeper.rb as a single sweeper for all the models in our application.
class SiteSweeper < ActionController::Caching::Sweeper
observe Post, Category
def after_save(record)
self.class::sweep
end
def after_destroy(record)
self.class::sweep
end
def self.sweep
cache_dir = ActionController::Base.page_cache_directory
unless cache_dir == RAILS_ROOT+"/public"
# rm_rf quietly skips entries that have already disappeared
FileUtils.rm_rf(Dir.glob(cache_dir+"/*"))
RAILS_DEFAULT_LOGGER.info("Cache directory '#{cache_dir}' fully swept.")
end
end
end
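The core of the sweep can be tried outside of Rails. The following is a minimal sketch, assuming a temporary directory stands in for public/; sweep_cache is a hypothetical stand-in for SiteSweeper.sweep that takes both directories as arguments instead of reading them from the Rails configuration.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical stand-in for SiteSweeper.sweep, with the directories passed in.
def sweep_cache(cache_dir, public_dir)
  # Refuse to run when cached pages are written straight into public/,
  # because static files would be deleted along with them.
  return false if File.expand_path(cache_dir) == File.expand_path(public_dir)

  # Note that "*" does not match dotfiles, so those are left alone.
  FileUtils.rm_rf(Dir.glob(File.join(cache_dir, "*")))
  true
end

Dir.mktmpdir do |public_dir|
  cache_dir = File.join(public_dir, "cache")
  FileUtils.mkdir_p(File.join(cache_dir, "post/show"))
  File.write(File.join(cache_dir, "index.html"), "front page")
  File.write(File.join(cache_dir, "post/show/1.html"), "a post")
  File.write(File.join(public_dir, "404.html"), "static error page")

  sweep_cache(cache_dir, public_dir)
  p Dir.glob(File.join(cache_dir, "**", "*"))       # => []
  p File.exist?(File.join(public_dir, "404.html"))  # => true
end
```

The guard is the important design choice here: without a dedicated cache directory there is no safe way to tell cached pages from static files.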
Finally, assign the site sweeper to all controllers and actions that may invalidate the cache.
cache_sweeper :site_sweeper, :only => [:add, :update, :destroy]
We’ve also added the following script as script/sweep_cache to easily sweep the cache during development.
#!/usr/bin/ruby
require File.dirname(__FILE__) + '/../config/boot'
require File.dirname(__FILE__) + '/../config/environment'
SiteSweeper::sweep
This approach can be extended very nicely for the "subdomains as account keys" pattern. More on that later.
Comments
Packagethief 6 days later:
I love it. Frankly, I'm surprised Rails doesn't make it easier to delete the entire cache. Your implementation is simple and elegant. Thanks!
Daniel Sheppard 79 days later:
Note that this will break caching of any path that has a period in it (e.g., if you dynamically generate images from URLs such as /images/test.png). Instead of the rewrite rules above, use:
RewriteRule ^$ index.html [QSA]
RewriteRule ^([^.]+)$ $1.html [QSA]
#RewriteRule ^([^.]+)$ cache/$1.html [QSA]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule !^cache/(.*) - [C]
RewriteRule ^(.*)$ cache/$1 [QSA]
Thijs van der Vossen 79 days later:
Yes, but you can’t use a condition like that with lighttpd.
Ismael 473 days later:
This is great.
I have a hosted app that supports several accounts, each mapped to its own full domain. My problem is, I need a cache directory for each domain, something like
/cache/www.one-account.com
/cache/www.another-account.org
Do you have any clues as to how to write rewrite rules that dynamically search for static or cached files in those subdirectories?
I've been trying things along the lines of
# if there isn't a static file
RewriteCond cache/%{HTTP_HOST}/%{REQUEST_FILENAME} !-f
# and there is a cached file
RewriteCond cache/%{HTTP_HOST}/%{REQUEST_FILENAME}.html -f
# use the cached file
RewriteRule ^([^.]+)$ cache/%{HTTP_HOST}/$1.html [QSA,C]
...But to no avail.
Thanks in advance and keep up the great posts
Thijs van der Vossen 473 days later:
Ismael, We're using lighttpd for the applications where we have per-domain cache directories. I don't think it will help you find the correct Apache rewrite rules, but our configuration looks like this:
$SERVER["socket"] == "85.158.201.134:80" {
$HTTP["host"] =~ "^([^:]*)(:[0-9]+)?$" {
server.document-root = "/var/www/notwriting/current/public/"
accesslog.filename = "/var/log/lighttpd/access-notwriting.log"
server.error-handler-404 = "/dispatch.fcgi"
url.rewrite-once = ( "^/$" => "cache/%1/index.html", "^([^.]+)$" => "cache/%1$1.html" )
include "fcgi.d/notwriting.conf"
}
}
Have you tried adding the RewriteLog and RewriteLogLevel directives to your Apache configuration so you can see what's going on with your rewrite rules?
bitbutter 617 days later:
Excellent writeup! This has been very useful for me.
("fully sweeped" should be "fully swept" ;))
Thijs van der Vossen 617 days later:
Thanks bitbutter, it's fixed now.
Nikhil 626 days later:
I have created a script that allows time-based expiry of the Rails Page Cache
http://blogic.www2.aurigalogic.com/nikhil/2007/09/22/time-based-expiry-of-the-rails-page-cache/
mm 848 days later:
A rake task instead of a script. Put it in ./lib/tasks/cache.rake:
namespace :cache do
desc 'Sweep Cache'
task :sweep => :environment do
puts "Sweeping cache..."
SiteSweeper::sweep
end
end