Lazily sweeping the whole Rails page cache

Thijs van der Vossen

One of the more convenient features in Ruby on Rails is page caching. Simply add caches_page :show to the top of a controller class, and all pages rendered by the show action are written to disk automatically. On subsequent requests, these pages are served straight from disk without invoking Rails at all.

This works because of rewrite rules that basically tell the webserver to append .html to the request path. If the webserver can find a file using the resulting path, the webserver will send it. If not, then Rails will handle the request.
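
The decision these rewrite rules encode can be sketched in plain Ruby. This is only an illustration of the lookup, not how any webserver implements it; the helper name cached_file_for and its arguments are hypothetical.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: returns the path of the cached file the webserver
# would send for request_path, or nil when Rails has to handle the request.
def cached_file_for(public_dir, request_path)
  # The root path maps to index.html; every other path gets ".html" appended.
  relative = request_path == "/" ? "index.html" : "#{request_path}.html"
  candidate = File.join(public_dir, relative)
  File.file?(candidate) ? candidate : nil
end

# Usage: a throwaway directory stands in for public/.
Dir.mktmpdir do |public_dir|
  FileUtils.mkdir_p(File.join(public_dir, "post", "show"))
  File.write(File.join(public_dir, "post", "show", "1.html"), "<h1>Cached</h1>")

  p cached_file_for(public_dir, "/post/show/1") # path of the cached file
  p cached_file_for(public_dir, "/post/show/2") # nil, so Rails renders the page
end
```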

Pages are removed from the cache simply by deleting them from the public directory. Rails provides the expire_page method and sweepers to help with this.

Sweeping is hard

Suppose you are writing a blogging application and you decide to add page caching. When a post is updated, the cached page that shows the post has to be removed. You write a post sweeper for this:

class PostSweeper < ActionController::Caching::Sweeper
  observe Post

  def after_save(post)
    expire_page(:controller => "post", :action => "show", :id => post.id)
  end
end

But wait… Your blog also has a front page listing the most recent posts. The updated post might be included there, so you need to expire that page too.

def after_save(post)
  expire_page(:controller => "post", :action => "show", :id => post.id)
  expire_page(:controller => "post", :action => "index")
end

Then you realize you also have archive pages and category overviews…

def after_save(post)
  expire_page(:controller => "post", :action => "show", :id => post.id)
  expire_page(:controller => "post", :action => "index")
  expire_page(:controller => "archive", :action => "show", 
    :year => post.published_at.year)
  expire_page(:controller => "archive", :action => "show", 
    :year => post.published_at.year, :month => post.published_at.month)
  expire_page(:controller => "archive", :action => "show", 
    :year => post.published_at.year, :month => post.published_at.month, 
    :day => post.published_at.mday)
  post.categories.each do |category|
    expire_page(:controller => "category", :action => "show", 
      :id => category.id)
  end
end

Ok, but what if a post is destroyed? And what exactly should happen when a category is renamed? And…

When you have an application where a single change can invalidate a large number of pages, the sweepers can get quite complex. It’s easy to forget to expire one or more pages, leading to subtle bugs where old pages are served from a stale cache.

An obvious solution to this would be to just sweep all pages after each change. Sadly, this is not possible with page caching because Rails does not keep a list of cached pages. The files are written directly to the public directory, so there’s no way to cleanly delete them all.

Lazy sweeping

The underlying problem is that we can’t tell which files in the public directory are cached copies and which are static HTML. We’ve solved this by moving all cached pages to a public/cache subdirectory, which can be emptied safely. This seems to work fine for us.

In config/environment.rb, change the page cache directory from the default by adding the following line inside the Rails::Initializer.run block.

config.action_controller.page_cache_directory = RAILS_ROOT+"/public/cache/"

Then change the rewrite rules in the webserver configuration. For lighttpd (config/lighttpd.conf) these should be changed to:

url.rewrite = ( 
  "^/$" => "cache/index.html", 
  "^([^.]+)$" => "cache$1.html" )

For Apache (public/.htaccess) the first two rules probably need to be changed to:

RewriteRule ^$ cache/index.html [QSA]
RewriteRule ^([^.]+)$ cache/$1.html [QSA]
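
To see what the rewrite rules do to a request path, the lighttpd pair can be mirrored with Ruby regular expressions. This is a rough sketch for checking the mapping by hand, not a substitute for testing the real webserver configuration; the constant REWRITE_RULES and the method rewrite are hypothetical names.

```ruby
# The two lighttpd rules, in the same order, as [pattern, target] pairs.
# \A and \z are Ruby's whole-string anchors.
REWRITE_RULES = [
  [%r{\A/\z},       "cache/index.html"],
  [%r{\A([^.]+)\z}, 'cache\1.html']
].freeze

# Returns the rewritten path, or nil when no rule matches and the
# request is passed on unchanged.
def rewrite(path)
  REWRITE_RULES.each do |pattern, target|
    return path.sub(pattern, target) if path =~ pattern
  end
  nil
end

p rewrite("/")                # => "cache/index.html"
p rewrite("/post/show/1")     # => "cache/post/show/1.html"
p rewrite("/images/logo.png") # => nil
```

Paths that contain a dot, such as images and stylesheets, match neither rule, so they are still served from the public directory as before.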

We use the following in app/models/site_sweeper.rb as a single sweeper for all the models in our application.

class SiteSweeper < ActionController::Caching::Sweeper
  observe Post, Category

  def after_save(record)
    self.class.sweep
  end

  def after_destroy(record)
    self.class.sweep
  end
  
  def self.sweep
    cache_dir = ActionController::Base.page_cache_directory
    unless cache_dir == RAILS_ROOT+"/public"
      # rm_rf silently skips entries that are already gone.
      FileUtils.rm_rf(Dir.glob(cache_dir+"/*"))
      RAILS_DEFAULT_LOGGER.info("Cache directory '#{cache_dir}' fully swept.")
    end
  end
end
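
The effect of the sweep can be tried outside Rails with a standalone sketch. It assumes the same layout as above, with static files directly in the public directory and cached pages under a cache subdirectory; the method name sweep_cache and its arguments are hypothetical.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical stand-in for SiteSweeper.sweep that takes its directories as
# arguments. Returns true if the cache was swept, false if it refused.
def sweep_cache(cache_dir, public_dir)
  # Refuse to run when the cache directory is the public directory itself,
  # because that would delete the static files as well.
  return false if File.expand_path(cache_dir) == File.expand_path(public_dir)

  # Remove everything inside the cache directory but keep the directory.
  # Note that "*" does not match dotfiles.
  FileUtils.rm_rf(Dir.glob(File.join(cache_dir, "*")))
  true
end

# Usage: a throwaway directory stands in for public/.
Dir.mktmpdir do |public_dir|
  cache_dir = File.join(public_dir, "cache")
  FileUtils.mkdir_p(File.join(cache_dir, "post", "show"))
  File.write(File.join(cache_dir, "post", "show", "1.html"), "cached page")
  File.write(File.join(public_dir, "404.html"), "static page")

  sweep_cache(cache_dir, public_dir)
  p Dir.glob(File.join(cache_dir, "*"))             # => []
  p File.exist?(File.join(public_dir, "404.html"))  # => true
end
```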

Finally, assign the site sweeper to all controllers and actions that may invalidate the cache.

cache_sweeper :site_sweeper, :only => [:add, :update, :destroy]

We’ve also added the following script as script/sweep_cache to easily sweep the cache during development.

#!/usr/bin/ruby
require File.dirname(__FILE__) + '/../config/boot'
require File.dirname(__FILE__) + '/../config/environment'
SiteSweeper::sweep

This approach can be extended very nicely for the subdomains-as-account-keys pattern. More on that later.