Lazily sweeping the whole Rails page cache

Thijs van der Vossen, 03 Jan 2006, 15:11 in ruby on rails, last updated 13 Sep 2007, 14:47.

One of the more convenient features in Ruby on Rails is page caching. Simply add caches_page :show to the top of a controller class, and all pages rendered by the show action are written to disk automatically. On subsequent requests, these pages are served straight from disk without invoking Rails at all.

This works because of rewrite rules that basically tell the webserver to append .html to the request path. If the webserver can find a file using the resulting path, the webserver will send it. If not, then Rails will handle the request.

Pages are removed from the cache simply by deleting them from the public directory. Rails provides the expire_page method and sweepers to help with this.

Sweeping is hard

Suppose you are writing a blogging application and you decide to add page caching. When a post is updated, the cached page that shows the post has to be removed. You write a post sweeper for this:

class PostSweeper < ActionController::Caching::Sweeper
  observe Post

  def after_save(post)
    expire_page(:controller => "post", :action => "show", :id => post.id)
  end
end

But wait… Your blog also has a front page listing the most recent posts. The updated post might be included there, so you need to expire that page too.

def after_save(post)
  expire_page(:controller => "post", :action => "show", :id => post.id)
  expire_page(:controller => "post", :action => "index")
end

Then you realize you also have archive pages and category overviews…

def after_save(post)
  expire_page(:controller => "post", :action => "show", :id => post.id)
  expire_page(:controller => "post", :action => "index")
  expire_page(:controller => "archive", :action => "show", 
    :year => post.published_at.year)
  expire_page(:controller => "archive", :action => "show", 
    :year => post.published_at.year, :month => post.published_at.month)
  expire_page(:controller => "archive", :action => "show", 
    :year => post.published_at.year, :month => post.published_at.month, 
    :day => post.published_at.mday)
  post.categories.each do |category|
    expire_page(:controller => "category", :action => "show", 
      :id => category.id)
  end
end

Ok, but what if a post is destroyed? And what exactly should happen when a category is renamed? And…

When you have an application where a single change can invalidate a large number of pages, the sweepers can get quite complex. It’s easy to forget to expire one or more pages, leading to subtle bugs where old pages are served from a stale cache.

An obvious solution to this would be to just sweep all pages after each change. Sadly, this is not possible with page caching because Rails does not keep a list of cached pages. The files are written directly to the public directory, so there’s no way to cleanly delete them all.

Lazy sweeping

The underlying problem is that you can’t be sure which files in the public directory are cached copies and which are static html. We’ve solved this by moving all cached pages to a public/cache subdirectory, which can safely be emptied as a whole. This seems to work fine for us.

In config/environment.rb, change the page cache directory from the default by adding the following line inside the Rails::Initializer.run block.

config.action_controller.page_cache_directory = RAILS_ROOT+"/public/cache/"

Then change the rewrite rules in the webserver configuration. For lighttpd (config/lighttpd.conf) these should be changed to:

url.rewrite = ( 
  "^/$" => "cache/index.html", 
  "^([^.]+)$" => "cache$1.html" )

For Apache (public/.htaccess) the first two rules probably need to be changed to:

RewriteRule ^$ cache/index.html [QSA]
RewriteRule ^([^.]+)$ cache/$1.html [QSA]
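
To see which request paths these rules send to the cache directory, here is a minimal Ruby sketch that mirrors the two lighttpd patterns. The method name rewrite_to_cache is hypothetical and only for illustration; the webserver does this matching itself, and the Apache rules in a .htaccess file see the path without its leading slash.

```ruby
# Hypothetical helper that mimics the two lighttpd rewrite rules.
# Returns the rewritten path, or nil when neither rule matches.
def rewrite_to_cache(request_path)
  case request_path
  when %r{\A/\z}
    # Rule 1: the site root maps to the cached front page.
    "cache/index.html"
  when /\A([^.]+)\z/
    # Rule 2: any path without a period gets the cache prefix and .html appended.
    "cache#{$1}.html"
  end
end

puts rewrite_to_cache("/").inspect
puts rewrite_to_cache("/post/show/1").inspect
puts rewrite_to_cache("/images/test.png").inspect
```

Paths that contain a period match neither rule, so they are looked up in the public directory as before and are never served from the cache directory.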

We use the following in app/models/site_sweeper.rb as a single sweeper for all the models in our application.

class SiteSweeper < ActionController::Caching::Sweeper
  observe Post, Category

  # Any saved or destroyed record invalidates the whole page cache.
  def after_save(record)
    self.class.sweep
  end

  def after_destroy(record)
    self.class.sweep
  end

  def self.sweep
    cache_dir = ActionController::Base.page_cache_directory
    # Never sweep the public directory itself; it also holds static files.
    unless cache_dir == RAILS_ROOT+"/public"
      # rm_rf ignores files that are already gone, so no rescue is needed.
      FileUtils.rm_rf(Dir.glob(cache_dir+"/*"))
      RAILS_DEFAULT_LOGGER.info("Cache directory '#{cache_dir}' fully swept.")
    end
  end
end
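
The sweep logic can be tried outside Rails. The following is only a sketch under assumptions: sweep_cache is a hypothetical stand-in for the sweep method that takes both directories as arguments, and the files are created in a temporary directory purely for the demonstration.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical stand-in for the sweep method, with the directories passed in.
# Returns false without deleting anything when the cache directory is the
# public directory itself, and true after emptying the cache directory.
def sweep_cache(cache_dir, public_dir)
  return false if File.expand_path(cache_dir) == File.expand_path(public_dir)
  # Remove everything inside the cache directory but keep the directory.
  # Note that "*" does not match dotfiles.
  FileUtils.rm_rf(Dir.glob(File.join(cache_dir, "*")))
  true
end

Dir.mktmpdir do |root|
  public_dir = File.join(root, "public")
  cache_dir  = File.join(public_dir, "cache")
  FileUtils.mkdir_p(File.join(cache_dir, "post", "show"))
  File.write(File.join(cache_dir, "index.html"), "cached front page")
  File.write(File.join(cache_dir, "post", "show", "1.html"), "cached post")
  File.write(File.join(public_dir, "404.html"), "static page")

  sweep_cache(cache_dir, public_dir)
  puts Dir.glob(File.join(cache_dir, "*")).empty?      # cached pages are gone
  puts File.exist?(File.join(public_dir, "404.html"))  # static page is kept
end
```

Because only the contents of the cache directory are removed, static files elsewhere in the public directory are never touched.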

Finally assign the site sweeper to all controllers and actions that may invalidate the cache.

cache_sweeper :site_sweeper, :only => [:add, :update, :destroy]

We’ve also added the following script as script/sweep_cache to easily sweep the cache during development.

#!/usr/bin/ruby
require File.dirname(__FILE__) + '/../config/boot'
require File.dirname(__FILE__) + '/../config/environment'
SiteSweeper::sweep

This approach can be extended very nicely for the subdomains as account keys pattern. More on that later.

Comments

  1. Packagethief 6 days later:

    I love it. Frankly, I'm surprised Rails doesn't make it easier to delete the entire cache. Your implementation is simple and elegant. Thanks!

  2. Daniel Sheppard 79 days later:

    Note that this will break caching of any path that has a period in it (ie, if you dynamically generate images from urls such as /images/test.png). Instead of the rewrite rules above, use:

    RewriteRule ^$ index.html [QSA]
    RewriteRule ^([^.]+)$ $1.html [QSA]
    #RewriteRule ^([^.]+)$ cache/$1.html [QSA]

    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule !^cache/(.*) - [C]
    RewriteRule ^(.*)$ cache/$1 [QSA]

  3. Thijs van der Vossen 79 days later:

    Yes, but you can’t use a condition like that with lighttpd.

  4. Ismael 473 days later:

    This is great.
    I have a hosted app that supports several accounts, each mapped to its own full domain. My problem is, I need a cache directory for each domain, something like

    /cache/www.one-account.com
    /cache/www.another-account.org

    Do you have any clues as to how to include rewrite rules to dynamically search for static or cached files in those subdirectories?

    I've been trying things along the lines of

    # if there isn't a static file
    RewriteCond cache/%{HTTP_HOST}/%{REQUEST_FILENAME} !-f
    # and there is a cached file
    RewriteCond cache/%{HTTP_HOST}/%{REQUEST_FILENAME}.html -f
    # use the cached file
    RewriteRule ^([^.]+)$ cache/%{HTTP_HOST}/$1.html [QSA,C]

    ...But to no avail.

    Thanks in advance and keep up the great posts

  5. Thijs van der Vossen 473 days later:

    Ismael, We're using lighttpd for the applications where we have per-domain cache directories. I don't think it will help you find the correct Apache rewrite rules, but our configuration looks like this:

    $SERVER["socket"] == "85.158.201.134:80" {
      $HTTP["host"] =~ "^([^:]*)(:[0-9]+)?$" {
        server.document-root = "/var/www/notwriting/current/public/"
        accesslog.filename = "/var/log/lighttpd/access-notwriting.log"
        server.error-handler-404 = "/dispatch.fcgi"

        url.rewrite-once = ( "^/$" => "cache/%1/index.html", "^([^.]+)$" => "cache/%1$1.html" )

        include "fcgi.d/notwriting.conf"
      }
    }

    Have you tried adding the RewriteLog and RewriteLogLevel directives to your Apache configuration so you can see what's going on with your rewrite rules?

  6. bitbutter 617 days later:

    Excellent writeup! This has been very useful for me.

    ("fully sweeped" should be "fully swept" ;))

  7. Thijs van der Vossen 617 days later:

    Thanks bitbutter, It's fixed now.

  8. Nikhil 626 days later:

    I have created a script that allows time-based expiry of the Rails Page Cache

    http://blogic.www2.aurigalogic.com/nikhil/2007/09/22/time-based-expiry-of-the-rails-page-cache/

  9. mm 848 days later:

    A rake task instead of a script; put it in lib/tasks/cache.rake:

    namespace :cache do
      desc 'Sweep Cache'
      task :sweep => :environment do
        puts "Sweeping cache..."
        SiteSweeper::sweep
      end
    end

  10. Sean 1007 days later:

    Daniel, your rewrite rules caused Apache to look in /cache for all of my images, javascript, and CSS. This resulted in a bit of a plain looking website!

  11. Patrick Berkeley 1085 days later:

    If you're using Apache, make sure your Document Root has a slash at the end:
    <Directory "/var/www/application/current/public/">

    Or put slashes before the cache rewrite destination:
    RewriteRule ^$ /cache/index.html [QSA]
    RewriteRule ^([^.]+)$ /cache/$1.html [QSA]
