URoR 1: Set the Content-Type

Thijs van der Vossen

URoR stands for ‘Unicode Ruby on Rails’ which is going to be a series on using Unicode with Rails. In this first article I’ll show you how to set the Content-Type header so that the browser knows what you’re sending. (second article)

Set it in an after filter

On the web, the One And Only Sensible Encoding for Unicode is UTF-8, so that’s what we’re going to use. First, make sure your editor is set to save all files encoded as UTF-8. Then create a new Rails application and generate a controller called ‘static’ with an ‘index’ action so that we have something to test with.

$ rails uror
$ cd uror/
$ ./script/generate controller static index

Now add the following to app/views/static/index.rhtml (just copy it from this page and paste it into your editor):

<p>Iñtërnâtiônàlizætiøn</p>

Run the Rails application with ./script/server and go to /static/index, where you should get something garbled that looks like this:

IÃ±tÃ«rnÃ¢tiÃ´nÃ lizÃ¦tiÃ¸n
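The garbling happens because, without a charset, the browser typically falls back to a single-byte default such as ISO-8859-1 and reads each byte of a two-byte UTF-8 character as a separate character. The sketch below reproduces that misreading. It assumes Ruby 1.9 or later for `String#force_encoding` and `String#encode`, which is newer than the Ruby this article was written against.

```ruby
# The text as the template stores it on disk: UTF-8 bytes.
original = "Iñtërnâtiônàlizætiøn"

# Simulate a browser that assumes ISO-8859-1. Reinterpret the same bytes as
# Latin-1, where every byte is one character, then convert to UTF-8 so the
# result can be printed on a UTF-8 terminal.
garbled = original.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts original.bytesize # each accented letter takes two bytes in UTF-8
puts garbled
```

Each accented letter turns into "Ã" followed by a second symbol. For "à", the second byte is 0xA0, a non-breaking space in Latin-1, which is why a gap appears in the middle of the garbled word.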

The problem is that you haven’t told the browser that you’re using UTF-8. Fix this by changing app/controllers/application.rb to:

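class ApplicationController < ActionController::Base
  # Runs after every action, once the response has been rendered.
  after_filter :set_encoding

  protected

  # Adds "charset=utf-8" to text responses that don't declare a charset yet.
  def set_encoding
    # Rails would default to text/html anyway; setting it here guarantees
    # the check below always has a value to inspect.
    headers['Content-Type'] ||= 'text/html'
    content_type = headers['Content-Type']
    if content_type.starts_with?('text/') && !content_type.include?('charset=')
      headers['Content-Type'] = content_type + '; charset=utf-8'
    end
  end
end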

The set_encoding after filter does two things:

  1. It sets the Content-Type header to text/html, but only if no Content-Type header has yet been set. This is exactly what Rails would have done anyway, but we’re doing it here so that…
  2. It adds charset=utf-8 to every Content-Type header for a text type when no charset has yet been set.

If you now reload the page the problem is fixed because the browser is no longer receiving a:

Content-Type: text/html

header, but:

Content-Type: text/html; charset=utf-8
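To confirm which header the server actually sends, fetch the page and read the charset parameter. This minimal sketch uses Ruby's standard net/http library and assumes the development server from above is running on its default port, 3000. The method names `charset_of` and `fetch_charset` are hypothetical.

```ruby
require "net/http"
require "uri"

# Pull the charset parameter out of a Content-Type value, or nil if absent.
def charset_of(content_type)
  return nil if content_type.nil?
  content_type[/;\s*charset=([^;\s]+)/i, 1]
end

# Fetch a URL and report the charset from its Content-Type response header.
def fetch_charset(url)
  response = Net::HTTP.get_response(URI(url))
  charset_of(response["Content-Type"])
end

# With ./script/server running:
#   fetch_charset("http://localhost:3000/static/index")
```

Before the filter is added, `fetch_charset` should return nil because there is no charset in the header. Afterwards it should return "utf-8".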

Also set it in your Lighttpd or Apache configuration

It’s a good idea to set the UTF-8 encoding in your web server configuration too. For Apache, add the following in public/.htaccess or your main configuration:

AddDefaultCharset utf-8

For Lighttpd, change mimetype.assign in config/lighttpd.conf to:

mimetype.assign = (  
  ".css"        =>  "text/css; charset=utf-8",
  ".gif"        =>  "image/gif",
  ".htm"        =>  "text/html; charset=utf-8",
  ".html"       =>  "text/html; charset=utf-8",
  ".jpeg"       =>  "image/jpeg",
  ".jpg"        =>  "image/jpeg",
  ".js"         =>  "text/javascript; charset=utf-8",
  ".png"        =>  "image/png",
  ".swf"        =>  "application/x-shockwave-flash",
  ".txt"        =>  "text/plain; charset=utf-8"
)

Now static files such as 404.html and cached pages are also sent with the correct encoding in the Content-Type header.
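If you maintain a longer mimetype.assign list, a small script can apply the same rule as the after filter, adding the charset only to text types, so the server configuration and the application stay consistent. The sketch below assumes you paste its output into config/lighttpd.conf by hand. `MIME_TYPES` is simply the table above without charsets, and `lighttpd_mimetype_assign` is a hypothetical name.

```ruby
# The extension-to-type table from the Lighttpd configuration above,
# without any charset parameters.
MIME_TYPES = {
  ".css"  => "text/css",
  ".gif"  => "image/gif",
  ".htm"  => "text/html",
  ".html" => "text/html",
  ".jpeg" => "image/jpeg",
  ".jpg"  => "image/jpeg",
  ".js"   => "text/javascript",
  ".png"  => "image/png",
  ".swf"  => "application/x-shockwave-flash",
  ".txt"  => "text/plain"
}

# Build the mimetype.assign block, adding charset=utf-8 to text types only.
def lighttpd_mimetype_assign(types)
  lines = types.map do |ext, type|
    type += "; charset=utf-8" if type.start_with?("text/")
    format('  %-12s =>  "%s"', ext.inspect, type)
  end
  "mimetype.assign = (\n" + lines.join(",\n") + "\n)"
end

puts lighttpd_mimetype_assign(MIME_TYPES)
```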

Even add it to the head

If you want to make it easy for people to save your pages to disk and open them with the correct encoding later on, you might want to add the following inside the head element of your HTML pages:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Always do this last, as the meta element may mask any problems you have with the HTTP headers.
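When the HTTP header carries no charset, a browser can fall back to the meta element. A page can therefore look right even though the header is wrong. One way to catch this is to compare the header with the page body. The sketch below uses hypothetical method names and a deliberately naive regular expression rather than a real HTML parser, so treat it as a quick check for the exact tag shown above, not for arbitrary HTML.

```ruby
# Extract the charset from a <meta http-equiv="Content-Type"> element.
# Naive on purpose: it expects http-equiv before content, as in the tag above.
def meta_charset(html)
  html[/<meta\s+http-equiv=["']Content-Type["']\s+content=["'][^"']*charset=([^"';\s]+)/i, 1]
end

# True when the page declares a charset in a meta element but the
# Content-Type header does not, i.e. the meta element is hiding a gap.
def meta_masks_header?(content_type, html)
  header_has_charset = content_type.to_s.downcase.include?("charset=")
  !meta_charset(html).nil? && !header_has_charset
end

page = '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /></head>'
puts meta_masks_header?("text/html", page)                # true
puts meta_masks_header?("text/html; charset=utf-8", page) # false
```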

The upcoming 1.2 release of Rails will add utf-8 as the default charset for all renders, so you’ll no longer need the after filter.