An excellent and pragmatic proposal for easier Unicode support in Rails

Thijs van der Vossen

16 Jun 2006

The current Ruby version has no Unicode String class like in Python or Java. This makes it hard for Rails to support multibyte encodings.

The following code snippet from the truncate helper is a good example:

if $KCODE == "NONE"
  text.length > length ? text[0...l] + truncate_string : text
else
  chars = text.split(//)
  chars.length > length ? chars[0...l].join + truncate_string : text
end

This was added to make the helper work with multibyte characters, but it is far from beautiful.

A few days ago, Julian proposed to add a proxy to the string class for accessing characters instead of bytes. I think this is an excellent and very nice solution.

You access the proxy with the char method on a string object. You can for example get the number of characters with:

text.chars.length

The char method is aliased as u, so you can also write:

text.u.length

Which to me looks even nicer.

Using the proxy, you could replace the six lines of code from the truncate helper with:

text.chars.length > length ? text.chars[0...l] + truncate_string : text

That’s a whole lot more obvious. And don’t be fooled, this is just as fast as the longer version because the proxy only uses the multibyte safe methods when $KODE is set.

Apart from making the Rails code easier to understand and maintain, the proxy can also save application developers a lot of work.

The proxy and the patches to the Rails code are currently in development as a plugin you can get from Subversion. Even though the plugin is called ‘Unicode hacks’ for historical reasons, it’s actually a very clean solution by now. There’s also a proposed patch to the Rails source.

Please try this one out and give your feedback.