Unichars 0.2 released
Unichars is a simple wrapper around the Glib2 Unicode functions. You can use it to speed up certain methods on Unicode strings. Currently supported are: upcase, downcase, reverse, and size. The cool thing about it is that it works seamlessly with ActiveSupport::Multibyte, and it works just as well without it.
I guess Unichars is not as exciting a name as God, Vlad the Deployer, or Gerard Joling, but I guess I’m not that kind of guy.
You can install Unichars with Rubygems:
$ gem install unichars
Or you can fetch it from Github:
$ git clone git://github.com/Manfred/unichars.git
$ cd unichars
$ rake gem:install
With Rails
The examples in the README tell you how to use Unichars with Rails 2.1 or newer. I’ll just reiterate how it’s done.
First, make sure you load the library. The easiest way to do this is with config.gem in environment.rb:
config.gem 'unichars'
Or, if you dislike gems, you can just require it:
require 'unichars'
When you’re not using config.gem, you have to make sure ActiveSupport is loaded before Unichars, otherwise the Rails integration won’t magically work.
After that, you have to tell ActiveSupport::Multibyte to use the Unichars class as its proxy class. You can do that in an initializer or at the end of your environment.rb. I would recommend doing it in config/initializers/unichars.rb:
ActiveSupport::Multibyte.proxy_class = Unichars
Now all of Rails will automatically use the Unichars character proxy. You can also use it yourself:
'青山'.mb_chars.reverse #=> '山青'
Without Rails, but with ActiveSupport
require 'activesupport'
require 'unichars'
ActiveSupport::Multibyte.proxy_class = Unichars
$KCODE = 'u'
Other than that it’s similar to Rails:
'Sluß'.mb_chars.upcase #=> 'SLUSS'
This is a good time to talk about LC_CTYPE real quick. Note that Glib2 picks it up from your environment, so your results may vary depending on what it’s set to.
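If you want to see which locale your process is likely to hand to Glib2, you can look at the environment yourself. The snippet below is only a sketch: `effective_ctype` is a made-up helper name, not part of Unichars, and it assumes the usual POSIX precedence, where LC_ALL overrides LC_CTYPE, which in turn overrides LANG.

```ruby
# Hypothetical helper, not part of Unichars: report which locale setting
# would govern character handling, assuming the usual POSIX precedence
# (LC_ALL first, then LC_CTYPE, then LANG).
def effective_ctype(env = ENV)
  %w[LC_ALL LC_CTYPE LANG].each do |name|
    value = env[name]
    # An empty value is treated the same as an unset variable.
    return value unless value.nil? || value.empty?
  end
  'C' # the POSIX default when none of the variables is set
end

puts effective_ctype
```

Passing a plain Hash instead of ENV makes it easy to try out different combinations without touching your shell.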
Without training wheels
require 'unichars'
If you don’t use ActiveSupport, you can still use Unichars because it comes with a light version of the Chars proxy. You will have to wire it yourself though:
class String
  def mb_chars
    Unichars.new(self)
  end
end
'Copy-®'.mb_chars.size #=> 6
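In case the word "proxy" is a bit abstract: the idea is an object that wraps a string, forwards method calls to it, and wraps string results again so you can keep chaining. Here is a rough pure-Ruby sketch of that idea. `SketchChars` is a hypothetical name and this is not how Unichars is implemented (Unichars calls into Glib2); it also assumes Ruby 2.0 or newer, where String handles UTF-8 by itself.

```ruby
# Hypothetical stand-in for a Chars-style proxy; NOT the Unichars source.
# Assumes Ruby 2.0+, where String is encoding-aware and source is UTF-8.
class SketchChars
  attr_reader :wrapped_string

  def initialize(string)
    @wrapped_string = string
  end

  # Forward unknown methods to the wrapped string. String results are
  # wrapped again so calls can be chained; other results pass through.
  def method_missing(name, *args, &block)
    return super unless @wrapped_string.respond_to?(name)
    result = @wrapped_string.__send__(name, *args, &block)
    result.is_a?(String) ? self.class.new(result) : result
  end

  def respond_to_missing?(name, include_private = false)
    @wrapped_string.respond_to?(name, include_private) || super
  end

  # Object#to_s already exists, so it has to be defined explicitly.
  def to_s
    @wrapped_string
  end
end

puts SketchChars.new('Copy-®').size
```

A real proxy like Unichars overrides the slow or incorrect methods with faster ones and lets everything else fall through to the string in much the same way.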
Without anything
Finally, you can just use the Glib2 wrapper and roll your own solution:
require 'glib'
Glib.utf8_upcase('Comme des Garçons') #=> 'COMME DES GARÇONS'
Questions?
If you have any questions or issues, please use the Github Wiki as much as possible. If you want to discuss anything, you can find me on Freenode in #rails-contrib. Have fun with Unichars!
Comments
Ferdinand Svehla about 12 hours later:
Looks great so far.
Any plans on a JRuby-compatible version?
Manfred Stienstra about 12 hours later:
There are no plans to develop a JRuby version because we never use it. The API of Unichars is really simple so it should be trivial to wrap a Java String to do the same.
I wouldn't mind accepting patches though.
Rob 400 days later:
Manfred, I can't get it installed on a SL-Mac to use it with MacRuby. I would really like to have an alternative for mb_chars because MacRuby doesn't support ActiveSupport (yet) ... well, keep up the good work :)
Manfred Stienstra 406 days later:
MacRuby doesn't really need Unichars because it has NSString, which has excellent Unicode support. Also, I don't think extensions work on MacRuby just yet.
I guess Unichars could be used as a compatibility layer between different implementations. Like I mentioned earlier, I haven't really seen the need to implement it because I don't use it. It would be great if someone who actually needs it could implement and maintain it. I could certainly help.