Discussing Unicode for Ruby

Thijs van der Vossen, 21 Jun 2006, 13:15 in ruby on rails, web, and unicode, last updated 16 Sep 2006, 12:18.

The Unicode roadmap thread on the Ruby Lang mailing list is now almost 100 messages long.

Yukihiro Matsumoto (matz):

I am too optimized for Ruby string operations using Regexp.

Tim Bray:

Julian ‘Julik’ Tarkhanov:

I think this thread is going to end the same as the one in 2002 did.

Read the whole damn thing if you want to know more about the gritty details. In case it makes your head hurt, go read "On the Goodness of Unicode", "Characters vs. Bytes", and "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" first.

Comments

  1. Andi about 3 hours later:

    This really is a shame.

  2. Manfred Stienstra about 4 hours later:

    No, it's good that there is discussion; the problem is that it's not very focused.

  3. Julik about 9 hours later:

    There might be no succinct discussion as long as you have to please everyone.

  4. Simon de Haan 7 days later:

    There's been a discussion about proper Unicode support in Django on the developers mailing list. The general notion is to have it added to Django before 1.0, which is excellent news.

    "I think we should do this.
    We are, after all, perfectionists."

    -- Adrian Holovaty

    http://groups.google.com/group/django-developers/msg/9a7ba6e280365bd2

  5. Thijs van der Vossen 7 days later:

    The full thread is interesting:

    http://groups.google.com/group/django-developers/browse_thread/thread/1debd2337965a765/9a7ba6e280365bd2?fwc=2

    Currently both Rails and Django handle Unicode data as UTF-8 encoded data in single-byte strings. This is an amazingly good example of why having both a single-byte String type and a Unicode String type in your programming language is not a good idea.

    I'm getting more and more convinced that the only way to get good Unicode support is to use a language that forces all character data to be stored in proper Unicode String types.

    Having said that, it's excellent to see the Django developers acknowledge the need to get it in there before 1.0.

  6. Julik 11 days later:

    Let's see how many UnicodeDecodeErrors you get when they do :-)

  7. Julik 11 days later:

    By the way, it looks like they don't have correct case-folding and ranges either, because they try to mitigate the problem by using UTF-8 bytestrings. The fact that Python went for a computationally efficient UTF instead of an interoperable one counts too.

  8. Thijs van der Vossen 11 days later:

    At least they have normalization:

    http://docs.python.org/lib/module-unicodedata.html

  9. rabindra sarkar 11 days later:

    How to split a string with respect to a character?

  10. rabindra sarkar 11 days later:

    How to split in Ruby on Rails?

  11. Manfred Stienstra 12 days later:

    Rabindra, install Julik's unicode plugin for Rails if you want a safe split. Read the readme for the details, because you will probably want to install some additional gems.

  12. Julik 19 days later:

    By "them" I meant Django of course. Python has good Unicode support if you can use it properly.
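
Several comments above turn on the difference between UTF-8 data held in a byte string and text held in a Unicode string type. The sketch below is one way to illustrate that difference. It assumes Python 3, which postdates this thread and keeps `bytes` and `str` as separate types; the sample words and the helper name `decode_or_none` are made up for illustration and do not come from Rails, Django, or the mailing list thread.

```python
import unicodedata

# Assumed sample text: "café" with a precomposed é (U+00E9).
text = "caf\u00e9"
raw = text.encode("utf-8")  # the same data as UTF-8 encoded bytes

# A byte string counts bytes, a Unicode string counts code points.
byte_length = len(raw)   # é takes two bytes in UTF-8
char_length = len(text)


def decode_or_none(data):
    """Hypothetical helper: decode UTF-8 bytes, or return None on failure."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Cutting a byte string at an arbitrary offset can split a character in two,
# which is the kind of UnicodeDecodeError comment 6 alludes to.
truncated = decode_or_none(raw[:4])

# Splitting a Unicode string into characters keeps é in one piece.
characters = list(text)

# Normalization via unicodedata (the module linked in comment 8):
# "e" followed by a combining acute accent becomes the precomposed é under NFC.
decomposed = "cafe\u0301"
normalized_equal = unicodedata.normalize("NFC", decomposed) == text

# Case folding on Unicode strings: ß folds to "ss".
folded_equal = "STRASSE".casefold() == "stra\u00dfe".casefold()
```

Working on `raw` directly would need extra care at every length, slice, and comparison, which is roughly the argument comment 5 makes for a single Unicode string type.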
