A very nice Microformats bookmarklet

Thijs van der Vossen, 27 Sep 2006, 20:17 in web and design.

Remy Sharp has created a very nice Microformats bookmarklet based on Jon Hicks’ proposal for a Safari Microformats plugin.

Fingertips contact information in the Microformats bookmarklet

Via Tobias Lütke, who gives a good example of what you should be able to do with Microformats:

[…] In other words you will finally be able to drag drop a web page with contact information onto your Address software and it should “just work”.

No comments yet

The Proxy Pattern in Ruby

Manfred Stienstra, 21 Sep 2006, 13:55 in ruby on rails.

My new favorite idiom to use in Ruby is the proxy pattern. But I’m not using it to reduce the memory footprint or to manage some complex resource. I’m using it to keep the original class clean of my torrent of methods and to make my API easy to use.

One of the biggest problems with adding methods to a core class is the chance of collision with other libraries or code. To keep this chance low, you don’t want to flood classes with new methods. Imagine we want to perform some textual operations on a String, for example ‘Textilize’ it.

require 'redcloth'
RedCloth.new('h1. Proxies!').to_html

But that doesn’t look very nice, especially if we have to duplicate it. Duplication will get us in trouble when we want to change the parameters we send to RedCloth or when we want to switch to another textile processor.

class Formatters
  def self.textilize(str)
    RedCloth.new(str).to_html
  end
end

Formatters.textilize 'h1. Proxies!'

Even though we packaged the code nicely in a method, I’m still not satisfied. This will require us to type that long classname before every call to textilize. And what if we want to perform a number of operations on the same string? Brackets will have to come into play and Lisp our code.

module Formatters
  def textilized
    RedCloth.new(self).to_html
  end
  
  def unnewlined
    gsub(/\r\n/, "\n") # non-bang gsub: gsub! returns nil when nothing matches, which would break the chain
  end
end

String.send :include, Formatters

"h1. Proxies!\r\n".unnewlined.textilized #=> "<h1>Proxies!</h1>"

Very nice! But there are still some problems with this solution. The formatting methods might accidentally override other methods on String, especially if we define even more of these formatting methods. Even though the methods work on the string, they don’t really have much to do with the String class.
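One way to spot such a clash before mixing a module into String is to compare method names. The sketch below uses a hypothetical SketchFormatters module (not part of the code above) whose center method deliberately clashes with the built-in String#center. Note that with include the class's own method still comes first in the lookup, so the new method is silently shadowed rather than replacing the original; reopening String directly would replace it.

```ruby
# Hypothetical module for illustration; all names here are assumptions.
module SketchFormatters
  def unnewlined
    gsub(/\r\n/, "\n")
  end

  # Deliberately clashes with the built-in String#center.
  def center
    "centered"
  end
end

# Names defined by the module that String already responds to.
collisions = SketchFormatters.instance_methods(false) & String.instance_methods

String.send :include, SketchFormatters

# String's own method comes first in the lookup chain, so the built-in still runs.
still_builtin = "abc".center(5)
```

Either way, the caller does not get the method they expected, which is the argument for keeping the number of methods on String as small as possible.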

This is where the proxy class comes into the picture. It will only take one method on the String class and work as a portal to all our formatting code. In the process the proxy class is even going to make our solution pluggable.

require 'rubygems'
require 'redcloth'

module Formatting
  class Formatter
    
    def initialize(str)
      @str = str
    end
    
    def to_str
      @str
    end
    alias_method :to_s, :to_str
  end
  
  module StringExtension
    def format
      Formatting::Formatter.new self
    end
  end
  
  def register(mod)
    Formatting::Formatter.send :include, mod
  end
  module_function :register
end

module Formatting::Textilize
  def textilized
    @str = RedCloth.new(@str).to_html
    self
  end
end
Formatting.register Formatting::Textilize

module Formatting::UnNewline
  def unnewlined
    @str = @str.gsub(/\r\n/, "\n") # reassign instead of gsub! so the caller's original string is left untouched
    self
  end
end
Formatting.register Formatting::UnNewline

String.send :include, Formatting::StringExtension

See how the well-placed self enables us to chain methods?

"h1. Proxies!\r\n".format.unnewlined #=> #<Formatting::Formatter:0x1087c4c @str="h1. Proxies!\n">
"h1. Proxies!\r\n".format.unnewlined.textilized.to_s #=> "<h1>Proxies!</h1>"

There we go, a nice proxy implementation with all the benefits of the original solution. Unfortunately there are some drawbacks: instead of a String instance, the methods return a Formatter instance. The to_str keeps us safe in most cases, like concatenation, but not in all cases. Further cloaking measures are left as an exercise for the reader (hint: method_missing, the boat and Comparable).
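As a rough sketch of what such cloaking could look like, the class below forwards unknown methods to the wrapped string and compares by its content. It assumes that ‘the boat’ refers to the <=> operator, and CloakedFormatter is a hypothetical, stripped-down stand-in for Formatting::Formatter, not the article's actual class.

```ruby
# Hypothetical stand-in for Formatting::Formatter, reduced to what is
# needed to show delegation. Only core Ruby is used.
class CloakedFormatter
  include Comparable

  def initialize(str)
    @str = str
  end

  def to_str
    @str
  end
  alias_method :to_s, :to_str

  # A formatting method that returns self so calls can be chained.
  def unnewlined
    @str = @str.gsub(/\r\n/, "\n")
    self
  end

  # Compare by the wrapped string; Comparable derives ==, <, > and friends.
  def <=>(other)
    @str <=> other.to_s
  end

  # Forward anything we do not define ourselves to the wrapped string.
  def method_missing(name, *args, &block)
    if @str.respond_to?(name)
      @str.public_send(name, *args, &block)
    else
      super
    end
  end

  # Keep respond_to? consistent with what method_missing accepts.
  def respond_to_missing?(name, include_private = false)
    @str.respond_to?(name, include_private) || super
  end
end

proxy = CloakedFormatter.new("Proxies!\r\n").unnewlined
proxy.length               # forwarded to String#length
proxy.upcase               # forwarded to String#upcase
proxy == "Proxies!\n"      # compared through <=>
```

Forwarded methods return plain String results, so the chain leaves the proxy as soon as a String method is called. Whether that is desirable depends on how the formatter is used.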

If you want to see a really cool proxy implementation, check out ActiveSupport::Multibyte.

No comments yet

Second ‘morning coffee’ meeting in Amsterdam

Thijs van der Vossen, 13 Sep 2006, 12:34 in ruby on rails and meetings.

Since we really enjoyed the first ‘morning coffee’ meeting, it is time for another one. The goal remains unchanged: a good chat about our experiences over a strong cup of coffee.

We have decided, however, to broaden the scope beyond Rails developers. Anyone working with next-generation web application frameworks, like Django or even Seaside, is invited too.

When: Thursday, October 5th, 2006, 9:30 AM

Where: The Coffee Company on the corner of the Nieuwe Doelenstraat and the Kloveniersburgwal in Amsterdam

Please leave a comment to tell us you’ll be there or if you have any questions.

22 comments

Extensive testing

Manfred Stienstra, 12 Sep 2006, 16:04 in ruby on rails and unicode.

Today I’ve been merging the Rails multibyte support from Julik’s unicode hacks plugin into the current edge source. After a few test runs my Mac Mini started complaining about the normalization conformance tests…

Finished in 102.769516 seconds.

460 tests, 353652 assertions, 0 failures, 0 errors

I guess I’ll have to fix the Rakefile so I don’t overheat everyone’s computer.

5 comments

URoR 2: KCODE

Thijs van der Vossen, 12 Sep 2006, 00:33 in ruby on rails and unicode.

URoR stands for ‘Unicode Ruby on Rails’ which is a series on using Unicode with Rails. In this second article I’ll show you how to enable the (somewhat limited) UTF-8 support in Ruby and Rails. (first article)

Let’s break a string

Suppose you’re using the truncate helper like this:

<p><%= truncate 'Iñtërnâtiônàlizætiøn', 12 %></p>

The result is something like:

Iñtërn?

Because the helper truncates the string to 12 bytes, it slices the codepoint for the ‘â’ in half. The result is an invalid sequence which can’t be rendered.

Fix this by adding the following to the top of config/environment.rb:

# Add basic utf-8 encoding support 
$KCODE = 'UTF8'

And the result will be:

Iñtërnâti…

The string is now truncated to 12 codepoints instead of 12 bytes.
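The difference between the two results can be reproduced in current Ruby (1.9 and later), where strings carry their own encoding and $KCODE no longer has any effect. This is only a sketch of the underlying byte versus character distinction, not of how the truncate helper is implemented; the cut-off of 9 assumes that three of the 12 positions are reserved for the trailing dots.

```ruby
# encoding: utf-8
text = "Iñtërnâtiônàlizætiøn"

# Cutting at a byte offset lands in the middle of the two-byte "â".
by_bytes = text.byteslice(0, 9)

# Cutting at a character offset keeps every codepoint whole.
by_chars = text[0, 9]

by_bytes.valid_encoding?   # false: the last byte is half a character
by_chars.valid_encoding?   # true
```

Here by_chars holds the nine characters "Iñtërnâti", which matches the fixed output above once the ellipsis is appended.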

What’s happening here?

Setting KCODE to 'UTF8' tells Ruby that your source code is encoded as UTF-8. Some libraries like CGI and some parts of Rails look at KCODE to find out if they need to process strings in a UTF-8 friendly way.

If you also require the jcode library, you get some basic UTF-8 encoding support in Ruby. More about this in a future article.

Not all is good

Although it’s great that truncate has been fixed to work with UTF-8 you should be aware that this is not the case for all helpers:

<p><%= excerpt 'Iñtërnâtiônàlizætiøn', 'nâtiôn', 2 %></p>

This currently always breaks, regardless of whether KCODE is set:

?rnâtiônàl…

You can again get the code from our subversion repository.

No comments yet

Submit your Rails Documentation Project proposals

Thijs van der Vossen, 07 Sep 2006, 11:44 in ruby on rails.

Looks like the caboose folks are ready for Rails Documentation Project proposals. If you want to work on the Rails documentation, then please tell them a little bit about yourself, why you’re qualified, and what you’d like to work on.

No comments yet

Unicode is part of the solution, not part of the problem

Thijs van der Vossen, 06 Sep 2006, 08:59 in ruby on rails and unicode.

Tim Bray in the (for now) final Ruby Ape Diaries entry (emphasis added):

It’s easy to make people angry about this subject, and some of the angry people have a point; certain aspects of Unicode are, on the surface at least, objectively racist; for example, why does UTF-8 encoding of characters become progressively less efficient as you move from the languages of the Western hemisphere to those of the East?

Having said all that, it is my opinion that Unicode works pretty well, and in terms of making the Internet useful to the many peoples of Earth, is part of the solution, not part of the problem. And for that reason, I think that any language that doesn’t do a real good job at Unicode isn’t a very good citizen. And I think Ruby has a major problem in this area. Solutions are promised; we’ll see. And hey, in a few weeks I’m going to get up a stage in a room in Denver full of Rubyists and talk about this stuff; we’ll see whether they let me out of town alive.

Someone please, please record his talk; I’m really looking forward to what he has to say on the subject.

No comments yet

URoR 1: Set the Content-Type

Thijs van der Vossen, 05 Sep 2006, 00:49 in ruby on rails and unicode.

URoR stands for ‘Unicode Ruby on Rails’ which is going to be a series on using Unicode with Rails. In this first article I’ll show you how to set the Content-Type header so that the browser knows what you’re sending. (second article)

Set it in an after filter

On the web, the One And Only Sensible Encoding for Unicode is UTF-8, so that’s what we’re going to use. First, make sure your editor is set to save all files encoded as UTF-8. Then create a new Rails application and generate a controller called ‘static’ with an ‘index’ action so that we have something to test with.

$ rails uror
$ cd uror/
$ ./script/generate controller static index

Now add the following to app/views/static/index.rhtml (just copy it from this page and paste it into your editor):

<p>Iñtërnâtiônàlizætiøn</p>

Run the Rails application with ./script/server and go to /static/index where you should get something garbled that looks like this:

IÃ±tÃ«rnÃ¢tiÃ´nÃ lizÃ¦tiÃ¸n
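This kind of garbling can be reproduced in Ruby by reinterpreting the UTF-8 bytes as a single-byte encoding. The sketch below uses ISO-8859-1, which is an assumption about the browser's fallback; many browsers fall back to Windows-1252 or something similar instead.

```ruby
# encoding: utf-8
utf8 = "Iñtërnâtiônàlizætiøn"

# Pretend every byte is one ISO-8859-1 character, then convert the result
# back to UTF-8 so it can be printed. Each two-byte character becomes two.
misread = utf8.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

puts misread
```

The misread string has one character per original byte, which is why every accented letter turns into a pair of unrelated characters.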

The problem is that you haven’t told the browser that you’re using UTF-8. Fix this by changing app/controllers/application.rb to:

class ApplicationController < ActionController::Base
  after_filter :set_encoding
  
  protected
  
  def set_encoding
    headers['Content-Type'] ||= 'text/html'
    if headers['Content-Type'].starts_with?('text/') and !headers['Content-Type'].include?('charset=')
      headers['Content-Type'] += '; charset=utf-8'
    end
  end
end

The set_encoding after filter does two things:

  1. It sets the Content-Type header to text/html, but only if no Content-Type header has yet been set. This is exactly what Rails would have done anyway, but we’re doing it here so that…
  2. It adds charset=utf-8 to every Content-Type header for a text type when no charset has yet been set.
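The two steps can be tried outside of Rails on a plain Hash. The helper below is a hypothetical sketch that mirrors the filter's logic; with_utf8_charset is not a Rails method, and it uses core Ruby's start_with? instead of the starts_with? that Rails adds.

```ruby
# Hypothetical helper mirroring the after filter on a plain Hash of headers.
def with_utf8_charset(headers)
  headers = headers.dup
  # Step 1: default to text/html when nothing has been set yet.
  headers['Content-Type'] ||= 'text/html'
  type = headers['Content-Type']
  # Step 2: add the charset to text types that do not declare one.
  if type.start_with?('text/') && !type.include?('charset=')
    headers['Content-Type'] = "#{type}; charset=utf-8"
  end
  headers
end

with_utf8_charset({})                                # text/html gets the charset added
with_utf8_charset('Content-Type' => 'image/png')     # non-text types are left alone
with_utf8_charset('Content-Type' => 'text/plain; charset=iso-8859-1') # existing charset is kept
```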

If you now reload the page the problem is fixed because the browser is no longer receiving a:

Content-Type: text/html

header, but:

Content-Type: text/html; charset=utf-8

Also set it in your Lighttpd or Apache configuration

It’s a good idea to set the UTF-8 encoding in your web server configuration too. For Apache add the following in public/.htaccess or your main configuration:

AddDefaultCharset utf-8

For Lighttpd, change mimetype.assign in config/lighttpd.conf to:

mimetype.assign = (  
  ".css"        =>  "text/css; charset=utf-8",
  ".gif"        =>  "image/gif",
  ".htm"        =>  "text/html; charset=utf-8",
  ".html"       =>  "text/html; charset=utf-8",
  ".jpeg"       =>  "image/jpeg",
  ".jpg"        =>  "image/jpeg",
  ".js"         =>  "text/javascript; charset=utf-8",
  ".png"        =>  "image/png",
  ".swf"        =>  "application/x-shockwave-flash",
  ".txt"        =>  "text/plain; charset=utf-8"
)

Now all static stuff like 404.html and cached pages is also sent with the correct encoding in the Content-Type header.

Even add it to the head

If you want to make it easy for people to save your pages to disk and open them with the correct encoding later on, you might want to add the following inside the head element of your html pages:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Always do this last as it may mask any trouble you might be having with the http headers.

Update: You can now get all URoR code from our subversion repository.

Update 2: The upcoming 1.2 release of Rails will add utf-8 as the default charset for all renders, so you’ll no longer need the after filter.

2 comments

HTTP Digest Authentication

Manfred Stienstra, 04 Sep 2006, 23:13 in web and broken.

For traditional sites, cookie-based authentication was often the best choice, especially because the application has complete control of the session, which allows for automated logouts and other freaky stuff. Over the last year I’ve implemented quite a few authenticated applications and a large number of them have feeds or a webservice interface of some sort. But feedreaders and REST clients don’t really like cookie-based authentication. HTTP Authentication is an obvious alternative, so we started using it.

In a lot of todo lists under the header ‘in the distant future’ there was an item: implement digest authentication. So I decided to bite the bullet and read RFC 2617.

I implemented the protocol for both sides, client and server. I sincerely believe that’s the best way to implement a protocol. That way you can always test the partially implemented client on the partially implemented server and bootstrap until everything is done. The best thing is that client and server implementations share a lot of algorithms; working on both makes your implementation orthogonal by default.
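As a sketch of the kind of algorithm both sides share, here is the request-digest computation for qop=auth from RFC 2617, section 3.2.2. The method and parameter names are illustrative assumptions and not the API of the httpauth gem; the input values are taken from the example in section 3.5 of the RFC.

```ruby
require 'digest/md5'

# Request-digest for qop=auth with the MD5 algorithm, following RFC 2617.
# Names are hypothetical; this is not the httpauth gem's interface.
def digest_response(username:, realm:, password:, http_method:, uri:,
                    nonce:, nc:, cnonce:, qop: 'auth')
  ha1 = Digest::MD5.hexdigest("#{username}:#{realm}:#{password}")
  ha2 = Digest::MD5.hexdigest("#{http_method}:#{uri}")
  Digest::MD5.hexdigest([ha1, nonce, nc, cnonce, qop, ha2].join(':'))
end

# Values from the example in RFC 2617, section 3.5.
params = {
  username: 'Mufasa', realm: 'testrealm@host.com', password: 'Circle Of Life',
  http_method: 'GET', uri: '/dir/index.html',
  nonce: 'dcd98b7102dd2f0e8b11d0f600bfb0c093', nc: '00000001', cnonce: '0a4f113b'
}

# The client computes the value it sends in the Authorization header.
client_response = digest_response(**params)

# The server recomputes the same value from its own data and compares.
server_accepts = digest_response(**params) == client_response

# A URI with a query string yields a different digest than the bare path.
with_query = digest_response(**params.merge(uri: '/dir/index.html?page=2'))
```

Because the same function serves both sides, a client and server built on it agree by construction; disagreements only show up when talking to other implementations.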

Implementing the specs went well, until I tried to talk to other implementations. Already in the first week I discovered four problems:

  1. Apache doesn’t send the required ‘nextnonce’ directive in its Authentication-Info header.
  2. Safari quotes algorithm and qop directives in the Authorization header. These directives shouldn’t be quoted.
  3. IE quotes algorithm and qop directives just like Safari does.
  4. IE computes the digest only over the path part of the URI instead of over the path and query part. (From the Apache documentation of mod_auth_digest)

This brings up quite a few questions. Did Safari copy the quoting behaviour from IE instead of reading the RFC themselves? Is implementing standards too hard? Should standards be replaced by reference implementations?

I’m willing to tackle the last question because it’s so easy to answer. RFC 2617 happens to provide a reference implementation for computing Authorization headers, so that can’t be the problem. So are standards just too hard? RFC 2617 is a pretty complicated piece of header prose, but it’s not as long and threaded as the HTTP specs. And way, way easier than the SOAP specs. So there must be something else.

Let’s assume for a moment that standards are completely unambiguous and well written. Given that premise, I believe that the quality of the implementation is a direct result of the determination and vigilance of the programmer. Or better yet, group of programmers. Two pairs of eyes see more than one, and a whole open source community sees more than just one annoyed corporate programmer.

I think digest authentication implementations haven’t received the level of scrutiny that other protocols have and that this resulted in a number of bugs in the various implementations. On that note I would like you to check out my own implementation: HTTP Authentication for Ruby. You can find the API documentation on Rubyforge. There is also a gem, which you can install the usual way:

gem install httpauth

This is still early beta and there are bugs and limitations.

4 comments

Joel Spolsky got one thing right

Thijs van der Vossen, 01 Sep 2006, 09:11 in ruby on rails and unicode.

Although I fully agree with David that Joel Spolsky’s Language Wars is one of the purest forms of FUD against Ruby and Rails ever, I do think Joel got this one right:

I for one am scared of Ruby because (1) it displays a stunning antipathy towards Unicode […]

Sad as it may be, this fear is mostly justified.

5 comments