UnSpun encoding problems

Manfred Stienstra

A few weeks ago Amazon launched UnSpun, a web application to collectively manage lists of all sorts.

During signup I was presented with the following.

Screenshot of UnSpun with a broken letter

I know Internet Explorer fixes a lot of broken encoding by guessing the true encoding for just about everything, maybe that’s why they never noticed during development?

I’ve had this problem myself on a few occasions. Because geographical information is commonly extracted from text files and loaded into a database you always have to be really careful to transcode any data extracted from text files to the same encoding as the database. In the case of ISO-8859-1/15, which is commonly used in west-european countries, there is a really simple oneliner to transcode to utf-8.

source.unpack('C*').pack('U*')