Ruby and MySQL encoding flakiness
The last few weeks we noticed the dreaded question marks on our sites running against MySQL 5.0. We thought we did everything to make sure our servers, databases, tables, clients and connections understood UTF-8, but somehow connections to the database were reset back to Latin1 after some time.
Instead of trying to fix the problem in Rails/Ruby/libmysql I decided to squash the problem in the MySQL server configuration. By default we were seeing this:
mysql> SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
+--------------------------+--------+
So I set the following in /etc/mysql/my.cnf:
[mysqld]
character-set-server = utf8
[client]
default-character-set = utf8
Which forces all the encoding to go to UTF-8 by default:
mysql> SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
+--------------------------+--------+Comments
Add your comment
In order to fight spam on this blog, posting comments from a browser without javascript is currently not supported.
Subscribe
Marston A. 39 minutes later: (delete | show email)
Great tip! I just did this on my system, sure enough I had the same results as you. ¶
Marston A. 43 minutes later: (delete | show email)
Interesting. After added to the config and restarting the server I get the following:
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | utf8 |
| character_set_system | utf8 |
+--------------------------+--------+
It seems that only character_set_server changed to utf8. Did I miss something? ¶
Manfred Stienstra about 1 hour later: (delete)
Ah right, you also have to set default-character-set for the client. I've update the post. ¶
Marston+A. about 2 hours later: (delete | show email)
Prima.. Almost. At the expense of turning this into some kind of tech support post: 5 down 1 to go! :-P
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
+--------------------------+--------+
Now I've got everything UTF except character_set_database. I can *feel* it, I'm almost there! :-) ¶
Manfred Stienstra about 2 hours later: (delete)
Right, but the character_set_database depends on the active database. In your case that's probably 'mysql', which is latin1 by default. ¶
Dan Kubb about 6 hours later: (delete | show email)
When I compile MySQL from source I also pass the following to configure so that UTF8 is used by default:
--with-charset=utf8 \
--with-extra-charsets=none \
--with-collation=utf8_unicode_ci ¶
Andreas about 6 hours later: (delete)
Try that in your my.cnf
[mysqld]
default-character-set=utf8
[client]
default-character-set=utf8
Should do it:) ¶