As of version 5.1, MySQL supports two Croatian collations: cp1250_croatian_ci and latin2_croatian_ci.
These two collations correctly handle the letters Č, Ć, Đ, Š and Ž as separate letters. However, they lack support for digraphs, so Dž, Lj and Nj are each sorted as two separate letters. For example, Lj comes after Li and before Lk, just as in English.
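To make this behavior concrete, here is a minimal Python sketch of letter-by-letter ordering in which Č, Ć, Đ, Š and Ž have their own alphabet positions but the digraphs do not. It is a simplified model under stated assumptions: only primary (letter) weights, case folded with `str.lower()`, and hypothetical names `ALPHABET`, `RANK` and `sort_key`. It is not the actual cp1250_croatian_ci or latin2_croatian_ci weight table.

```python
# Hypothetical model: Croatian letters in alphabetical order, with
# Č, Ć, Đ, Š, Ž as separate letters but NO digraphs (Dž, Lj, Nj are
# simply two letters each). Q, W, X, Y are included for foreign words.
ALPHABET = ["a", "b", "c", "č", "ć", "d", "đ", "e", "f", "g", "h", "i",
            "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "š", "t",
            "u", "v", "w", "x", "y", "z", "ž"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}


def sort_key(word):
    # Case-insensitive (like the *_ci collations): fold case, then map each
    # character to its alphabet position. Unknown characters raise KeyError.
    return [RANK[ch] for ch in word.lower()]


print(sorted(["Lk", "Lj", "Li"], key=sort_key))              # ['Li', 'Lj', 'Lk']
print(sorted(["dan", "ćup", "čaj", "cvijet"], key=sort_key))  # ['cvijet', 'čaj', 'ćup', 'dan']
```

Because Lj is just L followed by J here, it lands between Li and Lk, which is exactly the limitation described above.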
The "real" Croatian sorting rules treat the digraphs Dž, Lj and Nj as single letters. That means Dž sorts as one letter between D and Đ, Lj between L and M, and Nj between N and O.
Unicode calls this phenomenon (two letters sorted as a single letter) a contraction.
MySQL does support contractions for some languages. For example, in Slovak the letter Ch is a single letter between H and I. So it should not be a problem to add a Unicode Croatian collation that supports the digraphs Lj and Nj.
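A contraction can be sketched in Python by splitting a word into collation elements, preferring a two-letter contraction whenever one matches, and then ranking the elements. This is a minimal model under stated assumptions (primary weights only, greedy longest match, case folded with `str.lower()`; `ALPHABET`, `CONTRACTIONS`, `tokenize` and `sort_key` are hypothetical names), not MySQL's implementation. The Slovak Ch case works the same way: add `"ch"` to the alphabet between `"h"` and `"i"`.

```python
# Hypothetical Croatian alphabet in which the digraphs are single letters
# (contractions): Dž between D and Đ, Lj between L and M, Nj between N and O.
ALPHABET = ["a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f", "g", "h",
            "i", "j", "k", "l", "lj", "m", "n", "nj", "o", "p", "q", "r",
            "s", "š", "t", "u", "v", "w", "x", "y", "z", "ž"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
CONTRACTIONS = {letter for letter in ALPHABET if len(letter) > 1}


def tokenize(word):
    """Split a word into collation elements, preferring two-letter contractions."""
    s = word.lower()
    tokens = []
    i = 0
    while i < len(s):
        pair = s[i:i + 2]
        if pair in CONTRACTIONS:   # e.g. "lj", "nj", "dž"
            tokens.append(pair)
            i += 2
        else:                      # an ordinary single letter
            tokens.append(s[i])
            i += 1
    return tokens


def sort_key(word):
    # Map each collation element (letter or contraction) to its rank.
    return [RANK[t] for t in tokenize(word)]


print(tokenize("Njiva"))                                           # ['nj', 'i', 'v', 'a']
print(sorted(["Lk", "Lj", "Li", "Lm", "Ma"], key=sort_key))       # ['Li', 'Lk', 'Lm', 'Lj', 'Ma']
print(sorted(["đak", "džep", "dvor", "dan"], key=sort_key))       # ['dan', 'dvor', 'džep', 'đak']
```

Lj now follows every other L-word and precedes M. Greedy matching is a simplification: in compound words where the two letters belong to different word parts, they should not form a digraph, and this sketch cannot tell the difference.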
The problem, though, is that contractions in MySQL are limited to ASCII letters; contractions containing non-ASCII letters are not supported. Because Dž contains the non-ASCII letter Ž, it cannot be handled correctly.
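The restriction can be illustrated with a tiny Python check. The function `contraction_supported` is hypothetical: it models the "ASCII letters only" rule stated above and is not taken from MySQL source code.

```python
# Hypothetical model of the limitation: a contraction is accepted only if
# every character in it is ASCII (str.isascii() requires Python 3.7+).
def contraction_supported(contraction):
    return contraction.isascii()


for digraph in ["ch", "lj", "nj", "dž"]:
    # "ž" is U+017E, outside the ASCII range, so "dž" is rejected.
    print(digraph, contraction_supported(digraph))
# ch True
# lj True
# nj True
# dž False
```

Slovak Ch and Croatian Lj and Nj pass the check, while Dž fails only because of Ž.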
Support for contractions with non-ASCII parts will be added to MySQL under WL#2673 (Unicode Collation Algorithm new version). The good news is that a patch for MySQL-5.6 is already available and has passed code review.
But the bad news is:
To help Croatians start using MySQL more actively sooner, I decided to extract the contraction-related part of the WL#2673 patch, add the extra code needed for two new Croatian collations, utf8_croatian_ci and ucs2_croatian_ci, and combine these changes into a single patch implementing full Croatian ordering.
I would appreciate any feedback from the Croatian MySQL community. Please send your comments to <Alexander.Barkov[at]Sun.COM>. Thanks!
DŽ Dž dž
LJ Lj lj
NJ Nj nj