Croatian Unicode collations for MySQL-5.1

As of version 5.1, MySQL supports two Croatian collations: cp1250_croatian_ci and latin2_croatian_ci.

These two collations correctly handle the letters Č, Ć, Đ, Š and Ž as separate letters. However, they lack support for digraphs, so Dž, Lj and Nj are each sorted as two separate letters. For example, Lj comes after Li and before Lk, as it would in English.

The "real" Croatian sorting rules consider the digraphs Dž, Lj and Nj to be single letters. That means:

Unicode calls this phenomenon, in which two letters are sorted as a single letter, a contraction.
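The difference between the two orderings can be illustrated with a small Python sketch. It is not MySQL code: the names ALPHABET, WEIGHT, key_without_contractions and key_with_contractions are hypothetical, and the weights are simply positions in the Croatian alphabet. It also assumes that every "dž", "lj" and "nj" sequence is a digraph.

```python
# Croatian alphabet in dictionary order; digraphs count as single letters.
ALPHABET = ["a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f", "g", "h",
            "i", "j", "k", "l", "lj", "m", "n", "nj", "o", "p", "r", "s",
            "š", "t", "u", "v", "z", "ž"]

# Illustrative weight of each letter: its position in the alphabet.
WEIGHT = {letter: i for i, letter in enumerate(ALPHABET)}
DIGRAPHS = ("dž", "lj", "nj")


def key_without_contractions(word):
    """Sort key that weighs every character separately (digraphs are split).

    Raises KeyError for characters outside the Croatian alphabet.
    """
    return [WEIGHT[ch] for ch in word.lower()]


def key_with_contractions(word):
    """Sort key that maps each digraph to one weight (a contraction)."""
    text = word.lower()
    weights = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            # Two characters contract into a single collation element.
            weights.append(WEIGHT[pair])
            i += 2
        else:
            weights.append(WEIGHT[text[i]])
            i += 1
    return weights


words = ["lopta", "ljeto", "lik", "luk"]
print(sorted(words, key=key_without_contractions))
# ['lik', 'ljeto', 'lopta', 'luk']
print(sorted(words, key=key_with_contractions))
# ['lik', 'lopta', 'luk', 'ljeto']
```

Without contractions, "ljeto" lands between "lik" and "lopta". With contractions, it moves after every word that starts with a plain L.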

MySQL does support contractions for some languages. For example, in Slovak, Ch is a single letter sorted between H and I. So it should not be a problem to add Unicode Croatian collations that support the digraphs Lj and Nj.

The problem, though, is that contractions in MySQL are limited to ASCII letters. Contractions containing non-ASCII letters are not supported. That means it is not possible to handle Dž correctly.
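Of the three Croatian digraphs, only Dž contains a letter outside ASCII. The following Python sketch makes that visible. It only illustrates the restriction described above and is not how MySQL checks contractions; the helper name non_ascii_parts is hypothetical.

```python
# All case variants of the three Croatian digraphs.
DIGRAPHS = ["DŽ", "Dž", "dž", "LJ", "Lj", "lj", "NJ", "Nj", "nj"]


def non_ascii_parts(contraction):
    """Return the characters of a contraction that fall outside ASCII."""
    return [ch for ch in contraction if ord(ch) > 127]


for digraph in DIGRAPHS:
    parts = non_ascii_parts(digraph)
    if parts:
        # List the offending code points, e.g. U+017E for "ž".
        codes = ", ".join(f"U+{ord(ch):04X}" for ch in parts)
        print(f"{digraph}: contains non-ASCII {codes}")
    else:
        print(f"{digraph}: ASCII only")
```

Lj and Nj pass the ASCII-only test in every case variant, while all three variants of Dž fail it because of Ž (U+017D) or ž (U+017E).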

Support for contractions with non-ASCII parts will be added to MySQL under WL#2673, "Unicode Collation Algorithm new version". The good news is that a patch for MySQL-5.6 is already available and has passed code review.

But the bad news is:

All this means Croatians will have to use other DBMS software for some time yet.

To help Croatians start using MySQL sooner, I decided to extract the contraction-related part of the WL#2673 patch and add the extra code that defines two Croatian collations, utf8_croatian_ci and ucs2_croatian_ci. I combined these changes into a single patch implementing full Croatian ordering.

I'll appreciate any feedback from the Croatian MySQL community. Please write your comments to <Alexander.Barkov[at]Sun.COM>. Thanks!

A       a
B       b
C       c
Č       č
Ć       ć
D       d
DŽ Dž   dž
Đ       đ
E       e
F       f
G       g
H       h
I       i
J       j
K       k
L       l
LJ Lj   lj
M       m
N       n
NJ Nj   nj
O       o
P       p
R       r
S       s
Š       š
T       t
U       u
V       v
Z       z
Ž       ž