How to read charts
The charts list characters in ascending order according
to their collation (or locale) rules.
The color and the border of each cell indicate the strength of the difference
between that character and the previous character in the chart, as follows.
Senary difference, or equal
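The idea of "strength of difference" can be sketched in a few lines of Python. This is only an illustration: the weight tuples are invented, and the level names other than "senary" are assumed from the usual ordinal naming, not taken from any real chart.

```python
# Assumed level names, strongest difference first.
LEVEL_NAMES = ["primary", "secondary", "tertiary",
               "quaternary", "quinary", "senary"]

def difference_level(prev_weights, curr_weights):
    """Return the name of the first level on which two characters differ.

    Each argument is a tuple of per-level weights (hypothetical values).
    Returns "equal" when all compared levels match.
    """
    for name, a, b in zip(LEVEL_NAMES, prev_weights, curr_weights):
        if a != b:
            return name
    return "equal"

# Invented weights: (primary, secondary, tertiary)
a_small = (10, 1, 1)
a_capital = (10, 1, 2)
a_acute = (10, 2, 1)
b_small = (11, 1, 1)

print(difference_level(a_small, a_capital))  # tertiary
print(difference_level(a_small, a_acute))    # secondary
print(difference_level(a_small, b_small))    # primary
```

A chart would color each cell according to the value returned for that cell and the cell before it.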
Each cell consists of:
- a character (or multiple characters in the case of contractions),
- its code according to the character set of the chart,
- as well as its Unicode code point.
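As a rough illustration, the three parts of a cell can be reproduced with Python's built-in codecs. The function name describe_cell and the output layout are assumptions made for this sketch; the real charts may format cells differently.

```python
def describe_cell(text: str, charset: str) -> str:
    """Return the character(s), their code in `charset`, and their code points."""
    # Code of the character(s) in the chart's character set, as hex bytes.
    code = text.encode(charset).hex().upper()
    # Unicode code point of each character, in the usual U+XXXX notation.
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    return f"{text}  {code}  {code_points}"

print(describe_cell("ß", "latin-1"))   # ß  DF  U+00DF
print(describe_cell("ch", "latin-1"))  # ch  6368  U+0063 U+0068
```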
Cells can represent character-to-weight mappings of the following types:
- Ignorable - a character has no weight on the primary level
(but may have weights on the secondary, tertiary, or higher levels).
Punctuation characters are often ignorable.
For example, Windows locales
and Fedora Core locales
ignore "U+002D HYPHEN-MINUS" (and some other punctuation characters) on the primary level,
which helps to sort in a "culturally correct" way, so that "co-operation"
is sorted near "cooperation": both after "convexity" and before
"copper". That is, the hyphen character is ignored on the first pass of
comparison and is checked only later, when a sorting program needs to
determine the mutual order of the words "cooperation" and "co-operation";
"cooperation" then goes first.
- Simple - a character has a single weight on the primary level.
Most letter and digit characters have simple weight mappings.
- Expansion - a character has multiple weights on the primary level.
For example, in German the letter "ß" (U+00DF SHARP S) produces two
weights and is sorted near "ss" (LATIN SMALL LETTER S followed by another
LATIN SMALL LETTER S).
- Contraction - multiple characters act as a single letter.
For example, in Czech
the combination of two characters "CH" is considered a single letter,
which is sorted after "H" and before "I".
Firebird calls this phenomenon "compression". We'll use
the word "contraction", which is the more traditional term.
- Contraction with expansion - a special sequence
of multiple characters produces multiple weights. Examples
of such mappings between characters and weights can be found
in Hungarian.
The combination of two characters "CS" is a single letter,
which is sorted after "C" and before "D", and which gives a single
weight on the primary level. Thus "CS" itself
is an example of a contraction, described in the previous paragraph.
However, when the letter "CS" is followed by another letter "CS",
the two can be written together in a short form, "CCS",
which is collated as CS+CS (rather than C+CS). These short
Hungarian forms are called simplified geminates of multigraphs,
and they are examples of contractions with expansion.
Thus, "CCS" is a sequence of three characters which gives
two weights on the primary level.
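
The five mapping types above can be sketched with a toy weight table in Python. Everything here is an assumption made for illustration: the numeric weights are invented, the table mixes the German, Czech and Hungarian examples into one artificial collation, and the tie-break level is a simplification of how real collations handle ignorable characters.

```python
import string

# Simple: each basic letter gets one invented primary weight (a=10, b=20, ...).
SIMPLE = {ch: [10 * (i + 1)] for i, ch in enumerate(string.ascii_lowercase)}

TABLE = dict(SIMPLE)
TABLE.update({
    "-": [],                         # ignorable: no primary weight
    "ß": SIMPLE["s"] + SIMPLE["s"],  # expansion: one character, two weights
    "ch": [85],                      # contraction: after "h" (80), before "i" (90)
    "cs": [35],                      # contraction: after "c" (30), before "d" (40)
    "ccs": [35, 35],                 # contraction with expansion: CS+CS
})
MAX_LEN = max(len(key) for key in TABLE)

def primary_key(text):
    """Return the list of primary weights, matching the longest sequence first."""
    text = text.lower()
    weights = []
    i = 0
    while i < len(text):
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in TABLE:
                weights.extend(TABLE[chunk])
                i += length
                break
        else:
            raise KeyError(f"no weight defined for {text[i]!r}")
    return weights

def sort_key(text):
    """Primary weights first; ignorable characters matter only as a tie-break."""
    tie_break = [1 if TABLE.get(ch) == [] else 0 for ch in text.lower()]
    return (primary_key(text), tie_break)

words = ["copper", "co-operation", "convexity", "cooperation"]
print(sorted(words, key=sort_key))
# ['convexity', 'cooperation', 'co-operation', 'copper']

print(primary_key("ß") == primary_key("ss"))      # True
print(primary_key("ccs") == primary_key("cscs"))  # True
```

Longest-match lookup is what lets "ccs" win over "c" followed by "cs"; a real collation implementation would use its own tables and more levels than this sketch does.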