Multibyte collation support in FreeBSD


This project aims to bring correct sorting of national characters encoded in UTF-8 to FreeBSD. Currently, strings that begin with such characters always end up at the end of the sorted list, which is obviously wrong. Some workarounds have been tried and do work, for example using the ICU library with PostgreSQL 8.3. Although appealing, such approaches do not give us correct support in the base system, for example in sort(1) and ls(1). They also mean that we (or the upstream authors) have to patch every affected program by hand. A lower-level solution is clearly needed, and this project is that solution.
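The ordering problem comes from comparing raw bytes. In UTF-8 every non-ASCII character starts with a byte of 0xC2 or above, which is larger than any ASCII letter, so strcmp() (and strcoll() in the C locale, which compares the same way) sorts "łódź" after "zebra". An LC_COLLATE table instead gives each character a weight and compares weights. The C sketch below illustrates that idea only; the tiny weight table for a few lowercase Polish letters and the names `utf8_next`, `primary_weight` and `coll_compare` are hypothetical, and a real collation table has several levels rather than the single primary level shown here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h> /* strcmp(), for comparing against byte order */

/* Decode one UTF-8 sequence at *s and advance *s past it.
 * Assumes well-formed input; there is no validation in this sketch. */
static uint32_t utf8_next(const unsigned char **s)
{
    const unsigned char *p = *s;
    uint32_t cp;
    int len;

    if (p[0] < 0x80)      { cp = p[0];        len = 1; }
    else if (p[0] < 0xE0) { cp = p[0] & 0x1F; len = 2; }
    else if (p[0] < 0xF0) { cp = p[0] & 0x0F; len = 3; }
    else                  { cp = p[0] & 0x07; len = 4; }
    for (int i = 1; i < len; i++)
        cp = (cp << 6) | (p[i] & 0x3F);
    *s = p + len;
    return cp;
}

/* Hypothetical primary weights: each base letter gets a slot of 3,
 * and accented letters sort right after their base letter. */
static uint32_t primary_weight(uint32_t cp)
{
    static const struct { uint32_t cp; char base; uint32_t rank; } extra[] = {
        {0x0105, 'a', 1}, /* ą */
        {0x0107, 'c', 1}, /* ć */
        {0x0119, 'e', 1}, /* ę */
        {0x0142, 'l', 1}, /* ł */
        {0x0144, 'n', 1}, /* ń */
        {0x00F3, 'o', 1}, /* ó */
        {0x015B, 's', 1}, /* ś */
        {0x017A, 'z', 1}, /* ź */
        {0x017C, 'z', 2}, /* ż */
    };

    if (cp >= 'a' && cp <= 'z')
        return (cp - 'a') * 3;
    for (size_t i = 0; i < sizeof extra / sizeof extra[0]; i++)
        if (extra[i].cp == cp)
            return (uint32_t)(extra[i].base - 'a') * 3 + extra[i].rank;
    /* Anything not in the toy table sorts after all letters, by code point. */
    return 1000 + cp;
}

/* Compare two UTF-8 strings by primary weight; returns <0, 0 or >0. */
int coll_compare(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    while (*pa && *pb) {
        uint32_t wa = primary_weight(utf8_next(&pa));
        uint32_t wb = primary_weight(utf8_next(&pb));
        if (wa != wb)
            return wa < wb ? -1 : 1;
    }
    /* A proper prefix sorts first. */
    return (*pa != 0) - (*pb != 0);
}
```

With this table, `coll_compare("łódź", "zebra")` is negative, whereas `strcmp()` on the same two strings returns a positive value.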

Situation before the project started

What needs to be done

Current project status:

converter scripts for CLDR data
my version of the program generating the LC_COLLATE table (colldef)
porting Apple's colldef program
porting collation support from Apple's libc
writing regression tests
adding support for expansions needed by some languages: in progress
documenting everything: to be done
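An expansion lets one character sort as if it were a sequence of characters; German "ß", for instance, is commonly sorted as "ss". The C sketch below shows the idea at a single level: before comparing, each string is turned into a list of collation elements, and characters listed in a small expansion table contribute several elements instead of one. The table contents, the use of plain code points as weights, the 64-element limit and the names `collation_elements` and `expand_compare` are assumptions made for illustration; this is not how colldef or Apple's libc actually encode expansions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical expansion table: one code point sorts as a string of ASCII letters. */
static const struct { uint32_t cp; const char *as; } expansions[] = {
    {0x00DF, "ss"}, /* ß, as in German */
    {0x0153, "oe"}, /* œ, as in French */
};

/* Decode one UTF-8 sequence (well-formed input assumed) and advance *s. */
static uint32_t utf8_next(const unsigned char **s)
{
    const unsigned char *p = *s;
    uint32_t cp;
    int len;

    if (p[0] < 0x80)      { cp = p[0];        len = 1; }
    else if (p[0] < 0xE0) { cp = p[0] & 0x1F; len = 2; }
    else if (p[0] < 0xF0) { cp = p[0] & 0x0F; len = 3; }
    else                  { cp = p[0] & 0x07; len = 4; }
    for (int i = 1; i < len; i++)
        cp = (cp << 6) | (p[i] & 0x3F);
    *s = p + len;
    return cp;
}

/* Write at most max collation elements for s into out, applying
 * expansions; returns how many were written. Longer input is truncated. */
static size_t collation_elements(const char *s, uint32_t *out, size_t max)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    while (*p && n < max) {
        uint32_t cp = utf8_next(&p);
        const char *exp = NULL;

        for (size_t i = 0; i < sizeof expansions / sizeof expansions[0]; i++)
            if (expansions[i].cp == cp)
                exp = expansions[i].as;
        if (exp != NULL) {
            for (; *exp && n < max; exp++)
                out[n++] = (uint32_t)(unsigned char)*exp; /* one element per letter */
        } else {
            out[n++] = cp; /* no expansion: the character is its own element */
        }
    }
    return n;
}

/* Compare two UTF-8 strings element by element; returns <0, 0 or >0. */
int expand_compare(const char *a, const char *b)
{
    uint32_t wa[64], wb[64];
    size_t na = collation_elements(a, wa, 64);
    size_t nb = collation_elements(b, wb, 64);

    for (size_t i = 0; i < na && i < nb; i++)
        if (wa[i] != wb[i])
            return wa[i] < wb[i] ? -1 : 1;
    return (na > nb) - (na < nb);
}
```

Without the expansion table, "ß" (U+00DF) would compare as a single element greater than every ASCII letter, so "straße" would sort after "strasze" instead of before it.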

Implementation rationale

When porting parts of Apple's libc, I faced a choice between importing xlocale and throwing it out. The xlocale changes were spread throughout the libc, and I felt that importing them was beyond the scope of this project. The number of affected files alone made me uneasy:

19:15|versus@vspredator:libc% grep -l -R locale_t * | grep -v FreeBSD | wc -l

Affected functions include vfprintf itself, which now takes an additional locale_t argument; the diff for vfprintf.c alone is 900 lines long. Also, I personally don't like adding things for which I see no immediate gain in functionality, and I have never seen a program that used, or could benefit from, two different locales in two different threads at the same time.
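For readers unfamiliar with xlocale: its model passes an explicit locale_t object to locale-sensitive functions instead of relying on a single process-wide locale. The same style was later standardized in POSIX.1-2008 (newlocale(), freelocale(), strcoll_l() and related functions), so the sketch below uses those standard calls rather than Apple's headers. The helper name `compare_in_locale` and its error convention are hypothetical; the point is only to show what an extra locale_t argument looks like at a call site.

```c
#define _POSIX_C_SOURCE 200809L /* expose newlocale(), strcoll_l() */
#include <locale.h>
#include <string.h>

/* Collate a and b under the locale called name, leaving the global
 * locale untouched. Stores the strcoll_l() result in *result and
 * returns 0, or returns -1 if the locale cannot be loaded. */
int compare_in_locale(const char *a, const char *b, const char *name,
                      int *result)
{
    locale_t loc = newlocale(LC_COLLATE_MASK, name, (locale_t)0);

    if (loc == (locale_t)0)
        return -1;
    *result = strcoll_l(a, b, loc); /* the explicit locale_t argument */
    freelocale(loc);
    return 0;
}
```

Every locale-sensitive function needs such a variant in this model, which is why the xlocale changes touch so much of the libc.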

Also see

KonradJankowski/Collation (last edited 2017-09-18T13:06:47+0000 by KubilayKocak)