Porting BSD-licensed Text-Processing Tools from OpenBSD

FreeBSD currently uses the GNU versions of the command-line text-processing tools grep, sort, diff, patch, and sdiff. The goal of this project is to port the BSD-licensed versions of these tools from OpenBSD, optimize their performance where possible, and provide standards conformance and wide-character support. The man pages also need to be revised and completed.

TODO

Overall items

grep

Item | Status
Missing --label | COMPLETED
Missing --null | COMPLETED
Missing --color / --colour | COMPLETED
Missing -D / --devices | COMPLETED
Missing -H / --with-filename | COMPLETED
Missing -J / --bz2decompress | COMPLETED
Missing -d / --directories | COMPLETED
Missing -m / --max-count | COMPLETED
Missing -o / --only-matching | COMPLETED
Missing --include suboption for -r | COMPLETED
Missing --exclude suboption for -r | COMPLETED
Missing --help | COMPLETED
Eliminate warnings | COMPLETED
Comment the code | COMPLETED
Check GNU compatibility | COMPLETED
Check POSIX conformance | COMPLETED
Clean up the code | COMPLETED
Merge possible improvements from NetBSD | COMPLETED
Add wide character support | COMPLETED
Add NLS support | COMPLETED
Change binary support to use a buffer and don't pre-extract gzip/bzip2 files | COMPLETED
Add optional -P / --perl-regexp support with PCRE | INCOMPLETE

Regex library notes

During my work on grep I found a lot of problems with our regex library; here is a summary of them.

The functional compatibility of BSD grep seems good, but there are some incompatibilities between the GNU regex library and our libc-regex. The former accepts non-standard regexes such as (a|), which contains an empty subexpression. There are many such cases, so they cannot be fixed in the BSD grep code; moreover, our base regex library and BSD grep should not behave differently. Whether these differences count as "problems" is a matter of viewpoint, but we definitely want compatibility with GNU grep, so for us they are a problem. Such incompatibilities:
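The empty-subexpression case mentioned above can be probed directly against whatever regex library the local libc provides. The following is only a minimal sketch: try_compile is a hypothetical helper name, not something from BSD grep, and the result for (a|) depends on the library it is linked against, so no particular outcome is claimed here.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Try to compile `pattern` as a POSIX extended regex with the local libc.
 * Returns 0 on success, or the nonzero regcomp() error code. On failure
 * the library's error message is copied into errbuf; on success errbuf
 * is set to the empty string.
 *
 * try_compile is a hypothetical helper for illustration only; it is not
 * part of BSD grep.
 */
int
try_compile(const char *pattern, char *errbuf, size_t errlen)
{
	regex_t re;
	int rc;

	rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
	if (rc != 0) {
		regerror(rc, &re, errbuf, errlen);
		return (rc);
	}
	regfree(&re);
	if (errlen > 0)
		errbuf[0] = '\0';
	return (0);
}
```

Calling try_compile("(a|)", buf, sizeof(buf)) on two systems and comparing the return values is a quick way to see whether their regex libraries disagree on this pattern.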

There is a performance issue with our libc-regex, especially with fixed strings. Currently, BSD grep uses its own hack to search for fixed strings, because the regexec() call is approximately two times slower.
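To illustrate the general idea of bypassing regexec() for fixed strings, here is a minimal sketch. It is not BSD grep's actual code: strstr() stands in for the fixed-string search, and is_fixed_pattern and find_match are hypothetical names chosen for this example. A pattern with no extended-regex metacharacters is searched literally; anything else goes through regcomp()/regexec().

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the pattern contains no ERE metacharacters and can
 * therefore be matched as a literal string, 0 otherwise.
 * (Hypothetical helper, for illustration only.)
 */
int
is_fixed_pattern(const char *pattern)
{
	return (strpbrk(pattern, "\\^$.[]|()*+?{}") == NULL);
}

/*
 * Return the byte offset of the first match of pattern in line,
 * or -1 if there is no match or the pattern fails to compile.
 * Literal patterns take the strstr() fast path; everything else
 * is handed to the regex library.
 */
long
find_match(const char *pattern, const char *line)
{
	regex_t re;
	regmatch_t m;
	const char *hit;
	long off = -1;

	if (is_fixed_pattern(pattern)) {
		hit = strstr(line, pattern);
		return (hit != NULL ? (long)(hit - line) : -1);
	}
	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return (-1);
	if (regexec(&re, line, 1, &m, 0) == 0)
		off = (long)m.rm_so;
	regfree(&re);
	return (off);
}
```

Both paths report the same offset for the same match, so the fast path only changes how the match is found, not what is reported. A real implementation would also have to handle case-insensitive and wide-character searches, which this sketch ignores.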

As for replacing the regex library, there are BSD-licensed alternatives, but they don't fit our needs:

It seems that we should fall back to our libc-regex and improve it. That library was written by Henry Spencer, and a slightly newer version exists that might contain important fixes and speedups. BSD grep's fixed-string search might also be a good starting point for optimizing libc-regex.

sort

Item | Status
Missing -g / --general-numeric-sort | INCOMPLETE
Missing --help | COMPLETED
Missing -M / --month-sort | INCOMPLETE
Missing -S / --buffer-size | COMPLETED
Missing --version | COMPLETED
Eliminate warnings | INCOMPLETE
Comment the code | INCOMPLETE
Check GNU compatibility | INCOMPLETE
Check POSIX conformance | COMPLETED

diff

Item | Status
Missing --ignore-file-name-case | COMPLETED
Missing --no-ignore-file-name-case | COMPLETED
Missing --strip-trailing-cr | INCOMPLETE
Missing --normal | COMPLETED
Missing --tabsize | INCOMPLETE
Missing --unidirectional-new-file | COMPLETED
Missing --from-file | COMPLETED
Missing --to-file | COMPLETED
Missing --help | COMPLETED
Missing --ignore-blank-lines | INCOMPLETE
Missing --ignore-tab-expansion | INCOMPLETE
Missing -v / --version | COMPLETED
Eliminate warnings | COMPLETED
Comment the code | INCOMPLETE
Check GNU compatibility | INCOMPLETE
Check POSIX conformance | COMPLETED

GáborSoC2008 (last edited 2008-08-10 11:04:52 by GáborKövesdán)