Porting BSD-licensed Text-Processing Tools from OpenBSD
FreeBSD currently uses the GNU versions of the command-line text-processing tools grep, sort, diff, patch and sdiff. The goal of this project is to port the BSD-licensed versions of these tools from OpenBSD, optimize their performance where possible, and provide standards conformance and support for wide character sets. The man pages also need to be revised and completed.
TODO
Overall items
- Create regression tests.
- Find a method to collect which options are used by the Ports Collection, so that we can measure progress toward compatibility.
- Compare performance with the GNU tools and optimize the implementation to bring performance up to par.
- See how well (in features and performance) the GNU tools deal with wide character set input.
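One possible way to approach the Ports Collection item is to scan build files and scripts for invocations of a tool and tally the options passed to it. The C sketch below is a rough, hypothetical helper (collect_options is not an existing FreeBSD tool): it splits a command line on whitespace and, after each token naming the tool, records tokens beginning with "-" until a shell separator or "--". Quoting, escapes, separators written without surrounding spaces, and option arguments such as the pattern after -e are not handled, so the resulting counts would only be approximate.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TOKEN 64

/* Return 1 if tok names the tool, either bare ("grep") or as a path ("/usr/bin/grep"). */
static int is_tool(const char *tok, const char *tool)
{
    const char *base = strrchr(tok, '/');
    base = (base != NULL) ? base + 1 : tok;
    return strcmp(base, tool) == 0;
}

/* Return 1 if tok ends the current simple command (only when surrounded by spaces). */
static int is_separator(const char *tok)
{
    return strcmp(tok, "|") == 0 || strcmp(tok, ";") == 0 ||
           strcmp(tok, "&&") == 0 || strcmp(tok, "||") == 0;
}

/*
 * Hypothetical helper: copy every option token that follows an invocation
 * of `tool` in `line` into `out`, returning how many were stored (at most
 * `max`). Tokenization is a plain whitespace split; quoting, escapes and
 * option arguments are NOT handled.
 */
int collect_options(const char *line, const char *tool,
                    char out[][MAX_TOKEN], int max)
{
    char buf[1024];
    int n = 0, in_tool = 0;

    snprintf(buf, sizeof(buf), "%s", line);   /* strtok modifies its input */
    for (char *tok = strtok(buf, " \t\n"); tok != NULL;
         tok = strtok(NULL, " \t\n")) {
        if (is_separator(tok)) {
            in_tool = 0;                      /* next command starts */
        } else if (!in_tool) {
            in_tool = is_tool(tok, tool);     /* look for the tool name */
        } else if (strcmp(tok, "--") == 0) {
            in_tool = 0;                      /* end of options */
        } else if (tok[0] == '-' && tok[1] != '\0' && n < max) {
            snprintf(out[n++], MAX_TOKEN, "%s", tok);
        }
    }
    return n;
}
```

Run over every line of the Makefiles in the ports tree, counting the distinct strings stored in out would give a first approximation of which options the ports rely on.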
grep
Item | Status
Missing --label | COMPLETED
Missing --null | COMPLETED
Missing --color / --colour | COMPLETED
Missing -D / --devices | COMPLETED
Missing -H / --with-filename | COMPLETED
Missing -J / --bz2decompress | COMPLETED
Missing -d / --directories | COMPLETED
Missing -m / --max-count | COMPLETED
Missing -o / --only-matching | COMPLETED
Missing --include suboption for -r | COMPLETED
Missing --exclude suboption for -r | COMPLETED
Missing --help | COMPLETED
Eliminate warnings | COMPLETED
Comment the code | COMPLETED
Check GNU compatibility | COMPLETED
Check POSIX conformance | COMPLETED
Clean up the code | COMPLETED
Merge possible improvements from NetBSD | COMPLETED
Add wide character support | COMPLETED
Add NLS support | COMPLETED
Change binary support to use a buffer and don't pre-extract gzip/bzip2 files | COMPLETED
Add optional -P / --perl-regexp support with PCRE | INCOMPLETE
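The -o / --only-matching item relies on finding every match in a line, not just the first one. Below is a minimal sketch using the POSIX regcomp()/regexec() API, with a hypothetical helper named only_matching (not BSD grep's actual function). After each match the search resumes at the match end with REG_NOTBOL set, so that ^ cannot match in the middle of the line, and an empty match advances the offset by one byte to avoid an endless loop. Empty matches are skipped, multibyte characters are not handled, and -m counting is left out.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Sketch of -o / --only-matching: find every non-overlapping, non-empty
 * match of `re` in `line`, storing start offsets in `starts` and lengths
 * in `lens`. Returns the number of matches stored (at most `max`).
 */
int only_matching(const regex_t *re, const char *line,
                  size_t starts[], size_t lens[], int max)
{
    regmatch_t m;
    size_t off = 0, len = strlen(line);
    int n = 0, eflags = 0;

    while (n < max && off <= len &&
           regexec(re, line + off, 1, &m, eflags) == 0) {
        size_t so = off + (size_t)m.rm_so;
        size_t eo = off + (size_t)m.rm_eo;

        if (eo > so) {
            starts[n] = so;        /* record a non-empty match */
            lens[n] = eo - so;
            n++;
            off = eo;              /* continue right after it */
        } else {
            off = eo + 1;          /* step past an empty match */
        }
        eflags = REG_NOTBOL;       /* later searches do not start at line start */
    }
    return n;
}
```

A caller would then print each match with printf("%.*s\n", (int)lens[i], line + starts[i]).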
Regex library notes
During my work on grep I found a lot of problems with our regex library; here is a summary of them.
The functional compatibility of BSD grep seems good, but there are some incompatibilities between the GNU regex library and our libc-regex. The former accepts non-standard regexes, for example (a|), which contains an empty subexpression. There are a number of such cases, so it is not possible to fix them in the BSD grep code; furthermore, BSD grep should not behave differently from our base regex library. Whether these differences count as "problems" is a matter of viewpoint, but we definitely want compatibility with GNU grep, so for us they are problems. Such incompatibilities include:
- (a|) has an empty subexpression in extended regexp
- (a||b) has an empty subexpression in extended regexp
- ?*, *?, **, ??, (*, ^* have invalid repetition operators in extended regexp
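Whether a given libc accepts these patterns can be checked directly with regcomp(). The sketch below (ere_accepted and report_patterns are illustrative names) compiles each pattern from the list with REG_EXTENDED and prints either "accepted" or the regerror() message. The output depends on which regex implementation is linked; according to the notes above, the GNU library accepts these forms while our libc-regex does not, so the program is mainly useful for comparing systems side by side.

```c
#include <regex.h>
#include <stdio.h>

/*
 * Return 1 if the local regcomp() accepts `pattern` as a POSIX extended
 * regular expression, 0 otherwise. On rejection the regerror() text is
 * copied into `err`; on success `err` is set to the empty string.
 */
int ere_accepted(const char *pattern, char *err, size_t errlen)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);

    if (rc != 0) {
        regerror(rc, &re, err, errlen);
        return 0;
    }
    regfree(&re);
    if (errlen > 0)
        err[0] = '\0';
    return 1;
}

/* Print the verdict of this system's regcomp() for each listed pattern. */
void report_patterns(void)
{
    const char *pats[] = { "(a|)", "(a||b)", "?*", "*?", "**", "??", "(*", "^*" };
    char err[128];

    for (size_t i = 0; i < sizeof(pats) / sizeof(pats[0]); i++) {
        int ok = ere_accepted(pats[i], err, sizeof(err));
        printf("%-8s %s%s\n", pats[i], ok ? "accepted" : "rejected: ", ok ? "" : err);
    }
}
```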
There is also a performance issue with our libc-regex, especially with fixed strings. Currently BSD grep uses its own hack to search for fixed strings, because the regexec() call is approximately two times slower.
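The notes do not say which algorithm BSD grep's fixed-string hack uses. To illustrate why a dedicated fixed-string search can beat regexec(), here is a minimal Boyer-Moore-Horspool search in C; it is a standard textbook algorithm chosen for this sketch, not necessarily the one in BSD grep. A precomputed shift table lets the search skip ahead by up to the full pattern length after a mismatch instead of advancing one byte at a time. Case folding (-i) and multibyte input are not handled.

```c
#include <stddef.h>
#include <string.h>

/*
 * Boyer-Moore-Horspool search: return the offset of the first occurrence
 * of `pat` (length m) in `text` (length n), or -1 if there is none.
 */
ptrdiff_t bmh_search(const unsigned char *text, size_t n,
                     const unsigned char *pat, size_t m)
{
    size_t shift[256];

    if (m == 0)
        return 0;
    if (m > n)
        return -1;

    /* Default shift: bytes not in the pattern let us skip its whole length. */
    for (size_t i = 0; i < 256; i++)
        shift[i] = m;
    /* Bytes occurring in pat[0..m-2] allow only a shorter shift. */
    for (size_t i = 0; i + 1 < m; i++)
        shift[pat[i]] = m - 1 - i;

    /* Shift by the entry for the text byte under the pattern's last position. */
    for (size_t pos = 0; pos + m <= n; pos += shift[text[pos + m - 1]]) {
        if (memcmp(text + pos, pat, m) == 0)
            return (ptrdiff_t)pos;
    }
    return -1;
}
```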
As for replacing the regex library: there are BSD-licensed regex libraries, but they do not fit our needs:
- PCRE: it has a POSIX API, but that API still uses the Perl syntax, so it is incompatible. It could still be imported to provide some extra features in the base system.
- Oniguruma: it has POSIX support and seems to work well, but it appears to be even slower than our libc-regex.
- Lrexlib: it is just a Lua binding, so it is not suitable.
It seems that we should stay with our libc-regex and improve it. That library was written by Henry Spencer, and a slightly newer version of it exists, which might contain important fixes and speedups. BSD grep's fixed-string search might also be a good starting point for optimizing libc-regex.
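One way the fixed-string approach could carry over to the regex layer is to detect patterns that contain no metacharacters and bypass the regex engine for them. A minimal sketch with the hypothetical helpers is_literal_ere() and match_line(); the metacharacter check is deliberately conservative (it also treats a lone ] or } as special), and it ignores flags such as REG_ICASE, which would require a case-insensitive search instead of strstr().

```c
#include <regex.h>
#include <string.h>

/* Return 1 if `pat` contains no POSIX ERE metacharacters (conservative check). */
int is_literal_ere(const char *pat)
{
    return strpbrk(pat, ".[]()*+?{}|^$\\") == NULL;
}

/*
 * Hypothetical wrapper: 1 if `pat` matches somewhere in `line`, 0 if not,
 * -1 if the pattern fails to compile. Literal patterns skip regcomp() and
 * regexec() entirely and use a plain substring search instead.
 */
int match_line(const char *pat, const char *line)
{
    regex_t re;
    int rc;

    if (is_literal_ere(pat))
        return strstr(line, pat) != NULL;   /* fast path for fixed strings */

    if (regcomp(&re, pat, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Real code would compile the pattern once and reuse it for every line; match_line() recompiles on each call only to keep the example short.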
sort
Item | Status
Missing -g / --general-numeric-sort | INCOMPLETE
Missing --help | COMPLETED
Missing -M / --month-sort | INCOMPLETE
Missing -S / --buffer-size | COMPLETED
Missing --version | COMPLETED
Eliminate warnings | INCOMPLETE
Comment the code | INCOMPLETE
Check GNU compatibility | INCOMPLETE
Check POSIX conformance | COMPLETED
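The two missing sort keys mostly come down to comparison functions. The sketch below follows the collating orders described in the GNU coreutils manual, restricted to the C locale. For -M, leading blanks are skipped, a case-insensitive three-letter month abbreviation maps to 1 through 12, and anything else sorts before JAN. For -g, a leading floating-point number is parsed with strtod(), and lines without one sort first. The function names are hypothetical; locale month names, NaN and infinity ordering, and long double precision are left out.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * -M key for the C locale only: skip leading blanks, then map a
 * case-insensitive three-letter month abbreviation to 1..12.
 * Anything else gets 0, so unknown keys sort before "JAN".
 */
int month_key(const char *s)
{
    static const char *months[] = { "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
                                    "JUL", "AUG", "SEP", "OCT", "NOV", "DEC" };
    char abbr[4];
    int i;

    while (*s == ' ' || *s == '\t')
        s++;
    for (i = 0; i < 3 && s[i] != '\0'; i++)
        abbr[i] = (char)toupper((unsigned char)s[i]);
    abbr[i] = '\0';
    for (i = 0; i < 12; i++)
        if (strcmp(abbr, months[i]) == 0)
            return i + 1;
    return 0;
}

/* Compare two lines by month key: negative, zero or positive like strcmp(). */
int month_cmp(const char *a, const char *b)
{
    int ka = month_key(a), kb = month_key(b);
    return (ka > kb) - (ka < kb);
}

/*
 * -g comparison: parse a leading floating-point number with strtod()
 * (as double, not long double). Lines without a number sort first and
 * compare equal to each other; NaN ordering is not handled.
 */
int general_numeric_cmp(const char *a, const char *b)
{
    char *ea, *eb;
    double da = strtod(a, &ea), db = strtod(b, &eb);
    int na = (ea != a), nb = (eb != b);   /* did a number parse? */

    if (!na || !nb)
        return na - nb;                    /* non-numbers first */
    return (da > db) - (da < db);
}
```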
diff
Item | Status
Missing --ignore-file-name-case | COMPLETED
Missing --no-ignore-file-name-case | COMPLETED
Missing --strip-trailing-cr | INCOMPLETE
Missing --normal | COMPLETED
Missing --tabsize | INCOMPLETE
Missing --unidirectional-new-file | COMPLETED
Missing --from-file | COMPLETED
Missing --to-file | COMPLETED
Missing --help | COMPLETED
Missing --ignore-blank-lines | INCOMPLETE
Missing --ignore-tab-expansion | INCOMPLETE
Missing -v / --version | COMPLETED
Eliminate warnings | COMPLETED
Comment the code | INCOMPLETE
Check GNU compatibility | INCOMPLETE
Check POSIX conformance | COMPLETED
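Three of the incomplete diff options (--strip-trailing-cr, --ignore-tab-expansion and --tabsize) could be implemented by normalizing each line before it is hashed or compared. A minimal sketch, assuming the hypothetical helpers line_key() and lines_equal(): a trailing carriage return is dropped when requested, and tabs are expanded to the next multiple of the tab size, so lines that differ only in tab expansion produce the same key. Columns are counted in bytes, so multibyte characters are not handled, and --ignore-blank-lines is not covered because it applies to whole hunks rather than single lines.

```c
#include <stddef.h>
#include <string.h>

/*
 * Build the comparison key for one input line (without its '\n'):
 *  - strip_cr: drop a single trailing '\r' (--strip-trailing-cr)
 *  - expand_tabs: replace each tab by spaces up to the next multiple of
 *    tabsize (--ignore-tab-expansion together with --tabsize)
 * Returns the key length, or (size_t)-1 if `out` is too small or tabsize is 0.
 */
size_t line_key(const char *in, char *out, size_t outlen,
                int strip_cr, int expand_tabs, size_t tabsize)
{
    size_t len = strlen(in), col = 0;

    if (outlen == 0 || tabsize == 0)
        return (size_t)-1;
    if (strip_cr && len > 0 && in[len - 1] == '\r')
        len--;
    for (size_t i = 0; i < len; i++) {
        if (expand_tabs && in[i] == '\t') {
            size_t spaces = tabsize - col % tabsize;   /* to next tab stop */
            if (col + spaces >= outlen)
                return (size_t)-1;
            memset(out + col, ' ', spaces);
            col += spaces;
        } else {
            if (col + 1 >= outlen)
                return (size_t)-1;
            out[col++] = in[i];
        }
    }
    out[col] = '\0';
    return col;
}

/* Compare two lines under the given options; overly long lines fall back to strcmp(). */
int lines_equal(const char *a, const char *b,
                int strip_cr, int expand_tabs, size_t tabsize)
{
    char ka[1024], kb[1024];
    size_t la = line_key(a, ka, sizeof ka, strip_cr, expand_tabs, tabsize);
    size_t lb = line_key(b, kb, sizeof kb, strip_cr, expand_tabs, tabsize);

    if (la == (size_t)-1 || lb == (size_t)-1)
        return strcmp(a, b) == 0;
    return la == lb && memcmp(ka, kb, la) == 0;
}
```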