BSD grep

Status

BSD grep was imported into 9.0-CURRENT on July 22, 2010. It was obtained from the OpenBSD and FreeGrep projects and adapted to our needs: providing GNU compatibility and letting ports build without problems. GNU compatibility is almost 100%; the only missing feature is -P (--perl-regexp), which cannot be supported without PCRE in the base system and which was, in any case, disabled in GNU grep for the same reason. It is still linked against GNU libregex because we needed compatibility with non-standard regexes, e.g. [foo|] is accepted by GNU, while it is considered an empty subexpression in stricter implementations. Ports may use such regexes, and portmaster* is also known to use them.

* As of Portmaster 3.17.10, I cannot actually locate any use of empty subexpressions or other GNU extensions in portmaster. I suspect these have since gone away, and ports are a work in progress (see the open PRs).
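
A quick way to see how a given grep treats such a pattern is to feed it one and look at the exit status. The sketch below assumes the pattern in question is an ERE with an empty alternation branch, written here as (foo|); that reading and the function name probe_empty_subexpr are assumptions made for illustration. It relies only on the standard grep exit codes: 0 or 1 when the pattern compiles, greater than 1 on error.

```shell
#!/bin/sh
# Report whether a grep binary accepts an ERE with an empty alternation branch.
# probe_empty_subexpr is a hypothetical helper, not part of any grep.
probe_empty_subexpr() {
    grep_bin=${1:-grep}
    status=0
    # Assumed test pattern: '(foo|)' has an empty second branch.
    printf 'foo\n' | "$grep_bin" -E '(foo|)' >/dev/null 2>&1 || status=$?
    # POSIX grep: 0 = match, 1 = no match, >1 = error (e.g. pattern rejected).
    # A missing or non-executable binary also ends up reported as "rejected".
    if [ "$status" -le 1 ]; then
        echo accepted
    else
        echo rejected
    fi
}
```

Calling it with no argument probes whatever grep is first in PATH; passing a path probes that binary instead.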

TODO

Performance

One of the TODO items is improving performance. BSD grep is somewhat behind GNU grep in this respect, because GNU grep uses lots of hacks to improve performance by optimizing regexes. First, we need a modern and efficient regex library, like TRE*, which is BSD-licensed and wchar-compliant; then we should re-evaluate the performance and look for bottlenecks. In the meantime, WITH_GNU_GREP can be used to build GNU grep instead of BSD grep for those who need better performance. GNU grep is also available in the Ports Collection as textproc/gnugrep.

* This claim/plan needs to be re-evaluated; as of the last benchmarks I found, TRE was no more efficient than the current libc regex(3) ("Spencer") implementation. I do not know the right path forward, but right now the main bottleneck is not in the regex(3) implementation but in grep itself. This should probably be de-prioritized or deferred until it can be proven that regex(3) is a problem. Replacing it will not be my focus going forward: it is maintainable and can be extended.
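
Questions like this are easiest to settle by timing both implementations on the same input. The following is a rough sketch rather than a rigorous benchmark: make_input and bench_grep are hypothetical helper names, the generated data is synthetic, and the paths to the grep binaries are placeholders because they depend on how each one was built or installed.

```shell
#!/bin/sh
# Rough timing comparison of grep binaries on identical synthetic input.
# make_input and bench_grep are illustrative helpers, not existing tools.

# make_input N FILE: write N lines of predictable text to FILE.
make_input() {
    awk -v n="$1" 'BEGIN {
        for (i = 0; i < n; i++)
            printf "line %d lorem ipsum foo%d bar\n", i, i % 100
    }' > "$2"
}

# bench_grep GREP PATTERN FILE: time one grep binary on FILE.
# -c prints only a match count, so output cost does not dominate the timing.
bench_grep() {
    echo "== $1"
    time "$1" -c -E "$2" "$3"
}

# Example (all paths are placeholders):
#   make_input 1000000 /tmp/grep-input.txt
#   bench_grep /path/to/bsdgrep 'foo(1|2)[0-9] bar' /tmp/grep-input.txt
#   bench_grep /path/to/gnugrep 'foo(1|2)[0-9] bar' /tmp/grep-input.txt
```

Running each binary several times and discarding the first run helps keep disk caching from skewing the comparison.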

Chunking

A potential plan for doing chunk processing instead of full line processing may be found at BSDgrep/Chunking.

Open Reviews / PRs

Pending reviews:

Postponed reviews:

* The Capsicum patch has been deferred until other things settle down, bugs having been a higher priority. Additionally, I have been in touch with oshogbo@, who indicated that a Casper file service was in progress and close to completion/release. The Capsicum patch will get revised to use the file service if it makes it in before everything else settles down.

Completed reviews (for easy reference while work is in progress):

PRs to fix issues from exp-run (PR 218385):

Note that some of these PRs are (were) meant to stop ports from relying on GNU extensions in base grep, changing them to use textproc/gnugrep instead. Once BSDgrep becomes more capable, these ports and others currently relying on textproc/gnugrep will be re-evaluated so that they can drop this dependency and use grep in base instead. There is a working plan for re-implementing GNU extensions.

How to build a GNU-less BSD grep for testing

 # svn up /usr/src  #  or any other suitable means to get your source tree updated
 # cd /usr/src/usr.bin/grep
 # make clean
 # make cleandir
 # make cleandir
 # make WITH_BSD_GREP=yes WITHOUT_GNU_COMPAT=yes obj depend all install

To revert this, re-do the steps above, omitting the WITH_BSD_GREP=yes WITHOUT_GNU_COMPAT=yes part.
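
After installing, it can help to confirm which implementation actually ended up in place. A minimal sketch, assuming the version banner contains "BSD grep" for BSD grep and "GNU grep" for GNU grep; grep_flavor is a hypothetical helper, and the exact banner text may differ between releases.

```shell
#!/bin/sh
# Classify a grep version banner as bsd, gnu, or unknown.
# grep_flavor is an illustrative helper; the banner formats are assumptions.
grep_flavor() {
    case "$1" in
        *"BSD grep"*) echo bsd ;;
        *"GNU grep"*) echo gnu ;;
        *)            echo unknown ;;
    esac
}

# Example: classify the first line printed by the installed grep.
#   grep_flavor "$(grep -V 2>&1 | head -n 1)"
```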

BSDgrep (last edited 2017-05-26 14:13:14 by KyleEvans)