Improve BSD-Licensed Text Processing Tools

Student

Jesse Hagewood

Mentor

Dag-Erling Smørgrav

Project description

My proposal for the FreeBSD project is to optimize and/or complete the BSD-licensed text processing tools diff, diff3, sdiff, and mdocml. Much of this proposal continues the work done by Ben Fiedler in GSoC 2010.

http://wiki.freebsd.org/SOC2010BenFiedler

Approach to solving the problem

When completing the diff family, I will consider (1) compatibility with POSIX, (2) compatibility with the GNU variant currently used in FreeBSD, and (3) performance.

Deliverables

Milestones

May 21 - June 17

June 18 - July 1

July 2 - July 18

July 19 - August 5

August 6 – August 12

August 13 - August 20

Test Plan

To test functionality, I will use automated test scripts that exercise every function I am implementing. I will also test functions that are already implemented, to be sure my changes have not affected the other functions of the utility in question. For every reported problem, I will add a new test case for later regression testing.

For testing compatibility with POSIX standards and the GNU utilities currently in FreeBSD, I will compare the test script outputs of both the BSD-licensed tools and the GNU tools to make sure they produce identically formatted outputs.
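A minimal sketch of the comparison step, assuming both tools accept the same arguments. The helper names `run_tool` and `outputs_match`, and the binary paths in the usage note, are hypothetical and do not come from the project's test scripts. Exit statuses are compared along with the output because diff exits with 1 when the inputs differ, so a non-zero status is not by itself a failure.

```python
import subprocess


def run_tool(argv):
    """Run a command and return (exit status, stdout bytes).

    No exception is raised on a non-zero exit, since diff uses
    exit status 1 to report that the inputs differ.
    """
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    return proc.returncode, proc.stdout


def outputs_match(reference_cmd, candidate_cmd, args):
    """Return True when both tools give the same exit status and
    byte-identical standard output for the same argument list."""
    return run_tool(reference_cmd + args) == run_tool(candidate_cmd + args)
```

For example, `outputs_match(["/usr/bin/diff"], ["./diff"], ["-u", "old.txt", "new.txt"])` would check one unified-diff case, where `./diff` stands for a locally built BSD diff.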

I will also benchmark the BSD-licensed tools against the GNU tools for testing performance.
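One way to collect a figure such as the minor page fault counts reported under Benchmarks is to read the child-process resource usage before and after a run. This is a sketch under that assumption, not the method used for the numbers on this page; `minor_faults` is a hypothetical helper, and it relies on the Unix-only `resource` module.

```python
import resource
import subprocess


def minor_faults(argv):
    """Run argv to completion and return the minor page faults charged to it.

    RUSAGE_CHILDREN accumulates usage for children that have been waited
    for, so the difference across one subprocess.run() call isolates
    that single run.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_minflt
    subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN).ru_minflt
    return after - before
```

Calling it once per tool with the same input files, for instance `minor_faults(["diff", "a.txt", "b.txt"])`, gives counts that can be compared side by side; repeated runs are advisable because the count varies with system state.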

The Code

https://socsvn.freebsd.org/socsvn/soc2012/jhagewood/

Status Report (continued from SoC 2010)

diff

| Item | Status | Notes |
|---|---|---|
| Missing --speed-large-files | INCOMPLETE | Argument is accepted, but makes no functional change |
| Missing --ignore-file-name-case | COMPLETED | |
| Missing --no-ignore-file-name-case | COMPLETED | |
| Missing --strip-trailing-cr | COMPLETED | |
| Missing --normal | COMPLETED | |
| Missing --tabsize | COMPLETED | |
| Missing --unidirectional-new-file | COMPLETED | |
| Missing --from-file | COMPLETED | |
| Missing --to-file | COMPLETED | |
| Missing --help | COMPLETED | |
| Missing --ignore-blank-lines | INCOMPLETE | |
| Missing --ignore-tab-expansion | IN PROGRESS | |
| Missing -v / --version | COMPLETED | |
| Eliminate warnings | COMPLETED | |
| Comment the code | INCOMPLETE | |
| Check GNU compatibility | COMPLETED | All implemented features GNU compatible as of 6/17/2012 |
| Check POSIX conformance | COMPLETED | |
| Missing --line-format options | IN PROGRESS | regex support is available |
| Missing --group-format options | COMPLETED | |
| Missing --group-format | INCOMPLETE | |
| Adapt source to FreeBSD style guidelines | COMPLETED | |
| Tighter integration between diff utilities | IN PROGRESS | zdiff integration in diff. |

sdiff

| Item | Status | Notes |
|---|---|---|
| Fix -c99 build warnings/errors | COMPLETED | |
| Combine diff-specific args and pipe to diff process | COMPLETED | |
| Fix output indention | COMPLETED | |
| Binary file support | COMPLETED | |
| Adapt source to FreeBSD style guidelines | COMPLETED | |
| .gz file support | COMPLETED | zsdiff |

diff3

| Item | Status | Notes |
|---|---|---|
| Replaced ksh script with sh | COMPLETED | |
| -i flag | COMPLETED | |
| -T flag | COMPLETED | |
| -a flag | COMPLETED | |
| --show-all option | INCOMPLETE | |
| --easy-only | INCOMPLETE | |
| --merge | INCOMPLETE | |
| --label | INCOMPLETE | |
| --strip-trailing-cr | COMPLETED | |
| --diff-program | INCOMPLETE | |
| --version | COMPLETED | |
| --help | COMPLETED | |
| Adapt source to FreeBSD style guidelines | COMPLETED | |

mdocml

June 15 UPDATE - For my first two weeks working on mdocml, I tried implementing these features as roff requests, but that proved more difficult than I anticipated. I spent the next two weeks studying man/mdoc and began implementing .ns/.rs (no-space mode) and .ti (temporary indent) as man/mdoc macros. As of June 15, these macros are not yet complete; mdocml remains a secondary focus of this project, and I will keep working to finish them. I was also able to get a list of all man pages that will not compile under mandoc. The list is in my SVN repository.

| Item | Status | Notes |
|---|---|---|
| .ns (no-space mode) | IN PROGRESS | |
| .rs (no-space mode off) | IN PROGRESS | |
| .ti (temporary indent) | IN PROGRESS | |
| .ta (tab settings) | INCOMPLETE | |

Implementing mdocml macros

man source files

man.h

Defines man's structs and enums; the most relevant here is enum mant, which indexes the macros.

man_macro.c

Begins parsing man macros, handling implicit-block and explicit-block macros separately, and creates a man node for each macro.

man_validate.c

Post-processing functions for several man macros.

man_html.c

Pre-processing functions for many man macros.

man_term.c

Pre-processing functions for some macros, and post-processing functions for macros whose pre-processing is in man_html.c.

mdoc source files

mdoc.h

mdoc_macro.c

mdoc_validate.c

mdoc_html.c

mdoc_term.c

The structure of mdoc is the same as man.

Other files

term.c

html.c

Functions for formatted output, such as inserting new lines, horizontal/vertical spacing, functions for fonts, etc.
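The division of labour described above (an enum that indexes the macros, a parser that creates one node per macro, and per-macro pre- and post-processing functions in the output back ends) can be sketched as a small dispatch table. This is an illustration in Python rather than the C used by mdocml, and every name in it (`Macro`, `Node`, `ACTIONS`, `render`, and the handler functions) is hypothetical; the strings the handlers emit are placeholders, not real terminal or HTML output.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Macro(IntEnum):
    """Index of known macros (plays the role of an enum such as enum mant)."""
    SH = 0  # section header, a block macro with child nodes
    TI = 1  # temporary indent
    NS = 2  # no-space mode on
    RS = 3  # no-space mode off


@dataclass
class Node:
    """One parsed macro invocation with its arguments and child nodes."""
    macro: Macro
    args: list = field(default_factory=list)
    children: list = field(default_factory=list)


def pre_sh(node, out):
    out.append("begin section " + " ".join(node.args))
    return True  # True means "descend into the children"


def post_sh(node, out):
    out.append("end section")


def pre_ti(node, out):
    out.append("indent next line by " + (node.args[0] if node.args else "0"))
    return True


def pre_ns(node, out):
    out.append("no-space mode on")
    return True


def pre_rs(node, out):
    out.append("no-space mode off")
    return True


# One (pre, post) pair per macro, looked up by the enum value.
# A post handler of None means the macro needs no post-processing.
ACTIONS = {
    Macro.SH: (pre_sh, post_sh),
    Macro.TI: (pre_ti, None),
    Macro.NS: (pre_ns, None),
    Macro.RS: (pre_rs, None),
}


def render(node, out):
    """Walk the node tree: run pre, then the children, then post."""
    pre, post = ACTIONS[node.macro]
    if pre(node, out):
        for child in node.children:
            render(child, out)
    if post is not None:
        post(node, out)
    return out
```

Under this layout, adding a macro such as .ta would mean adding an enum member, teaching the parser to create its node, and registering its handlers in each back end's table.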

Benchmarks

diff

GNU diff

Minor pagefaults - 101

BSD diff

Minor pagefaults - 93

sdiff

GNU sdiff

Minor pagefaults - 181

BSD sdiff

Minor pagefaults - 232

diff3

SummerOfCode2012/JesseHagewood (last edited 2012-08-18 23:27:31 by JesseHagewood)