Parallelization in the ports collection and pkgng utility

Short description

The FreeBSD Ports Collection has been a primary system for building and installing software on FreeBSD since FreeBSD 1.0. Nevertheless, it does not provide a safe way to build several ports simultaneously, and a port's dependencies are built sequentially. To keep the system in a consistent state while several ports are being built, it is necessary to prevent concurrent access to shared files and directories from multiple processes. My approach is based on lock files that serve both as barriers and as critical-section triggers for several concurrent processes. An important aspect of my project is handling failures and unexpected terminations of a port's build process, to avoid deadlocks and an inconsistent state of the ports system. First, my changes to the ports framework provide a safe way to build and install several ports at the same time. Second, I designed a convenient approach for building a port's dependencies in parallel. The main aim of this project is to make system updates faster and easier. My modifications to the ports collection allow multicore servers to use their full potential by installing several ports and several dependencies of a port simultaneously. The same goes for the Tinderbox and pointyhat systems used by port committers. Package building with pkgng also benefits, since the build systems can build packages in parallel.

This project consists of several main parts:

1. Parallelization in the FreeBSD Ports Collection.

  1. Parallel installation of several ports at the same time.
  2. Parallel installation of a port's dependencies.

2. Parallelization in pkgng, which involves parallel installation of several binary packages at the same time.

The project also assumes that the first port to start an operation gets priority, and that no other port is able to interrupt it.

The Ports Collection

General development approach

bsd.parallel.mk

/!\ Important

${_parv_WANT_PARALLEL_BUILD}

/!\ Important

"Infinite" loops

Termination of the process tree

Variables available to the user

Targets available to the user

Usage practice

Parallel installation of several ports

This section covers the case where a user has already called make build/install in one port's directory and then calls make build/install in another port's directory while the first port is still building.

Problems

1. Prevent concurrent access to shared directories.

==> It is necessary to prevent concurrent access to the shared files and directories, so that one port's data is not corrupted by another port.

2. Prevent parallel make install/build processes for the same port or for the same dependency of a port.

==> It is necessary to prevent port B from starting a second build or installation of a port that is already being processed.

==> Is it possible to force port B to start installing another dependency instead of just waiting for port X to be installed?

==> What if port X fails to install, for whatever reason?

3. Port is unable to use a dependency.

==> Is it possible to determine which target was called in port A's directory?

==> Is port B responsible for calling make install after port A has been processed?

4. Redesign of Conflict checking.

==> Does port B treat port A as a conflicting port even though port A is only running its fetch target?

Approach to solving

The main approach to preventing concurrent access to shared directories and files is to lock them. Only the first process to lock a file or directory may carry out its sequence of actions on it. As soon as that process finishes working with the file or directory, it must unlock it so that other processes can use it. If a process finds that a directory is locked, it must either wait for the directory to be unlocked or act accordingly.

The most appropriate technique for this purpose is to use LOCK files: a directory is assumed to be locked if it contains a specific file.
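
The lock-file idea can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the project's make/sh implementation: the lock file name `.lock` and the helper names `try_lock` and `unlock` are hypothetical. The key point is that the lock file is created atomically, so exactly one of several competing processes succeeds.

```python
import os
import tempfile

LOCK_NAME = ".lock"  # hypothetical lock file name, chosen for this sketch


def try_lock(directory):
    """Try to lock `directory`. Return True if we now hold the lock,
    False if another process already holds it."""
    path = os.path.join(directory, LOCK_NAME)
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so the
        # existence check and the creation happen as one atomic step.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        # Recording the owner's PID is an assumption of this sketch.
        lock_file.write(str(os.getpid()))
    return True


def unlock(directory):
    """Release the lock by removing the lock file."""
    os.remove(os.path.join(directory, LOCK_NAME))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as shared_dir:
        print(try_lock(shared_dir))  # True: first locker wins
        print(try_lock(shared_dir))  # False: directory is already locked
        unlock(shared_dir)
        print(try_lock(shared_dir))  # True: lock is free again
```

A process that receives False can either wait and retry or move on to other work, which corresponds to the "wait or act accordingly" choice described above.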

Locking technique

Directory locking

lockf(1) utility

Stalled locks

File locking

This section gives us a convenient approach to directory locking. Obviously, though, simply LOCKING all shared directories while a port and all its dependencies are being installed would prevent any other parallel port installation from doing most of its work. It is therefore necessary to determine which directories must be LOCKED and UNLOCKED, and in which stages. It is also necessary to keep the LOCKING phases as short as possible, so that other parallel port installations can run as efficiently as possible.
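
The goal of short LOCKING phases can be sketched in Python as a lock that is held only around the brief step that touches a shared directory, and that is always released, even if that step fails. This is an illustrative model only: the helper names `locked` and `register_package`, the lock file name `.lock`, and the file `installed.txt` are hypothetical and are not part of the ports framework.

```python
import contextlib
import os
import tempfile
import time


@contextlib.contextmanager
def locked(directory, lock_name=".lock", poll=0.05, timeout=5.0):
    """Hold a lock file in `directory` for the duration of a `with` block."""
    path = os.path.join(directory, lock_name)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Atomic creation: fails if the lock file already exists.
            os.close(os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{directory} is still locked")
            time.sleep(poll)  # wait for the current holder to unlock
    try:
        yield
    finally:
        # Always unlock, even if the protected step raised an error.
        os.remove(path)


def register_package(pkg_dbdir, name):
    """Append `name` to a shared record; only this short step is locked."""
    # Long-running work (fetch, build) would happen before this point,
    # outside of the lock, so other ports are not held up by it.
    with locked(pkg_dbdir):
        with open(os.path.join(pkg_dbdir, "installed.txt"), "a") as record:
            record.write(name + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as pkg_dbdir:
        register_package(pkg_dbdir, "port-a")
        register_package(pkg_dbdir, "port-b")
        print(sorted(os.listdir(pkg_dbdir)))
```

The `finally` clause reflects the concern with failures and unexpected terminations: a step that raises an error does not leave the directory locked. A process killed outright would still leave its lock file behind, which this sketch does not handle.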

Port's directory locking

${PKG_DBDIR} locking

${DISTDIR} locking

${PORT_DBDIR} locking

Prevent parallel make install process of the same port or same port dependency

Port X fails to install for whatever reason

Continuation of the DEFAULT sequence of targets

Redesign of Conflict checking

Parallel installation of port's dependencies

Problems

  1. Blocking processing of a port's dependencies is unsuitable for parallel dependency processing. The ${deptype:L}-depends and lib-depends targets need to be redesigned in a nonblocking manner to match the parallel execution flow.

  2. Track processing of spawned dependency builds. When dependency builds are spawned as background sub-make processes, the parent make process does not track the exit codes of its child processes.
    Moreover, there may be several kinds of exit codes:

    • exit 0 (success).
    • exit codes that signal execution errors (nonzero).
    • the exit code that signals that processing of a port's dependency was stopped because this dependency had already been locked (${_parv_MAKE_LOCK_EXIT_STATUS}).

    Each of these exit statuses has to be handled accordingly.
  3. Blocked execution due to user interaction. Both the make options command and ports that set INTERACTIVE block while waiting for user input. For a parallelization approach, this currently hinders the execution flow of dependencies, since every dependency blocks while it waits for user input. Moreover, a process that is spawned in the background is unable to interact with the user.

  4. Redesign of MAKE output. Parallel processing of a port's dependencies mixes the output streams of several child processes, because they are all directed to a single terminal. It is necessary to show the user only the most important information about the build process, while still showing enough that the build does not appear to be deadlocked.
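
The exit statuses listed in item 2 can be modeled with a short Python sketch. It assumes a hypothetical value of 75 for the lock exit status (the real value of ${_parv_MAKE_LOCK_EXIT_STATUS} is not given here), and it uses small Python child processes as stand-ins for background sub-make processes. The names `classify` and `run_dependencies` are also hypothetical.

```python
import subprocess
import sys

# Hypothetical stand-in for ${_parv_MAKE_LOCK_EXIT_STATUS}; the real value
# used by the ports framework is not stated in this document.
LOCK_EXIT_STATUS = 75


def classify(exit_code):
    """Map a child's exit code to one of the three cases from item 2."""
    if exit_code == 0:
        return "done"
    if exit_code == LOCK_EXIT_STATUS:
        return "locked"  # dependency was already locked by another build
    return "failed"  # any other nonzero code is an execution error


def run_dependencies(commands):
    """Start every command in the background, then collect each status.

    `commands` maps a dependency name to an argument list."""
    children = {
        name: subprocess.Popen(
            argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
        )
        for name, argv in commands.items()
    }
    results = {}
    for name, child in children.items():
        # communicate() drains this child's private output pipe and waits
        # for it to exit, so output from different children never mixes.
        child.communicate()
        results[name] = classify(child.returncode)
    return results


if __name__ == "__main__":
    def exits_with(code):
        # A child process that exits immediately with the given status.
        return [sys.executable, "-c", f"raise SystemExit({code})"]

    print(run_dependencies({
        "dep-a": exits_with(0),
        "dep-b": exits_with(LOCK_EXIT_STATUS),
        "dep-c": exits_with(1),
    }))
```

Because each child writes to its own pipe rather than to the shared terminal, the parent can decide what to show the user, which is one possible way to approach the output problem in item 4.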

Approach to solving

Non blocking processing of dependency build

Track processing of spawned dependencies builds

OPTIONS and INTERACTIVE targets approach

Recursive OPTIONS processing

Integration

Recursive license checking

Integration

Redesign of MAKE output

Degree of parallelization

Non-default make targets and limitations

Problem

Limitations

The Code

https://socsvn.freebsd.org/socsvn/soc2012/scher/par_ports

pkgng utility

TODO

Test Plan


The Code


Deliverables

Milestones


SummerOfCode2012/Parallelization_in_the_ports_collection (last edited 2013-05-17T01:31:17+0000 by BryanDrewery)