This page is for sharing the design and progress of the project to add transaction support to FreeBSD.
Overview
The objective of this project is to allow a process to start a transaction that can later be committed or aborted. This would allow, for example, the painless rollback of a port installation that failed partway through.
Block-Based Design
Transactions are based on snapshots. Each process gets an additional field, its transaction id, which holds a pid-like identifier and is inherited when the process forks. Nested transactions are currently not allowed. When a transaction begins, a snapshot file named after the transaction id is created on each of the specified filesystems. The snapshot code is modified so that when a block is overwritten,
- if the process modifying the block is not part of a transaction, then
  - if the block has already been copied in the transaction's snapshot, then
    - the corresponding call is blocked, waiting for the transaction to complete,
  - otherwise (or when the call unblocks) the block is written to the live filesystem, without making a copy of the old contents to the snapshot file,
- otherwise, the old block is copied to the snapshot file as usual.
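The decision rules above can be sketched as a small pure function. This is only an illustration under stated assumptions: the enum, its values, and the function name classify_block_write are hypothetical and do not come from the FreeBSD snapshot code.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the modified snapshot write check. */
enum write_action {
    WRITE_VIA_SNAPSHOT,     /* transaction write: usual snapshot copy-on-write */
    WRITE_WAIT_THEN_DIRECT, /* outside write to a block the snapshot holds */
    WRITE_DIRECT            /* outside write: no copy into the snapshot */
};

/*
 * Classify a block overwrite.
 * writer_in_transaction: the writing process carries the transaction id.
 * block_in_snapshot: the block's old contents are already in the snapshot.
 */
enum write_action
classify_block_write(bool writer_in_transaction, bool block_in_snapshot)
{
    if (!writer_in_transaction) {
        if (block_in_snapshot)
            return WRITE_WAIT_THEN_DIRECT; /* block until commit or abort */
        return WRITE_DIRECT;               /* live write, snapshot untouched */
    }
    return WRITE_VIA_SNAPSHOT;             /* old block saved as usual */
}
```

In both outside-writer cases the write eventually reaches the live filesystem without a snapshot copy; the only difference is whether the caller first waits for the transaction to finish.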
When the transaction terminates (through a commit or abort) the transaction id of all processes of that transaction is cleared. If the transaction is committed, the snapshot file is released and deleted and waiting processes are notified. If the transaction is aborted, then
- the snapshot is suspended,
- all the snapshot's blocks are copied to the live filesystem,
- the snapshot file is released, and
- then waiting processes are notified.
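The commit and abort paths can be illustrated with a toy in-memory model. This is a sketch, not kernel code: struct toy_fs, the toy_tx_* functions, and the block sizes are all hypothetical, and snapshot suspension and waiter notification are left out.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NBLOCKS 4
#define BLKSIZE 8

/* Toy stand-ins for a filesystem and one transaction snapshot. */
struct toy_fs {
    char live[NBLOCKS][BLKSIZE];  /* current contents of each block */
    char saved[NBLOCKS][BLKSIZE]; /* old contents held by the snapshot */
    bool copied[NBLOCKS];         /* which blocks the snapshot holds */
};

/* A write made inside the transaction: save the old block once, then write. */
void
toy_tx_write(struct toy_fs *fs, int blk, const char *data)
{
    if (!fs->copied[blk]) {
        memcpy(fs->saved[blk], fs->live[blk], BLKSIZE);
        fs->copied[blk] = true;
    }
    snprintf(fs->live[blk], BLKSIZE, "%s", data);
}

/* Commit: the live data stays; the snapshot is simply released. */
void
toy_tx_commit(struct toy_fs *fs)
{
    memset(fs->copied, 0, sizeof(fs->copied));
}

/* Abort: copy every saved block back to the live data, then release. */
void
toy_tx_abort(struct toy_fs *fs)
{
    for (int i = 0; i < NBLOCKS; i++)
        if (fs->copied[i])
            memcpy(fs->live[i], fs->saved[i], BLKSIZE);
    memset(fs->copied, 0, sizeof(fs->copied));
}
```

The model shows why commit is cheap (nothing is copied) while abort costs one block copy per block the transaction modified.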
When a process whose process id is the same as its transaction id terminates, the process id of the first of the remaining transaction processes is selected as the new transaction id, and it is assigned to all remaining transaction processes. If no other transaction processes remain, then the transaction is aborted.
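The reassignment rule for the transaction id could look roughly like the following. The process table, struct toy_proc, and the NO_TID and TR_MUST_ABORT markers are assumptions made for this sketch; the real kernel process structures are not modeled.

```c
#include <stddef.h>

#define NO_TID        0    /* assumed marker: process is not in a transaction */
#define TR_MUST_ABORT (-1) /* assumed marker: no members remain, abort */

struct toy_proc {
    int pid;
    int tid;   /* transaction id, or NO_TID */
    int alive;
};

/*
 * Handle the exit of procs[exiting]. Returns the id its transaction has
 * afterwards, NO_TID if the process was in no transaction, or
 * TR_MUST_ABORT if it was the last member.
 */
int
toy_proc_exit(struct toy_proc *procs, size_t n, size_t exiting)
{
    int old_tid = procs[exiting].tid;
    int new_tid = NO_TID;

    procs[exiting].alive = 0;
    procs[exiting].tid = NO_TID;
    if (old_tid == NO_TID)
        return NO_TID;
    if (procs[exiting].pid != old_tid)
        return old_tid; /* pid differs from the id: nothing to reassign */

    /* The first remaining member supplies the new id for all members. */
    for (size_t i = 0; i < n; i++) {
        if (!procs[i].alive || procs[i].tid != old_tid)
            continue;
        if (new_tid == NO_TID)
            new_tid = procs[i].pid;
        procs[i].tid = new_tid;
    }
    return new_tid == NO_TID ? TR_MUST_ABORT : new_tid;
}
```

Because a new id is always taken from a live member, some live process always has a pid equal to the transaction id, so the last member to exit necessarily triggers the abort case.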
ZFS already provides support for rolling back to a snapshot (sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:dmu_objset_rollback). This simplifies the design: what remains is to block system calls from non-transaction processes that clash with data in the snapshot, and to exclude data written by non-transacted processes from the snapshot. Furthermore, the rollback must be performed on a mounted filesystem.
Three-Way Merge Design
A snapshot (A) is created at the start of the transaction. This is then cloned, and the transaction process is chrooted to the clone. If the transaction is aborted, the snapshot is simply deleted. To commit the transaction, two more snapshots are taken: one from the clone (B) and one from the original filesystem (C). Then a three-way merge between A, B, and C must be performed on C as an atomic operation. An option to the transaction commit call can specify whether the snapshots and the clone are to be deleted, and whether transactions with conflicts will save the conflicts to a file for the user to resolve or will fail.
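For a single file, the three-way merge decision can be sketched as follows. This assumes each version of the file is summarized by a content identifier such as a hash; the enum and the function name three_way_merge are hypothetical, and directory-level changes (creation, deletion, rename) are not covered.

```c
/* Hypothetical per-file outcome of merging the clone's changes into C. */
enum merge_result {
    MERGE_KEEP_C,   /* C already holds the right contents */
    MERGE_TAKE_B,   /* copy the transaction's version into C */
    MERGE_CONFLICT  /* both sides changed the file differently */
};

/*
 * a: contents in snapshot A (state at transaction start)
 * b: contents in snapshot B (clone, with the transaction's changes)
 * c: contents in snapshot C (original filesystem at commit time)
 */
enum merge_result
three_way_merge(unsigned long a, unsigned long b, unsigned long c)
{
    if (b == c)
        return MERGE_KEEP_C;   /* both sides agree */
    if (b == a)
        return MERGE_KEEP_C;   /* the transaction did not touch the file */
    if (c == a)
        return MERGE_TAKE_B;   /* only the transaction changed the file */
    return MERGE_CONFLICT;     /* changed on both sides, differently */
}
```

A MERGE_CONFLICT result is where the commit option described above would apply: either the conflict is saved for the user to resolve or the commit fails.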
API
System Level
- int transaction(int request, ...)
  - TR_BEGIN, int pid, char **transacted_filesystems
  - TR_COMMIT, int tid
  - TR_ABORT, int tid
- All gated system calls can return E_TRVIOL (Transaction violation): a transaction attempted to modify data on a filesystem that is not under transaction control.
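To show how the variadic interface might be called, here is a userspace mock with the same signature. Everything beyond the names listed above is an assumption: the numeric values of the request constants, the return conventions (transaction id on TR_BEGIN, 0 on success, -1 on failure), and the choice to reuse the pid as the transaction id. The mock performs no filesystem work.

```c
#include <stdarg.h>
#include <stddef.h>

/* Assumed request values; the page does not define them. */
#define TR_BEGIN  1
#define TR_COMMIT 2
#define TR_ABORT  3

/* Userspace mock of the proposed call, for illustration only. */
int
transaction(int request, ...)
{
    va_list ap;
    int rv = -1;

    va_start(ap, request);
    switch (request) {
    case TR_BEGIN: {
        int pid = va_arg(ap, int);
        char **filesystems = va_arg(ap, char **);
        /* Mock rule: need at least one filesystem; the id equals the pid. */
        if (filesystems != NULL && filesystems[0] != NULL)
            rv = pid;
        break;
    }
    case TR_COMMIT:
    case TR_ABORT: {
        int tid = va_arg(ap, int);
        rv = (tid > 0) ? 0 : -1; /* mock rule: any positive id is accepted */
        break;
    }
    default:
        break; /* unknown request: fail without reading further arguments */
    }
    va_end(ap);
    return rv;
}
```

A caller would pass a NULL-terminated array of mount points to TR_BEGIN, keep the returned id, and later hand it to TR_COMMIT or TR_ABORT.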
User level
transaction -b filesystem ...
transaction {-c|-a} transaction-id
Progress
- Rough design overview (dds - 2007-06-22)
- API documentation, design refinement (dds - 2007-06-23)
- Commit API skeleton (dds - 2007-09-18); browse the Perforce project
TODO
- Any issues with deleted and allocated blocks?
- Nested transactions?
- Jails?
References
Marshall Kirk McKusick and George V. Neville-Neil. The Design and Implementation of the FreeBSD Operating System. Addison-Wesley, Reading, MA, 2004. Section 8.7 (pp. 349-358).
Design and Implementation of a Transaction-Based Filesystem on FreeBSD
A hybrid approach to optimistic file system directory tree synchronization
XML three-way merge as a reconciliation engine for mobile data
Bal - A Tool to Synchronize Document Collections Between Computers