AndriyGapon/AvgZFSLocking

The Lock

spa_namespace_lock is the lock.

Let's first look at the comments describing it: Code Snippet

Some immediate observations:

the lock covers a lot: from simple things like manipulations of a spa collection to covering complex operations on vdevs
needless uses of the lock are avoided by applying discipline: keep and use a reference to a spa that always must have instead of looking it up by name

And here is the first trouble with the lock, right in the core code where it belongs: Code Snippet

Hmm, not so good if the code that does spa opening gets recursed into. Let's why it happens. spa_open_common may call the following chain: spa_load_best => spa_load => spa_load_impl. In the latter function there is a call to dmu_objset_find: (void) dmu_objset_find(spa_name(spa), ....

Wow, we have a spa reference, in fact we are still constructing the spa, but call an external function with a spa name instead of the spa itself. It's not hard to follow through the code that the name is eventually resolved to the spa_t again. Via a recursive spa_open call.

In fact, there is a function similar to dmu_objset_find but with a signature that takes a spa_t pointer. But the function takes more arguments, a callback function that it takes also has a different signature.

This brings us to the following further two observations:

API that take a spa or a dataset names is often more convenient to use than API that takes a spa or a dataset pointer
the code is littered with a lot of API calls that take a name and which have to do a name to object lookup when the object is readily available

So the picture is not as rosy as the spa_namespace_lock description would have us believe. The second snippet is a good illustration of that. But there could be other hidden problems with potential lock order reversal. There would be less reasons to be concerned if the lock served only for spa collection manipulations which would imply that it does not interact with other "external" locks. But it is not so. For example the lock is held while probing disks.

That operation is especially non-trivial if the involved disks are ZFS zvols. It is easy to observe the following lock order reversal.

spa_open takes spa_namespace_lock then calls all the way down to vdev_open, which through the storage subsystem calls into zvol_open which, in upstream code, takes zfsdev_state_lock. On the other hand, some zvol operations first take zfsdev_state_lock and then call some name-based DMU/DSL API function which has to perform a spa lookup which ends up in spa_open with spa_namespace_lock.

On recent FreeBSD all uses of zfsdev_state_lock are replaced with spa_namespace_lock and, to avoid spa_namespace_lock recursion, any use of pools on zvols is prohibited. That fixes the described LOR, but I am not happy. Unlike upstream, FreeBSD situation was/is aggravated by g_topology_lock that has to be taken in vdev operations called from spa_open under spa_namespace_lock. g_topology_lock is also taken in some zvol operations. The third lock in the mix doesn't add any clarity.

Possible solutions

what FreeBSD did, but functionality suffers
meticulous cleanup of calls that needlessly call name resolution
breaking up spa_namespace_lock into finer-grained locks thus separating spa collection manipulations from heavy-weight tasks like spa construction and vdev operations
make the lock recursive
- not available in upstream (Solaris, Illumos, and descendant/derivatives)
- maybe won't help on FreeBSD because of LORs with GEOM topology lock

CategoryZfs