The Lock

spa_namespace_lock is the lock.

Let's first look at the comments describing it: Code Snippet

Some immediate observations:

And here is the first trouble with the lock, right in the core code where it belongs: Code Snippet

Hmm, not so good if the code that does spa opening gets recursed into. Let's why it happens. spa_open_common may call the following chain: spa_load_best => spa_load => spa_load_impl. In the latter function there is a call to dmu_objset_find:  (void) dmu_objset_find(spa_name(spa), .... 

Wow, we have a spa reference, in fact we are still constructing the spa, but call an external function with a spa name instead of the spa itself. It's not hard to follow through the code that the name is eventually resolved to the spa_t again. Via a recursive spa_open call.

In fact, there is a function similar to dmu_objset_find but with a signature that takes a spa_t pointer. But the function takes more arguments, a callback function that it takes also has a different signature.

This brings us to the following further two observations:

So the picture is not as rosy as the spa_namespace_lock description would have us believe. The second snippet is a good illustration of that. But there could be other hidden problems with potential lock order reversal. There would be less reasons to be concerned if the lock served only for spa collection manipulations which would imply that it does not interact with other "external" locks. But it is not so. For example the lock is held while probing disks.

That operation is especially non-trivial if the involved disks are ZFS zvols. It is easy to observe the following lock order reversal.

spa_open takes spa_namespace_lock then calls all the way down to vdev_open, which through the storage subsystem calls into zvol_open which, in upstream code, takes zfsdev_state_lock. On the other hand, some zvol operations first take zfsdev_state_lock and then call some name-based DMU/DSL API function which has to perform a spa lookup which ends up in spa_open with spa_namespace_lock.

On recent FreeBSD all uses of zfsdev_state_lock are replaced with spa_namespace_lock and, to avoid spa_namespace_lock recursion, any use of pools on zvols is prohibited. That fixes the described LOR, but I am not happy. Unlike upstream, FreeBSD situation was/is aggravated by g_topology_lock that has to be taken in vdev operations called from spa_open under spa_namespace_lock. g_topology_lock is also taken in some zvol operations. The third lock in the mix doesn't add any clarity.

Possible solutions

AndriyGapon/AvgZFSLocking (last edited 2016-07-21 11:02:37 by KubilayKocak)