Debugging of suspected ZFS deadlocks

Please first read the following. Please direct your ZFS reports to freebsd-fs@FreeBSD.org.


The best first step is to capture stack traces of all threads in one of the following ways:


If you see thread(s) with a zio_wait call in their stacks and you also see thread(s) with a zio_done call in their stacks, then this is very likely a true ZFS deadlock. Please report it.


Similarly, if you see thread(s) with a zio_wait call in their stacks and you also see thread(s) with a zio_interrupt call in their stacks, then this is very likely a true ZFS deadlock. Please report it.


If you do not see any threads with a zio_wait call, but you do see threads with the following calls (or similar):

then this is very likely a true ZFS deadlock. Please report it.


If none of the above is true, that is, you do see zio_wait but you see neither zio_done nor zio_interrupt, then the problem is most likely in the storage layer:

Consider reporting this problem, but please be realistic about it: do not expect a resolution in ZFS code.
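The rules above can be sketched as a small script. This is only an illustration, assuming each thread's stack trace has already been captured as one text string; the function names has_call and classify, the result strings, and the other_suspect_calls parameter (a placeholder for the calls of the zio_wait-free case) are hypothetical and not part of any FreeBSD tool.

```python
import re


def has_call(stacks, name):
    """Return True if any thread's stack contains a frame for `name`."""
    # Word boundaries keep zio_wait from matching e.g. zio_wait_for_children.
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    return any(pattern.search(stack) for stack in stacks)


def classify(stacks, other_suspect_calls=()):
    """Apply the triage rules to a collection of per-thread stack strings.

    `other_suspect_calls` is a caller-supplied list of function names for
    the case where no thread is in zio_wait (assumption: supplied by you).
    """
    stacks = list(stacks)
    waiting = has_call(stacks, "zio_wait")
    completing = (has_call(stacks, "zio_done")
                  or has_call(stacks, "zio_interrupt"))

    if waiting and completing:
        return "likely ZFS deadlock"
    if not waiting and any(has_call(stacks, c) for c in other_suspect_calls):
        return "likely ZFS deadlock"
    if waiting:
        return "likely storage layer problem"
    return "inconclusive"


# Made-up stacks, for illustration only.
example = [
    "mi_switch sleepq_wait _cv_wait zio_wait+0x8b",
    "mi_switch _sx_xlock zio_done+0x10",
]
print(classify(example))
```

A result of "inconclusive" only means that none of the patterns matched; it does not rule out a problem.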


Some notes:

If you are doing deep debugging, some very useful information can be found in the vdev_t structure associated with each leaf vdev of a pool.

vdev_queue = {
        vq_deadline_tree = {avl_root = 0xfffffe0338dbb248, avl_compar = 0xffffffff816855b0 <vdev_queue_deadline_compare>, avl_offset = 584, avl_numnodes = 116, avl_size = 896},
        vq_read_tree = {avl_root = 0xfffffe019d0b65b0, avl_compar = 0xffffffff81685600 <vdev_queue_offset_compare>, avl_offset = 560, avl_numnodes = 8, avl_size = 896},
        vq_write_tree = {avl_root = 0xfffffe03e3d19230, avl_compar = 0xffffffff81685600 <vdev_queue_offset_compare>, avl_offset = 560, avl_numnodes = 108, avl_size = 896},
        vq_pending_tree = {avl_root = 0xfffffe025e32c230, avl_compar = 0xffffffff81685600 <vdev_queue_offset_compare>, avl_offset = 560, avl_numnodes = 10, avl_size = 896},

avl_numnodes gives the number of requests (zios) in a given queue. vq_deadline_tree is the queue of incoming requests; vq_read_tree and vq_write_tree are sub-queues for read and write requests, respectively. vq_pending_tree is the queue of requests that have been issued to the underlying storage layer; ZFS is waiting for these requests to complete.
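To read the queue depths at a glance, the avl_numnodes values can be pulled out of a saved debugger dump with a short script. This is a minimal sketch, assuming the dump is available as text in the format shown above; queue_depths and SAMPLE are hypothetical names, and SAMPLE is an abridged copy of the output above with the avl_compar fields omitted.

```python
import re

# Abridged copy of the vdev_queue dump shown above (avl_compar omitted).
SAMPLE = """\
vdev_queue = {
        vq_deadline_tree = {avl_root = 0xfffffe0338dbb248, avl_offset = 584, avl_numnodes = 116, avl_size = 896},
        vq_read_tree = {avl_root = 0xfffffe019d0b65b0, avl_offset = 560, avl_numnodes = 8, avl_size = 896},
        vq_write_tree = {avl_root = 0xfffffe03e3d19230, avl_offset = 560, avl_numnodes = 108, avl_size = 896},
        vq_pending_tree = {avl_root = 0xfffffe025e32c230, avl_offset = 560, avl_numnodes = 10, avl_size = 896},
"""

# Match a vq_*_tree member and the first avl_numnodes value that follows it.
# DOTALL lets the match span line breaks in a wrapped dump.
TREE_RE = re.compile(
    r"\b(vq_\w+_tree)\s*=\s*\{.*?avl_numnodes\s*=\s*(\d+)", re.DOTALL
)


def queue_depths(dump):
    """Return {queue name: number of queued zios} parsed from a text dump."""
    return {name: int(count) for name, count in TREE_RE.findall(dump)}


for name, count in queue_depths(SAMPLE).items():
    print(f"{name}: {count}")
```

In this sample the read and write sub-queues (8 and 108) add up to the 116 requests in vq_deadline_tree, which is consistent with the description above, while 10 requests are outstanding in the storage layer.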

AndriyGapon/AvgZfsDeadlockDebug (last edited 2016-07-21 11:04:34 by KubilayKocak)