Remove a couple of performance bottlenecks in nvme

Review Request #2401 — Created Oct. 17, 2019 and submitted — Latest diff uploaded


This started as an exercise to improve concurrency through blkdev ( ). Whilst testing that, I discovered that the single taskq used to handle command completions is also a bottleneck ( ).

This change fixes both and, when tested with vdbench, shows significant performance benefits - again see for benchmark results.

The changes in blkdev provide multiple wait/run queues per device.
Some of the changes in nvme support the multiple blkdev queues; the majority replace the single command completion taskq with one taskq per completion queue.
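
For readers unfamiliar with the pattern, here is a minimal user-space sketch of the general idea of giving each device several independently locked wait/run queues instead of one shared pair. It is not the blkdev or nvme code: every name (demo_dev_t, demo_queue_t, demo_submit and so on) is hypothetical, pthread mutexes stand in for kernel locks, and picking a queue as "hint modulo queue count" is an assumption made purely for illustration, not necessarily how the driver selects a queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types for illustration only; not the blkdev structures. */
typedef struct demo_queue {
	pthread_mutex_t dq_lock;	/* protects only this queue's counters */
	unsigned int dq_waiting;	/* requests queued but not yet started */
	unsigned int dq_running;	/* requests handed to the device */
} demo_queue_t;

typedef struct demo_dev {
	size_t dd_nqueues;		/* number of wait/run queue pairs */
	demo_queue_t *dd_queues;
} demo_dev_t;

/* Allocate nqueues independent queues; returns 0 on success, -1 on error. */
int
demo_dev_init(demo_dev_t *dev, size_t nqueues)
{
	if (nqueues == 0)
		return (-1);
	dev->dd_queues = calloc(nqueues, sizeof (demo_queue_t));
	if (dev->dd_queues == NULL)
		return (-1);
	dev->dd_nqueues = nqueues;
	for (size_t i = 0; i < nqueues; i++)
		(void) pthread_mutex_init(&dev->dd_queues[i].dq_lock, NULL);
	return (0);
}

void
demo_dev_fini(demo_dev_t *dev)
{
	for (size_t i = 0; i < dev->dd_nqueues; i++)
		(void) pthread_mutex_destroy(&dev->dd_queues[i].dq_lock);
	free(dev->dd_queues);
	dev->dd_queues = NULL;
	dev->dd_nqueues = 0;
}

/* Assumed selection policy: spread submitters by a caller-supplied hint. */
size_t
demo_pick_queue(const demo_dev_t *dev, unsigned int hint)
{
	return (hint % dev->dd_nqueues);
}

/* Queue a request; only the chosen queue's lock is taken. */
size_t
demo_submit(demo_dev_t *dev, unsigned int hint)
{
	size_t qi = demo_pick_queue(dev, hint);
	demo_queue_t *q = &dev->dd_queues[qi];

	pthread_mutex_lock(&q->dq_lock);
	q->dq_waiting++;
	pthread_mutex_unlock(&q->dq_lock);
	return (qi);
}

/* Move one request from waiting to running on queue qi, if any is waiting. */
void
demo_start(demo_dev_t *dev, size_t qi)
{
	assert(qi < dev->dd_nqueues);
	demo_queue_t *q = &dev->dd_queues[qi];

	pthread_mutex_lock(&q->dq_lock);
	if (q->dq_waiting > 0) {
		q->dq_waiting--;
		q->dq_running++;
	}
	pthread_mutex_unlock(&q->dq_lock);
}

/* Retire one running request on queue qi, if any is running. */
void
demo_complete(demo_dev_t *dev, size_t qi)
{
	assert(qi < dev->dd_nqueues);
	demo_queue_t *q = &dev->dd_queues[qi];

	pthread_mutex_lock(&q->dq_lock);
	if (q->dq_running > 0)
		q->dq_running--;
	pthread_mutex_unlock(&q->dq_lock);
}
```

Because each queue has its own lock, submitters and completion handlers that land on different queues do not contend with each other. That is the same reasoning behind moving from a single completion taskq to one per completion queue.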

I ran vdbench as a stress test and to confirm performance.