10952 defer new resilvers and misc. resilver-related fixes

Review Request #1846 — Created May 17, 2019 and submitted

jjelinek
illumos-gate
general

Port from ZoL of
fa241660743 Add feature check for 'zpool resilver' command
5aa95ba0d35 Fix resilver writes in vdev_indirect_io_start
ab6a2b5cd7f ZTS: Improve zpool_scrub_004_pos reliability
612c4930dd2 Fix the spelling of deferred ???
cef48f14da6 Remove races from scrub / resilver tests
4021ba4cfaa Make vdev_set_deferred_resilver() recursive
8cb119e3dc0 Fix 2 small bugs with cached dsl_scan_phys_t
5e0bd0ae056 Fix issue with scanning dedup blocks as scan ends
b3d7725c943 Remove zfs_gitrev.h (this shouldn't be part of 80a91e74696)
80a91e74696 Defer new resilvers until the current one ends

Also include the zpool_scrub_print_repairing in the zfs-test runfiles. This test could not be enabled earlier since it depended on the sequential scrub work, which has now integrated.

One of the new resilver tests depends on the "reopen" test code, so I also included the test code from the following
d3f2cd7e3b7 Added no_scrub_restart flag to zpool reopen
I did not yet include the code itself from that new feature.

Because we now have the reopen test code, I also enabled the 'zpool_scrub_offline_device' which depends on the reopen test code.
This test did not work on illumos because of the Linux-specific 'awk' usage, so I rewrote the 'awk' command which checks for checksum errors to run on our awk.

zfs-test run. All of the new/updated tests included here pass, except the reopen tests are not actually enabled yet in any run file.

  • 0
  • 0
  • 0
  • 1
  • 1
Description From Last Updated
kkantor
  1. Looks good. I just had one question about the delay() call.

  2. What circumstances lead us to give Sun (Oracle) copyright for new files? Is it when the new file is identical to an existing file in the tree that has Sun copyright?

    1. I can't say in this specific case since this is what was done in ZoL. In general, if we copy an existing file with a Sun/Oracle copyright, then edit it, we would normally keep the original copyright and add ours as well.

  3. usr/src/uts/common/fs/zfs/dsl_scan.c (Diff revision 1)
     
     

    I think kernel tunables are supposed to be manually added to the zfs MDB module for the ::zfs_params dcmd, but it's already pretty out of date.

    1. Thanks, I'll take a look at bring mdb completely up-to-date in a separate bug.

  4. usr/src/uts/common/fs/zfs/dsl_scan.c (Diff revision 1)
     
     

    This approach seems strange to me. Is this feature (suspending scans) only used for the testing/debugging code? If that's the case this is fine.

    Where does the 'hz' value come from? Is that a global somewhere?

    1. I do think this is intended only for testing. This is from commit "Remove races from scrub / resilver tests" and here is the ZoL commit msg:

      Currently, several tests in the ZFS Test Suite that attempt to
      test scrub and resilver behavior occasionally fail. A big reason
      for this is that these tests use a combination of zinject and
      zfs_scan_vdev_limit to attempt to slow these operations enough
      to verify their test commands. This method works most of the time,
      but provides no guarantees and leads to flaky behavior. This patch
      adds a new tunable, zfs_scan_suspend_progress, that ensures that
      scans make no progress, guaranteeing that tests can be run without
      racing.

      'hz' is a kernel variable decalred in uts/common/conf/param.c. This is the system cycle rate (defaults to 100, but we run at 1000).

  5. 
      
kkantor
  1. Ship It!
  2. 
      
jjelinek
Review request changed

Status: Closed (submitted)

Loading...