Fix out-of-order ZIL txtype lost on hardlinked files; Fix zil replay panic when TX_REMOVE followed by TX_CREATE

Review Request #2445 - Created Nov. 7, 2019 and submitted

Information
Owner: Andy Fiddaman
Repository: illumos-gate
Branch: master
Bugs: 11942, 11943
Commit: 814c7e8...
Reviewers
Groups: general
People: jjelinek

This is a combination of the following commits from ZoL, plus some additional
fixes for the associated tests for the illumos environment.

commit 8e556c5ebc7b66caf2cdcc561b6644f9f8437a6d
Author: Chunwei Chen <david.chen@nutanix.com>
Date: Tue Aug 13 20:21:27 2019 -0700

Fix out-of-order ZIL txtype lost on hardlinked files

We should only call zil_remove_async when an object is removed. However,
in the current implementation, it is called whenever TX_REMOVE is logged.
In the case of a hardlinked file, every unlink generates a TX_REMOVE,
causing operations to be dropped even when the object is not removed.

We fix this by only calling zil_remove_async when the file is fully
unlinked.
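
The difference between the old and new behaviour can be sketched with a small userland model. This is not the kernel code: `struct model_obj`, `model_remove_async` and `model_log_remove` are hypothetical names invented for illustration, and the `fixed` flag exists only to compare the two behaviours side by side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an object with queued async ZIL records. */
struct model_obj {
	unsigned nlink;		/* remaining hard links */
	size_t pending_async;	/* async log records not yet committed */
};

/* Models the effect of zil_remove_async: discard queued async records. */
static void
model_remove_async(struct model_obj *o)
{
	o->pending_async = 0;
}

/*
 * Models logging one TX_REMOVE for an unlink.
 * fixed == false: discard async records on every unlink (old behaviour).
 * fixed == true:  discard them only when the last link is gone.
 * Returns true if the object is now fully unlinked.
 */
static bool
model_log_remove(struct model_obj *o, bool fixed)
{
	bool unlinked;

	assert(o->nlink > 0);
	o->nlink--;
	unlinked = (o->nlink == 0);

	if (!fixed || unlinked)
		model_remove_async(o);

	return (unlinked);
}
```

With two links and three pending records, the first unlink in the fixed model leaves the records in place, while the old model discards them even though the object still exists under its other name.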

commit 035e96118bc9a7cbf435dd17dda507b870fcf6e6
Author: Chunwei Chen <david.chen@nutanix.com>
Date: Wed Aug 28 10:42:02 2019 -0700

Fix zil replay panic when TX_REMOVE followed by TX_CREATE

If TX_REMOVE is followed by TX_CREATE on the same object id, we need to
make sure the object removal has completely finished before creation. The
current implementation relies on dnode_hold_impl with
DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seemed to work
before, in the current version it does not guarantee that the object
removal has completed.

We fix this by checking whether dnode_hold_impl with DNODE_MUST_BE_FREE
succeeds instead. Also add a test and remove dead code in dnode_hold_impl.
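
Why the two checks differ can be sketched with a toy state model. It assumes, for illustration only, that an object slot passes through an intermediate "freeing" state between being allocated and being free; the `model_*` names, the three states and the `EEXIST` return value are placeholders invented for this sketch, not the real dnode_hold_impl logic or its error codes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumed slot states; SLOT_FREEING means removal has started only. */
enum model_slot { SLOT_ALLOCATED, SLOT_FREEING, SLOT_FREE };
enum model_flag { MODEL_MUST_BE_ALLOCATED, MODEL_MUST_BE_FREE };

/* Toy hold: returns 0 on success, or an errno-style value on failure. */
static int
model_hold(enum model_slot s, enum model_flag f)
{
	if (f == MODEL_MUST_BE_ALLOCATED)
		return (s == SLOT_ALLOCATED ? 0 : ENOENT);
	return (s == SLOT_FREE ? 0 : EEXIST);	/* placeholder error */
}

/* Old check: "not allocated" is taken to mean "safe to create". */
static bool
model_old_check(enum model_slot s)
{
	return (model_hold(s, MODEL_MUST_BE_ALLOCATED) == ENOENT);
}

/* New check: create only if the slot is positively known to be free. */
static bool
model_new_check(enum model_slot s)
{
	return (model_hold(s, MODEL_MUST_BE_FREE) == 0);
}

/* Usage: the two checks disagree only in the intermediate state. */
static void
model_demo(void)
{
	assert(model_old_check(SLOT_FREEING) == true);
	assert(model_new_check(SLOT_FREEING) == false);
	assert(model_old_check(SLOT_FREE) == model_new_check(SLOT_FREE));
}
```

In this model the old check lets the create proceed while removal is still in progress, which is the window the fix closes by asking the positive question instead of inferring freedom from a failed lookup.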

commit 97c54ea818ac60b914d1591e17ab175d89410b1b
Author: Ryan Moeller <ryan@freqlabs.com>
Date: Thu Aug 22 20:26:51 2019 -0400

Make slog test setup more robust

The slog tests fail when attempting to create pools using file vdevs
that already exist from previous test runs. Remove these files in the
setup for the test.

Prior to this change, the first new test fails and the second causes a system panic:

panic[cpu7]/thread=fffffe16f0f4fb40: assertion failed: dmu_object_claim_dnsize(zfsvfs->z_os, obj, DMU_OT_PLAIN_FILE_CONTENTS, 0, obj_type, bonuslen, dnodesize, tx) == 0 (0x1c == 0x0), file: ../../common/fs/zfs/zfs_znode.c, line: 861

fffffe002258a280 genunix:process_type+153649 ()
fffffe002258a400 zfs:zfs_mknode+7e0 ()
fffffe002258a540 zfs:zfs_create+6fa ()
fffffe002258a5e0 genunix:fop_create+cf ()
fffffe002258a7a0 zfs:zfs_replay_create+2b8 ()
fffffe002258a800 zfs:zil_replay_log_record+f2 ()
fffffe002258a9d0 zfs:zil_parse+1f8 ()
fffffe002258aa50 zfs:zil_replay+bc ()
fffffe002258aa90 zfs:zfsvfs_setup+bd ()
fffffe002258ab10 zfs:zfs_domount+171 ()
fffffe002258ac30 zfs:zfs_mount+2a7 ()
fffffe002258ac60 genunix:fsop_mount+14 ()
fffffe002258add0 genunix:domount+952 ()
fffffe002258ae70 genunix:mount+fe ()
fffffe002258aeb0 genunix:syscall_ap+98 ()
fffffe002258af10 unix:brand_sys_sysenter+1dc ()

Following the change, both tests pass:

Test: /opt/zfs-tests/tests/functional/slog/slog_replay_fs_001 (run as root) [00:11] [PASS]
Test: /opt/zfs-tests/tests/functional/slog/slog_replay_fs_002 (run as root) [00:22] [PASS]

  
Jerry Jelinek
Andy Fiddaman
Jerry Jelinek
Toomas Soome
Andy Fiddaman
Review request changed

Status: Closed (submitted)
