Fix out-of-order ZIL txtype lost on hardlinked files; Fix zil replay panic when TX_REMOVE followed by TX_CREATE
Review Request #2445 — Created Nov. 7, 2019 and submitted — Latest diff uploaded
Information | |
---|---|
Owner | citrus |
Repository | illumos-gate |
Branch | master |
Bugs | 11942, 11943 |
Commit | 814c7e8... |
Reviewers | |
Groups | general |
People | jjelinek |
This is a combination of the following commits from ZoL, plus some additional
fixes for the associated tests for the illumos environment.

commit 8e556c5ebc7b66caf2cdcc561b6644f9f8437a6d
Author: Chunwei Chen <david.chen@nutanix.com>
Date: Tue Aug 13 20:21:27 2019 -0700

Fix out-of-order ZIL txtype lost on hardlinked files

We should only call zil_remove_async when an object is removed. However, in current implementation, it is called whenever TX_REMOVE is called. In the case of hardlinked file, every unlink will generate TX_REMOVE and causing operations to be dropped even when the object is not removed.

We fix this by only calling zil_remove_async when the file is fully unlinked.

commit 035e96118bc9a7cbf435dd17dda507b870fcf6e6
Author: Chunwei Chen <david.chen@nutanix.com>
Date: Wed Aug 28 10:42:02 2019 -0700

Fix zil replay panic when TX_REMOVE followed by TX_CREATE

If TX_REMOVE is followed by TX_CREATE on the same object id, we need to make sure the object removal is completely finished before creation. The current implementation relies on dnode_hold_impl with DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seems to work fine before, in current version it does not guarantee the object removal is completed. We fix this by checking if DNODE_MUST_BE_FREE returns successful instead. Also add test and remove dead code in dnode_hold_impl.

commit 97c54ea818ac60b914d1591e17ab175d89410b1b
Author: Ryan Moeller <ryan@freqlabs.com>
Date: Thu Aug 22 20:26:51 2019 -0400

Make slog test setup more robust

The slog tests fail when attempting to create pools using file vdevs that already exist from previous test runs. Remove these files in the setup for the test.

Prior to this change, the first new test fails and the second causes a system panic:
panic[cpu7]/thread=fffffe16f0f4fb40: assertion failed: dmu_object_claim_dnsize(zfsvfs->z_os, obj, DMU_OT_PLAIN_FILE_CONTENTS, 0, obj_type, bonuslen, dnodesize, tx) == 0 (0x1c == 0x0), file: ../../common/fs/zfs/zfs_znode.c, line: 861

fffffe002258a280 genunix:process_type+153649 ()
fffffe002258a400 zfs:zfs_mknode+7e0 ()
fffffe002258a540 zfs:zfs_create+6fa ()
fffffe002258a5e0 genunix:fop_create+cf ()
fffffe002258a7a0 zfs:zfs_replay_create+2b8 ()
fffffe002258a800 zfs:zil_replay_log_record+f2 ()
fffffe002258a9d0 zfs:zil_parse+1f8 ()
fffffe002258aa50 zfs:zil_replay+bc ()
fffffe002258aa90 zfs:zfsvfs_setup+bd ()
fffffe002258ab10 zfs:zfs_domount+171 ()
fffffe002258ac30 zfs:zfs_mount+2a7 ()
fffffe002258ac60 genunix:fsop_mount+14 ()
fffffe002258add0 genunix:domount+952 ()
fffffe002258ae70 genunix:mount+fe ()
fffffe002258aeb0 genunix:syscall_ap+98 ()
fffffe002258af10 unix:brand_sys_sysenter+1dc ()

Following the change, both tests pass:
Test: /opt/zfs-tests/tests/functional/slog/slog_replay_fs_001 (run as root) [00:11] [PASS]
Test: /opt/zfs-tests/tests/functional/slog/slog_replay_fs_002 (run as root) [00:22] [PASS]
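
For readers unfamiliar with the hardlink case in the first commit, the difference between the old and fixed behaviour can be sketched with a toy model. This is only an illustration, not the code in this diff: `toy_file_t`, `toy_remove_async`, `toy_unlink_old`, `toy_unlink_fixed` and `toy_selfcheck` are hypothetical names, and the "drop pending async records" flag is an assumed simplification of what zil_remove_async does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a file object; not a real ZFS structure. */
typedef struct toy_file {
	unsigned links;         /* number of hard links still naming the file */
	bool async_ops_dropped; /* set once pending async log records are discarded */
} toy_file_t;

/* Hypothetical stand-in for zil_remove_async(): discard pending records. */
void
toy_remove_async(toy_file_t *f)
{
	f->async_ops_dropped = true;
}

/* Old behaviour as described: every unlink discards pending records. */
void
toy_unlink_old(toy_file_t *f)
{
	if (f->links > 0)
		f->links--;
	toy_remove_async(f);
}

/* Fixed behaviour as described: discard only when fully unlinked. */
void
toy_unlink_fixed(toy_file_t *f)
{
	if (f->links > 0)
		f->links--;
	if (f->links == 0)
		toy_remove_async(f);
}

/* Exercise both variants on a file with two hard links. */
void
toy_selfcheck(void)
{
	toy_file_t old_file = { 2, false };
	toy_file_t new_file = { 2, false };

	toy_unlink_old(&old_file);
	assert(old_file.async_ops_dropped);   /* records lost, file still linked */

	toy_unlink_fixed(&new_file);
	assert(!new_file.async_ops_dropped);  /* one link remains, records kept */

	toy_unlink_fixed(&new_file);
	assert(new_file.async_ops_dropped);   /* last link gone, records dropped */
}
```

Calling `toy_selfcheck()` from a `main` function runs the three checks. The model says nothing about the second commit's dnode handling, which depends on details of dnode_hold_impl not reproduced here.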