ZFS File System (Introduction)
This chapter provides an overview of the ZFS file system and its features and benefits. This chapter also covers some basic terminology used throughout the rest of this book.
The following sections are provided in this chapter:
1.1. What's New in ZFS?
This section summarizes new features in the ZFS file system.
1.1.1. Using Cache Devices in Your ZFS Storage Pool
Solaris Express Developer Edition 1/08: In this Solaris release, you can create a pool and specify cache devices, which are used to cache storage pool data.
Cache devices provide an additional layer of caching between main memory and disk. Using cache devices provides the greatest performance improvement for random-read workloads of mostly static content.
One or more cache devices can be specified when the pool is created. For example:
# zpool create pool mirror c0t2d0 c0t4d0 cache c0t0d0
# zpool status pool
  pool: pool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
        cache
          c0t0d0    ONLINE       0     0     0

errors: No known data errors
After cache devices are added, they gradually fill with content from
main memory. Depending on the size of your cache device, it could take over
an hour for the device to fill. Capacity and reads can be monitored by using the
zpool iostat command as follows:
# zpool iostat -v pool 5
Cache devices can be added to or removed from the pool after the pool is created.
For more information, see Creating a ZFS Storage Pool with Cache Devices and Adding and Removing Cache Devices to Your ZFS Storage Pool.
1.1.2. Enhancements to the zfs send Command
Solaris Express Developer Edition 1/08: This
release includes the following enhancements to the
zfs send command.
Send all incremental streams from one snapshot to a cumulative snapshot. For example:
# zfs list
NAME            USED  AVAIL  REFER  MOUNTPOINT
pool            428K  16.5G    20K  /pool
pool/fs          71K  16.5G    21K  /pool/fs
pool/fs@snapA    16K      -  18.5K  -
pool/fs@snapB    17K      -    20K  -
pool/fs@snapC    17K      -  20.5K  -
pool/fs@snapD      0      -    21K  -
# zfs send -I pool/fs@snapA pool/fs@snapD > /snaps/fs@combo
This command sends all incremental snapshots between fs@snapA and fs@snapD to fs@combo.
Send an incremental stream from the origin snapshot to create a clone. The original snapshot must already exist on the receiving side to accept the incremental stream. For example:
# zfs send -I pool/fs@snap1 pool/clone@snapA > /snaps/fsclonesnap-I
.
.
# zfs receive -F pool/clone < /snaps/fsclonesnap-I
Send a replication stream of all descendent file systems, up to the named snapshots. When received, all properties, snapshots, descendent file systems, and clones are preserved. For example:
# zfs send -R pool/fs@snap > snaps/fs-R
For an extended example, see Examples—Sending and Receiving Complex ZFS Snapshot Streams.
Send an incremental replication stream.
# zfs send -R -[iI] @snapA pool/fs@snapD
For an extended example, see Examples—Sending and Receiving Complex ZFS Snapshot Streams.
For more information, see Sending and Receiving Complex ZFS Snapshot Streams.
1.1.3. ZFS Quotas and Reservations for File System Data Only
Solaris Express Developer Edition 1/08: In addition to the existing ZFS quota and reservation features, this release includes dataset quotas and reservations that do not include descendents, such as snapshots and clones, in the space consumption accounting.
The refquota property limits the amount of space a dataset can consume. This property enforces a hard limit on the amount of space that can be used. This hard limit does not include space used by descendents, such as snapshots and clones.
The refreservation property sets the minimum amount of space that is guaranteed to a dataset, not including its descendents.
For example, you can set a 10-Gbyte refquota for
studentA that sets a 10-Gbyte hard limit of referenced space.
For additional flexibility, you can set a 20-Gbyte quota that allows you to
manage studentA's snapshots.
# zfs set refquota=10g tank/studentA
# zfs set quota=20g tank/studentA
For more information, see ZFS Quotas and Reservations.
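The relationship between the two limits can be sketched with simple arithmetic. The helper below is hypothetical (it is not a ZFS command) and assumes that the dataset has consumed its entire refquota and that nothing else in the pool constrains it. Under those assumptions, the difference between quota and refquota is the space that remains for snapshots and other descendents.

```shell
# Hypothetical helper, not part of ZFS: space (in Gbytes) left for
# snapshots and other descendents once the dataset itself has used
# its whole refquota. Assumes quota >= refquota.
descendent_headroom_gb() {
    quota_gb=$1
    refquota_gb=$2
    echo $((quota_gb - refquota_gb))
}

# studentA from the example above: quota=20g, refquota=10g
descendent_headroom_gb 20 10
```

With the studentA values, the helper reports 10 Gbytes of headroom.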
1.1.4. ZFS File System Properties for the Solaris CIFS Service
Solaris Express Developer Edition 1/08: This release provides support for the Solaris Common Internet File System (CIFS) service. This product provides the ability to share files between Solaris and Windows or MacOS systems.
To facilitate sharing files between these systems by using the Solaris CIFS service, the following new ZFS properties are provided:
Case sensitivity support (casesensitivity)
Non-blocking mandatory locks (nbmand)
SMB share support (sharesmb)
Unicode normalization support (normalization)
UTF-8 character set support (utf8only)
The sharesmb property is available to
share ZFS files in the Solaris CIFS environment. More ZFS CIFS-related properties
will be available in an upcoming release. For information about using the
sharesmb property, see Sharing ZFS Files in a Solaris CIFS Environment.
In addition to the ZFS properties added for supporting
the Solaris CIFS software product, the
vscan property is
available for scanning ZFS files if you have a third-party virus-scanning engine.
1.1.5. ZFS Storage Pool Properties
Solaris Express Developer Edition 1/08: ZFS storage pool properties were introduced in an earlier release. This release provides for additional property information. For example:
# zpool get all users
NAME   PROPERTY     VALUE                 SOURCE
users  size         16.8G                 -
users  used         217M                  -
users  available    16.5G                 -
users  capacity     1%                    -
users  altroot      -                     default
users  health       ONLINE                -
users  guid         11063207170669925585  -
users  version      8                     default
users  bootfs       -                     default
users  delegation   on                    default
users  autoreplace  off                   default
users  temporary    on                    local
For a description of these properties, see ZFS Pool Property Descriptions.
cachefile property – Solaris Express Developer Edition 1/08: This release provides the cachefile property, which controls where pool configuration information is cached. All pools in the cache are automatically imported when the system boots. However, installation and clustering environments might need to cache this information in a different location so that pools are not automatically imported.
You can set this property to cache pool configuration in a different location that can be imported later by using the zpool import -c command. For most ZFS configurations, this property would not be used.
The cachefile property is not persistent and is not stored on disk. This property replaces the temporary property that was used to indicate that pool information should not be cached in previous Solaris releases.
failmode property – Solaris Express Developer Edition 1/08: This release provides the failmode property for determining the behavior of a catastrophic pool failure due to a loss of device connectivity or the failure of all devices in the pool. The failmode property can be set to these values: wait, continue, or panic. The default value is wait, which means you must reconnect the device or replace a failed device and clear the error with the zpool clear command.
The failmode property is set like other settable ZFS properties, which can be set either before or after the pool is created. For example:
# zpool set failmode=continue tank
# zpool get failmode tank
NAME  PROPERTY  VALUE     SOURCE
tank  failmode  continue  local
# zpool create -o failmode=continue
For a description of all ZFS pool properties, see ZFS Pool Property Descriptions.
1.1.6. ZFS and File System Mirror Mounts
Solaris Express Developer Edition 1/08: In this Solaris release, NFSv4 mount enhancements are provided to make ZFS file systems more accessible to NFS clients.
When file systems are created on the NFS server, the NFS client can automatically discover these newly created file systems within their existing mount of a parent file system.
For example, if the server
neo already shares the
tank file system and client
zee has it mounted, /tank/baz is automatically visible on the client after it is created
on the server.
zee# mount neo:/tank /mnt
zee# ls /mnt
baa    bar

neo# zfs create tank/baz

zee% ls /mnt
baa    bar    baz
zee% ls /mnt/baz
file1    file2
1.1.7. ZFS Command History Enhancements (zpool history)
Solaris Express Developer Edition 9/07: The
zpool history command has been enhanced to provide the following:
ZFS file system event information
The -l option for displaying a long format that includes the user name, the hostname, and the zone in which the operation was performed
The -i option for displaying internal event information that can be used for diagnostic purposes
For example, the
zpool history command provides both
zpool command events and
zfs command events.
# zpool history users
History for 'users':
2007-04-26.12:44:02 zpool create users mirror c0t8d0 c0t9d0 c0t10d0
2007-04-26.12:44:38 zfs create users/markm
2007-04-26.12:44:47 zfs create users/marks
2007-04-26.12:44:57 zfs create users/neil
2007-04-26.12:47:15 zfs snapshot -r users/home@yesterday
2007-04-26.12:54:50 zfs snapshot -r users/home@today
2007-04-26.13:29:13 zfs create users/snapshots
2007-04-26.13:30:00 zfs create -o compression=gzip users/snapshots
2007-04-26.13:31:24 zfs create -o compression=gzip-9 users/oldfiles
2007-04-26.13:31:47 zfs set copies=2 users/home
2007-06-25.14:22:52 zpool offline users c0t10d0
2007-06-25.14:52:42 zpool online users c0t10d0
2007-06-25.14:53:06 zpool upgrade users
The -i option provides
internal event information. For example:
# zpool history -i
.
.
.
2007-08-08.15:10:02 [internal create txg:348657] dataset = 83
2007-08-08.15:10:03 zfs create tank/mark
2007-08-08.15:27:41 [internal permission update txg:348869] ul$76928 create dataset = 5
2007-08-08.15:27:41 [internal permission update txg:348869] ul$76928 destroy dataset = 5
2007-08-08.15:27:41 [internal permission update txg:348869] ul$76928 mount dataset = 5
2007-08-08.15:27:41 [internal permission update txg:348869] ud$76928 create dataset = 5
2007-08-08.15:27:41 [internal permission update txg:348869] ud$76928 destroy dataset = 5
2007-08-08.15:27:41 [internal permission update txg:348869] ud$76928 mount dataset = 5
2007-08-08.15:27:41 zfs allow marks create,destroy,mount tank
2007-08-08.15:27:59 [internal permission update txg:348873] ud$76928 snapshot dataset = 5
2007-08-08.15:27:59 zfs allow -d marks snapshot tank
The -l option provides
a long format. For example:
# zpool history -l tank
History for 'tank':
2007-07-19.10:55:13 zpool create tank mirror c0t1d0 c0t11d0 [user root on neo:global]
2007-07-19.10:55:19 zfs create tank/cindys [user root on neo:global]
2007-07-19.10:55:49 zfs allow cindys create,destroy,mount,snapshot tank/cindys [user root on neo:global]
2007-07-19.10:56:24 zfs create tank/cindys/data [user cindys on neo:global]
For more information about using the
zpool history command,
see Identifying Problems in ZFS.
1.1.8. Upgrading ZFS File Systems (zfs upgrade)
Solaris Express Developer Edition 9/07: The
zfs upgrade command is included in this release to provide future
ZFS file system enhancements to existing file systems. ZFS storage pools have
a similar upgrade feature to provide pool enhancements to existing storage
pools. For example:
# zfs upgrade
This system is currently running ZFS filesystem version 2.

The following filesystems are out of date, and can be upgraded.  After being
upgraded, these filesystems (and any 'zfs send' streams generated from
subsequent snapshots) will no longer be accessible by older software versions.

VER  FILESYSTEM
---  ------------
 1   datab
 1   datab/users
 1   datab/users/area51
File systems that are upgraded and any streams created from those
upgraded file systems by the
zfs send command are not accessible
on systems that are running older software releases.
However, no new ZFS file system upgrade features are provided in this release.
1.1.9. ZFS Delegated Administration
Solaris Express Developer Edition 9/07: In this release, you can delegate fine-grained permissions to perform ZFS administration tasks to non-privileged users.
You can use the
zfs allow and
zfs unallow commands
to grant and remove permissions.
You can modify the ability to use delegated administration with the
delegation property. For example:
# zpool get delegation users
NAME   PROPERTY    VALUE  SOURCE
users  delegation  on     default
# zpool set delegation=off users
# zpool get delegation users
NAME   PROPERTY    VALUE  SOURCE
users  delegation  off    local
By default, the
delegation property is enabled.
For more information, see ZFS Delegated Administration and zfs(1M).
1.1.10. Setting Up Separate ZFS Logging Devices
Solaris Express Developer Edition 9/07: The ZFS intent log (ZIL) is provided to satisfy POSIX requirements for synchronous transactions. For example, databases often require their transactions to be on stable storage devices when returning from a system call. NFS and other applications can also use fsync() to ensure data stability. By default, the ZIL is allocated from blocks within the main storage pool. However, better performance might be possible by using separate intent log devices in your ZFS storage pool, such as with NVRAM or a dedicated disk.
Log devices for the ZFS intent log are not related to database log files.
You can set up a ZFS logging device when the storage pool is created or after the pool is created. For examples of setting up log devices, see Creating a ZFS Storage Pool with Log Devices and Adding Devices to a Storage Pool.
You can attach a log device to an existing log device to create a mirrored log device. This operation is identical to attaching a device in an unmirrored storage pool.
Consider the following points when determining whether setting up a ZFS log device is appropriate for your environment:
Any performance improvement seen by implementing a separate log device depends on the device type, the hardware configuration of the pool, and the application workload. For preliminary performance information, see this blog:
Log devices can be unreplicated or mirrored, but RAIDZ is not supported for log devices.
If a separate log device is not mirrored and the device that contains the log fails, storing log blocks reverts to the storage pool.
Log devices can be added, replaced, attached, detached, and imported and exported as part of the larger storage pool. Currently, log devices cannot be removed.
The minimum size of a log device is the same as the minimum size of each device in a pool, which is 64 Mbytes. The amount of in-play data that might be stored on a log device is relatively small. Log blocks are freed when the log transaction (system call) is committed.
The maximum size of a log device should be approximately 1/2 the size of physical memory because that is the maximum amount of potential in-play data that can be stored. For example, if a system has 16 Gbytes of physical memory, consider a maximum log device size of 8 Gbytes.
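The sizing guidelines above can be sketched as a small calculation. The helper name below is hypothetical and is not part of ZFS. It assumes memory is given in Mbytes and applies only the two figures stated above: a 64-Mbyte minimum and a suggested maximum of about half of physical memory.

```shell
# Hypothetical helper, not part of ZFS: suggest a log device size range
# (in Mbytes) from physical memory (in Mbytes), per the guidelines above.
log_device_range_mb() {
    mem_mb=$1
    min_mb=64                   # minimum device size stated above
    max_mb=$((mem_mb / 2))      # about half of physical memory
    # Never report a maximum below the minimum device size.
    if [ "$max_mb" -lt "$min_mb" ]; then
        max_mb=$min_mb
    fi
    echo "$min_mb $max_mb"
}

# A system with 16 Gbytes (16384 Mbytes) of physical memory
log_device_range_mb 16384
```

For the 16-Gbyte example, the helper prints a minimum of 64 Mbytes and a suggested maximum of 8192 Mbytes (8 Gbytes).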
1.1.11. Creating Intermediate ZFS Datasets
Solaris Express Developer Edition 9/07: You
can use the
-p option with the
zfs create, zfs clone, and
zfs rename commands to quickly
create an intermediate dataset if it does not already exist.
For example, create ZFS datasets (users/area51) in the
datab storage pool.
# zfs list
NAME    USED  AVAIL  REFER  MOUNTPOINT
datab   106K  16.5G    18K  /datab
# zfs create -p -o compression=on datab/users/area51
If the intermediate dataset exists during the create operation, the operation completes successfully.
Properties specified apply to the target dataset, not to the intermediate datasets. For example:
# zfs get mountpoint,compression datab/users/area51
NAME                PROPERTY     VALUE                SOURCE
datab/users/area51  mountpoint   /datab/users/area51  default
datab/users/area51  compression  on                   local
The intermediate dataset is created with the default mount point. Any additional properties are disabled for the intermediate dataset. For example:
# zfs get mountpoint,compression datab/users
NAME         PROPERTY     VALUE         SOURCE
datab/users  mountpoint   /datab/users  default
datab/users  compression  off           default
For more information, see zfs(1M).
1.1.12. ZFS Hotplugging Enhancements
Solaris Express Developer Edition 9/07: In this release, ZFS more effectively responds to devices that are removed and provides a mechanism to automatically identify devices that are inserted with the following enhancements:
You can replace an existing device with an equivalent device without having to use the zpool replace command.
The autoreplace property controls automatic device replacement. If set to off, device replacement must be initiated by the administrator by using the zpool replace command. If set to on, any new device, found in the same physical location as a device that previously belonged to the pool, is automatically formatted and replaced. The default behavior is off.
The storage pool state REMOVED is provided when a device or hot spare has been removed if the device was physically removed while the system was running. A hot-spare device is substituted for the removed device, if available.
If a device is removed and then inserted, the device is placed online. If a hot spare was activated when the device is re-inserted, the spare is removed when the online operation completes.
Automatic detection when devices are removed or inserted is hardware-dependent and might not be supported on all platforms. For example, USB devices are automatically configured upon insertion. However, you might have to use the cfgadm -c configure command to configure a SATA drive.
Hot spares are checked periodically to make sure they are online and available.
For more information, see zpool(1M).
1.1.13. Recursively Renaming ZFS Snapshots (zfs rename -r)
Solaris Express Developer Edition 5/07: You
can recursively rename all descendent ZFS snapshots by using the zfs rename -r command.
For example, snapshot a set of ZFS file systems.
# zfs snapshot -r users/home@today
# zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
users                    216K  16.5G    20K  /users
users/home                76K  16.5G    22K  /users/home
users/home@today            0      -    22K  -
users/home/markm          18K  16.5G    18K  /users/home/markm
users/home/markm@today      0      -    18K  -
users/home/marks          18K  16.5G    18K  /users/home/marks
users/home/marks@today      0      -    18K  -
users/home/neil           18K  16.5G    18K  /users/home/neil
users/home/neil@today       0      -    18K  -
Then, rename the snapshots the following day.
# zfs rename -r users/home@today @yesterday
# zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
users                        216K  16.5G    20K  /users
users/home                    76K  16.5G    22K  /users/home
users/home@yesterday            0      -    22K  -
users/home/markm              18K  16.5G    18K  /users/home/markm
users/home/markm@yesterday      0      -    18K  -
users/home/marks              18K  16.5G    18K  /users/home/marks
users/home/marks@yesterday      0      -    18K  -
users/home/neil               18K  16.5G    18K  /users/home/neil
users/home/neil@yesterday       0      -    18K  -
Snapshots are the only dataset that can be renamed recursively. For more information about snapshots, see Overview of ZFS Snapshots.
1.1.14. GZIP Compression is Available for ZFS
Solaris Express Developer Edition 5/07: In
this Solaris release, you can set
gzip compression on ZFS
file systems in addition to
lzjb compression. You can specify
gzip, the default, or
gzip-N, where N equals 1 through 9. For example:
# zfs create -o compression=gzip users/home/snapshots
# zfs get compression users/home/snapshots
NAME                  PROPERTY     VALUE  SOURCE
users/home/snapshots  compression  gzip   local
# zfs create -o compression=gzip-9 users/home/oldfiles
# zfs get compression users/home/oldfiles
NAME                 PROPERTY     VALUE   SOURCE
users/home/oldfiles  compression  gzip-9  local
For more information about setting ZFS properties, see Setting ZFS Properties.
1.1.15. Storing Multiple Copies of ZFS User Data
Solaris Express Developer Edition 5/07: As a reliability feature, ZFS file system metadata is automatically stored multiple times across different disks, if possible. This feature is known as ditto blocks.
In this Solaris release, you can specify that multiple copies of user
data are also stored per file system by using the
zfs set copies command. For example:
# zfs set copies=2 users/home
# zfs get copies users/home
NAME        PROPERTY  VALUE  SOURCE
users/home  copies    2      local
Available values are 1, 2, or 3. The default value is 1. These copies are in addition to any pool-level redundancy, such as in a mirrored or RAID-Z configuration.
The benefits of storing multiple copies of ZFS user data are as follows:
Improves data retention by allowing recovery from unrecoverable block read faults, such as media faults (bit rot) for all ZFS configurations.
Provides data protection even in the case where only a single disk is available.
Allows you to select data protection policies on a per-file system basis, beyond the capabilities of the storage pool.
Depending on the allocation of the ditto blocks in the storage pool, multiple copies might be placed on a single disk. A subsequent full disk failure might cause all ditto blocks to be unavailable.
You might consider using ditto blocks when you accidentally create a non-redundant pool and when you need to set data retention policies.
For a detailed description of how setting copies on a system with a single-disk pool or a multiple-disk pool might impact overall data protection, see this blog entry. For more information about setting ZFS properties, see Setting ZFS Properties.
1.1.16. Improved zpool status Output
Solaris Express 1/07: You
can use the
zpool status -v command to
display a list of files with persistent errors. Previously, you had to use
the find -inum command to identify the
filenames from the list of displayed inodes.
For more information about displaying a list of files with persistent errors, see Repairing a Corrupted File or Directory.
1.1.17. ZFS and Solaris iSCSI Improvements
Solaris Express, Developer Edition 2/07: In
this Solaris release, you can create a ZFS volume as a Solaris iSCSI target
device by setting the
shareiscsi property on the ZFS volume.
This method is a convenient way to quickly set up a Solaris iSCSI target.
# zfs create -V 2g tank/volumes/v2
# zfs set shareiscsi=on tank/volumes/v2
# iscsitadm list target
Target: tank/volumes/v2
    iSCSI Name: iqn.1986-03.com.sun:02:984fe301-c412-ccc1-cc80-cf9a72aa062a
    Connections: 0
After the iSCSI target is created, set up the iSCSI initiator. For information about setting up a Solaris iSCSI initiator, see Chapter 14, Configuring Solaris iSCSI Targets and Initiators (Tasks), in System Administration Guide: Devices and File Systems.
For more information about managing a ZFS volume as an iSCSI target, see Using a ZFS Volume as a Solaris iSCSI Target.
1.1.18. Sharing ZFS File System Enhancements
Solaris Express, Developer Edition 2/07: In
this Solaris release, the process of sharing file systems has been improved.
Although modifying system configuration files, such as /etc/dfs/dfstab,
is unnecessary for sharing ZFS file systems, you can use the
sharemgr command
to manage ZFS share properties. The
sharemgr command enables
you to set and manage share properties on share groups. ZFS shares are automatically
designated in the
zfs share group.
As in previous releases, you can set the ZFS
sharenfs property
on a ZFS file system to share a ZFS file system. For example:
# zfs set sharenfs=on tank/home
Or, you can use the new
sharemgr add-share subcommand
to share a ZFS file system in the
zfs share group. For example:
# sharemgr add-share -s tank/data zfs
# sharemgr show -vp zfs
zfs nfs=()
    zfs/tank/data
          /tank/data
          /tank/data/1
          /tank/data/2
          /tank/data/3
Then, you can use the
sharemgr command to manage
ZFS shares. The following example shows how to use
sharemgr to set the
nosuid property on the shared ZFS file systems.
You must preface ZFS share paths with zfs/, as in zfs/tank/data.
# sharemgr set -P nfs -p nosuid=true zfs/tank/data
# sharemgr show -vp zfs
zfs nfs=()
    zfs/tank/data nfs=(nosuid="true")
          /tank/data
          /tank/data/1
          /tank/data/2
          /tank/data/3
For more information, see sharemgr(1M).
1.1.19. ZFS Command History (zpool history)
Solaris Express 12/06: In
this Solaris release, ZFS automatically logs successful zfs and
zpool commands that modify pool state information. For example:
# zpool history
History for 'newpool':
2007-04-25.11:37:31 zpool create newpool mirror c0t8d0 c0t10d0
2007-04-25.11:37:46 zpool replace newpool c0t10d0 c0t9d0
2007-04-25.11:38:04 zpool attach newpool c0t9d0 c0t11d0
2007-04-25.11:38:09 zfs create newpool/user1
2007-04-25.11:38:15 zfs destroy newpool/user1

History for 'tank':
2007-04-25.11:46:28 zpool create tank mirror c1t0d0 c2t0d0 mirror c3t0d0 c4t0d0
This feature enables you or Sun support personnel to identify the exact set of ZFS commands that was executed to troubleshoot an error scenario.
You can identify a specific storage pool with the
zpool history command.
# zpool history newpool
History for 'newpool':
2007-04-25.11:37:31 zpool create newpool mirror c0t8d0 c0t10d0
2007-04-25.11:37:46 zpool replace newpool c0t10d0 c0t9d0
2007-04-25.11:38:04 zpool attach newpool c0t9d0 c0t11d0
2007-04-25.11:38:09 zfs create newpool/user1
2007-04-25.11:38:15 zfs destroy newpool/user1
The features of the history log are as follows:
The log cannot be disabled.
The log is saved persistently on disk, which means the log is saved across system reboots.
The log is implemented as a ring buffer. The minimum size is 128 Kbytes. The maximum size is 32 Mbytes.
For smaller pools, the maximum size is capped at 1% of the pool size, where size is determined at pool creation time.
Requires no administration, which means tuning the size of the log or changing the location of the log is unnecessary.
zpool history command does not record
For more information about troubleshooting ZFS problems, see Identifying Problems in ZFS.
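The size limits of the history log listed above can be read as a simple clamp. The sketch below is one interpretation of those stated figures, not the ZFS implementation, and the function name is hypothetical. It takes 1% of the pool size at creation time and bounds the result between 128 Kbytes and 32 Mbytes.

```shell
# Hypothetical helper, not part of ZFS: estimate the history log size in
# bytes for a pool of the given size in bytes, using the limits stated
# in the list above.
history_log_bytes() {
    pool_bytes=$1
    min_bytes=$((128 * 1024))           # 128 Kbytes minimum
    max_bytes=$((32 * 1024 * 1024))     # 32 Mbytes maximum
    size=$((pool_bytes / 100))          # 1% of the pool size
    if [ "$size" -lt "$min_bytes" ]; then
        size=$min_bytes
    fi
    if [ "$size" -gt "$max_bytes" ]; then
        size=$max_bytes
    fi
    echo "$size"
}

# A 1-Gbyte pool and a 100-Gbyte pool
history_log_bytes 1073741824
history_log_bytes 107374182400
```

Under this reading, a 1-Gbyte pool gets a log of roughly 10 Mbytes, and any pool of 3.2 Gbytes or larger reaches the 32-Mbyte maximum.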
1.1.20. ZFS Property Improvements
ZFS xattr Property
Solaris Express 1/07: You
can use the
xattr property to disable or enable extended
attributes for a specific ZFS file system. The default value is on. For a
description of ZFS properties, see Introducing ZFS Properties.
ZFS canmount Property
Solaris Express 10/06: The
canmount property allows you to specify whether a dataset
can be mounted by using the
zfs mount command. For more
information, see The canmount Property.
ZFS User Properties
Solaris Express 10/06: In addition to the standard native properties that can either export internal statistics or control ZFS file system behavior, ZFS supports user properties. User properties have no effect on ZFS behavior, but you can use them to annotate datasets with information that is meaningful in your environment.
For more information, see ZFS User Properties.
Setting Properties When Creating ZFS File Systems
Solaris Express 10/06: In this Solaris release, you can set properties when you create a file system, in addition to setting properties after the file system is created.
The following examples illustrate equivalent syntax:
# zfs create tank/home
# zfs set mountpoint=/export/zfs tank/home
# zfs set sharenfs=on tank/home
# zfs set compression=on tank/home
# zfs create -o mountpoint=/export/zfs -o sharenfs=on -o compression=on tank/home
1.1.21. Displaying All ZFS File System Information
Solaris Express 10/06: In
this Solaris release, you can use various forms of the
zfs get command
to display information about all datasets if you do not specify a dataset.
In previous releases, all dataset information was not retrievable with the
zfs get command. For example:
# zfs get -s local all
tank/home          atime  off  local
tank/home/bonwick  atime  off  local
tank/home/marks    quota  50G  local
1.1.22. New zfs receive -F Option
Solaris Express 10/06: In
this Solaris release, you can use the new
-F option to the
zfs receive command to force a rollback of the file system to the
most recent snapshot before doing the receive. Using this option might be
necessary when the file system is modified between the time a rollback occurs
and the receive is initiated.
For more information, see Restoring a ZFS Snapshot.
1.1.23. Recursive ZFS Snapshots
Solaris Express 8/06: When
you use the
zfs snapshot command to create a file system
snapshot, you can use the
-r option to recursively create
snapshots for all descendent file systems. In addition, using the
-r option
recursively destroys all descendent snapshots when a snapshot is destroyed.
Recursive ZFS snapshots are created quickly as one atomic operation. The snapshots are created together (all at once) or not created at all. The benefit of atomic snapshot operations is that the snapshot data is always taken at one consistent time, even across descendent file systems.
For more information, see Creating and Destroying ZFS Snapshots.
1.1.24. Double Parity RAID-Z (raidz2)
Solaris Express 7/06: A
redundant RAID-Z configuration can now have either single- or double-parity,
which means that one or two device failures can be sustained respectively,
without any data loss. You can specify the
raidz2 keyword
for a double-parity RAID-Z configuration. Or, you can specify the
raidz1 keyword for a single-parity RAID-Z
configuration.
For more information, see Creating RAID-Z Storage Pools or zpool(1M).
1.1.25. Hot Spares for ZFS Storage Pool Devices
Solaris Express 7/06: The ZFS hot spares feature enables you to identify disks that could be used to replace a failed or faulted device in one or more storage pools. Designating a device as a hot spare means that if an active device in the pool fails, the hot spare automatically replaces the failed device. Or, you can manually replace a device in a storage pool with a hot spare.
For more information, see Designating Hot Spares in Your Storage Pool and zpool(1M).
1.1.26. Replacing a ZFS File System With a ZFS Clone (zfs promote)
Solaris Express 7/06: The
zfs promote command enables you to replace an existing ZFS file
system with a clone of that file system. This feature is helpful when you
want to run tests on an alternative version of a file system and then make
that alternative version of the file system the active file system.
For more information, see Replacing a ZFS File System With a ZFS Clone and zfs(1M).
1.1.27. Upgrading ZFS Storage Pools (zpool upgrade)
Solaris Express 6/06: You
can upgrade your storage pools to a newer version to take advantage of the
latest features by using the
zpool upgrade command. In addition, the
zpool status command has been modified to
notify you when your pools are running older versions.
For more information, see Upgrading ZFS Storage Pools and zpool(1M).
If you want to use the ZFS Administration console on a system with a
pool from a previous Solaris release, make sure you upgrade your pools before
using the ZFS Administration console. To see if your pools need to be upgraded,
use the
zpool status command. For information about the
ZFS Administration console, see ZFS Web-Based Management.
1.1.28. Using ZFS to Clone Non-Global Zones and Other Enhancements
Solaris Express 6/06: When
the source zonepath and the target zonepath both
reside on ZFS and are in the same pool,
zoneadm clone now
automatically uses the ZFS clone feature to clone a zone. This enhancement
means that
zoneadm clone will take a ZFS snapshot of the
source zonepath and set up the target zonepath.
The snapshot name includes
a unique ID used to distinguish between multiple snapshots. The destination
zone's zonepath is used to name the ZFS clone. A software
inventory is performed so that a snapshot used at a future time can be validated
by the system. Note that you can still specify that the ZFS zonepath be
copied instead of cloned, if desired.
To clone a source zone multiple times, a new parameter added to
zoneadm allows you to specify that an existing snapshot should be used.
The system validates that the existing snapshot is usable on the target. Additionally,
the zone install process now has the capability to detect when a ZFS file
system can be created for a zone, and the uninstall process can detect when
a ZFS file system in a zone can be destroyed. These steps are then performed
automatically by the zoneadm command.
Keep the following points in mind when using ZFS on a system with containers:
Do not use the ZFS snapshot features to clone a zone.
Do not use a ZFS file system for a global zone root path or a non-global zone root path in the Solaris 10 releases. You can use ZFS as a zone root path in the Solaris Express releases, but keep in mind that patching or upgrading these zones is not supported.
For more information, see System Administration Guide: Virtualization Using the Solaris Operating System.
1.1.29. ZFS Backup and Restore Commands are Renamed
Solaris Express 5/06: In
this Solaris release, the
zfs backup and
zfs restore commands
are renamed to
zfs send and
zfs receive to
more accurately describe their function. The function of these commands is
to save and restore ZFS data stream representations.
For more information about these commands, see Saving and Restoring ZFS Data.
1.1.30. Recovering Destroyed Storage Pools
Solaris Express 5/06: This
release includes the zpool import -D command,
which enables you to recover pools that were previously destroyed with the
zpool destroy command.
For more information, see Recovering Destroyed ZFS Storage Pools.
1.1.31. ZFS is Integrated With Fault Manager
Solaris Express 4/06: This release includes the integration of a ZFS diagnostic engine that is capable of diagnosing and reporting pool failures and device failures. Checksum, I/O, device, and pool errors associated with pool or device failures are also reported.
The diagnostic engine does not include predictive analysis of checksum and I/O errors, nor does it include proactive actions based on fault analysis.
In the event of the ZFS failure, you might see a message similar to
the following from fmd:
SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Fri Mar 10 11:09:06 MST 2006
PLATFORM: SUNW,Ultra-60, CSN: -, HOSTNAME: neo
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: b55ee13b-cd74-4dff-8aff-ad575c372ef8
DESC: A ZFS device failed. Refer to http://illumos.org/msg/ZFS-8000-D3 for more information.
AUTO-RESPONSE: No automated response will occur.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.
By reviewing the recommended action, which will be to follow the more
specific directions in the
zpool status command, you will
be able to quickly identify and resolve the failure.
For an example of recovering from a reported ZFS problem, see Repairing a Missing Device.
1.1.32. New zpool clear Command
Solaris Express 4/06: This
release includes the
zpool clear command for clearing error
counts associated with a device or the pool. Previously, error counts were
cleared when a device in a pool was brought online with the
zpool online command. For more information, see
zpool(1M) and Clearing Storage Pool Devices.
1.1.33. Compact NFSv4 ACL Format
Solaris Express 4/06: In
this release, three NFSv4 ACL formats are available: verbose, positional,
and compact. The new compact and positional ACL formats are available to set
and display ACLs. You can use the
chmod command to set
all three ACL formats. You can use the
ls -V command to display compact and positional ACL formats and the
ls -v command to display verbose ACL formats.
For more information, see Setting and Displaying ACLs on ZFS Files in Compact Format, chmod(1), and ls(1).
1.1.34. File System Monitoring Tool (fsstat)
Solaris Express 4/06: A
new file system monitoring tool,
fsstat, is available to
report file system operations. Activity can be reported by mount point or
by file system type. The following example shows general ZFS file system activity:
$ fsstat zfs
  new  name  name  attr  attr lookup rddir  read  read write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
7.82M 5.92M 2.76M 1.02G 3.32M  5.60G 87.0M  363M 1.86T 20.9M  251G zfs
For more information, see fsstat(1M).
1.1.35. ZFS Web-Based Management
Solaris Express 1/06: A web-based ZFS management tool is available to perform many administrative actions. With this tool, you can perform the following tasks:
Create a new storage pool.
Add capacity to an existing pool.
Move (export) a storage pool to another system.
Import a previously exported storage pool to make it available on another system.
View information about storage pools.
Create a file system.
Create a volume.
Take a snapshot of a file system or a volume.
Roll back a file system to a previous snapshot.
You can access the ZFS Administration console through a secure web browser at the following URL:
If you type the appropriate URL and are unable to reach the ZFS Administration console, the server might not be started. To start the server, run the following command:
# /usr/sbin/smcwebserver start
If you want the server to run automatically when the system boots, run the following command:
# /usr/sbin/smcwebserver enable
You cannot use the Solaris Management Console (smc)
to manage ZFS storage pools or file systems.
You cannot manage ZFS file systems remotely with the ZFS Administration console because a change in a recent Solaris release shut down some network services automatically. Use the following command to enable these services:
# netservices open
1.2. What Is ZFS?
The ZFS file system is a revolutionary new file system that fundamentally changes the way file systems are administered, with features and benefits not found in any other file system available today. ZFS has been designed to be robust, scalable, and simple to administer.
1.2.1. ZFS Pooled Storage
ZFS uses the concept of storage pools to manage physical storage. Historically, file systems were constructed on top of a single physical device. To address multiple devices and provide for data redundancy, the concept of a volume manager was introduced to provide the image of a single device so that file systems would not have to be modified to take advantage of multiple devices. This design added another layer of complexity and ultimately prevented certain file system advances, because the file system had no control over the physical placement of data on the virtualized volumes.
ZFS eliminates volume management altogether. Instead of forcing you to create virtualized volumes, ZFS aggregates devices into a storage pool. The storage pool describes the physical characteristics of the storage (device layout, data redundancy, and so on) and acts as an arbitrary data store from which file systems can be created. File systems are no longer constrained to individual devices, allowing them to share space with all file systems in the pool. You no longer need to predetermine the size of a file system, as file systems grow automatically within the space allocated to the storage pool. When new storage is added, all file systems within the pool can immediately use the additional space without additional work. In many ways, the storage pool acts as a virtual memory system. When a memory DIMM is added to a system, the operating system doesn't force you to invoke commands to configure the memory and assign it to individual processes. All processes on the system automatically use the additional memory.
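The pooled model described above can be illustrated with a toy Python sketch. This is not how ZFS is implemented; the class name `StoragePool` and its methods are hypothetical and exist only to show file systems drawing on shared capacity that grows when a device is added.

```python
class StoragePool:
    """Toy model: devices contribute capacity, file systems share it."""

    def __init__(self, device_sizes):
        self.capacity = sum(device_sizes)
        self.used_by_fs = {}  # file system name -> units used

    def create_fs(self, name):
        # No size is chosen up front; the file system starts empty.
        self.used_by_fs[name] = 0

    def free(self):
        return self.capacity - sum(self.used_by_fs.values())

    def write(self, fs, amount):
        if amount > self.free():
            raise OSError("pool out of space")
        self.used_by_fs[fs] += amount

    def add_device(self, size):
        # New capacity is immediately usable by every file system.
        self.capacity += size


pool = StoragePool([100, 100])
pool.create_fs("home")
pool.create_fs("projects")
pool.write("home", 150)          # spans more than one device's worth
free_before = pool.free()
pool.add_device(100)
pool.write("projects", 120)      # fits only because of the added device
free_after = pool.free()
```

Neither file system was given a size, and the second write succeeds without any per-file-system reconfiguration after the device is added.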
1.2.2. Transactional Semantics
ZFS is a transactional file system, which means that the file system
state is always consistent on disk. Traditional file systems overwrite data
in place, which means that if the machine loses power, for example, between
the time a data block is allocated and when it is linked into a directory,
the file system will be left in an inconsistent state. Historically, this
problem was solved through the use of the fsck command.
This command was responsible for going through and verifying file system state,
making an attempt to repair any inconsistencies in the process. This problem
caused great pain to administrators and was never guaranteed to fix all possible
problems. More recently, file systems have introduced the concept of journaling. The journaling process records action in a separate journal,
which can then be replayed safely if a system crash occurs. This process introduces
unnecessary overhead, because the data needs to be written twice, and often
results in a new set of problems, such as when the journal can't be replayed properly.
With a transactional file system, data is managed using copy-on-write
semantics. Data is never overwritten, and any sequence
of operations is either entirely committed or entirely ignored. This mechanism
means that the file system can never be corrupted through accidental loss
of power or a system crash, so no fsck equivalent is needed.
While the most recently written pieces of data might be lost, the
file system itself will always be consistent. In addition, synchronous data
(written using the
O_DSYNC flag) is always guaranteed to
be written before returning, so it is never lost.
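A minimal sketch of the copy-on-write idea follows, assuming a simplified model in which committed state is reached through a single root mapping. The `CowStore` class and the `crash_before_commit` flag are hypothetical illustration devices, not ZFS interfaces.

```python
class CowStore:
    """Toy copy-on-write store: existing blocks are never overwritten."""

    def __init__(self):
        self.blocks = {}   # block id -> contents
        self.next_id = 0
        self.root = {}     # committed state: name -> block id

    def _alloc(self, data):
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        return block_id

    def transaction(self, updates, crash_before_commit=False):
        # Write every update to a fresh block and build a new root.
        new_root = dict(self.root)
        for name, data in updates.items():
            new_root[name] = self._alloc(data)
        if crash_before_commit:
            return  # new blocks are orphaned; the old root is untouched
        # Commit: one reference switch makes all updates visible together.
        self.root = new_root

    def read(self, name):
        return self.blocks[self.root[name]]


store = CowStore()
store.transaction({"a": b"v1", "b": b"v1"})
store.transaction({"a": b"v2", "b": b"v2"}, crash_before_commit=True)
after_crash = (store.read("a"), store.read("b"))
store.transaction({"a": b"v2", "b": b"v2"})
after_commit = (store.read("a"), store.read("b"))
```

After the simulated crash, both names still resolve to the old contents, so the group of updates is either entirely visible or entirely ignored.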
1.2.3. Checksums and Self-Healing Data
With ZFS, all data and metadata is checksummed using a user-selectable algorithm. Traditional file systems that do provide checksumming have performed it on a per-block basis, out of necessity due to the volume management layer and traditional file system design. The traditional design means that certain failure modes, such as writing a complete block to an incorrect location, can result in properly checksummed data that is actually incorrect. ZFS checksums are stored in a way such that these failure modes are detected and can be recovered from gracefully. All checksumming and data recovery is done at the file system layer, and is transparent to applications.
In addition, ZFS provides for self-healing data. ZFS supports storage pools with varying levels of data redundancy, including mirroring and a variation on RAID-5. When a bad data block is detected, ZFS fetches the correct data from another redundant copy, and repairs the bad data, replacing it with the good copy.
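The self-healing read path can be sketched as follows. This is an illustrative model only: the `MirroredBlock` class is hypothetical, SHA-256 is used as one example of a selectable checksum, and the checksum is simply held apart from the copies rather than in any real on-disk structure.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class MirroredBlock:
    """Toy mirrored block whose checksum is kept apart from the copies."""

    def __init__(self, data: bytes, copies: int = 2):
        self.copies = [bytearray(data) for _ in range(copies)]
        self.expected = checksum(data)

    def read(self) -> bytes:
        bad = []
        for index, copy in enumerate(self.copies):
            if checksum(bytes(copy)) == self.expected:
                good = bytes(copy)
                # Repair every copy that failed verification before this one.
                for bad_index in bad:
                    self.copies[bad_index] = bytearray(good)
                return good
            bad.append(index)
        raise IOError("no copy matches the checksum")


block = MirroredBlock(b"important data")
block.copies[0][0] ^= 0xFF          # simulate silent corruption on one side
data = block.read()
repaired = bytes(block.copies[0]) == b"important data"
```

The caller receives correct data and never sees the corruption, which mirrors the statement that recovery is transparent to applications.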
1.2.4. Unparalleled Scalability
ZFS has been designed from the ground up to be the most scalable file system, ever. The file system itself is 128-bit, allowing for 256 quadrillion zettabytes of storage. All metadata is allocated dynamically, so no need exists to pre-allocate inodes or otherwise limit the scalability of the file system when it is first created. All the algorithms have been written with scalability in mind. Directories can have up to 2^48 (256 trillion) entries, and no limit exists on the number of file systems or number of files that can be contained within a file system.
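The figures quoted above can be checked with a few lines of arithmetic. The sketch assumes that the directory limit is 2 to the power 48, that "128-bit" means 2 to the power 128 bytes of addressable storage, and that "trillion", "quadrillion", and "zettabyte" are used in their binary senses (2^40, 2^50, and 2^70). These interpretations are assumptions, not statements from the text.

```python
TRILLION_BINARY = 2**40      # assumed meaning of "trillion"
QUADRILLION_BINARY = 2**50   # assumed meaning of "quadrillion"
ZETTABYTE_BINARY = 2**70     # assumed meaning of "zettabyte", in bytes

max_dir_entries = 2**48
max_bytes = 2**128

# 2^48 / 2^40 = 2^8
dir_entries_in_trillions = max_dir_entries // TRILLION_BINARY

# 2^128 / (2^50 * 2^70) = 2^8
capacity_in_quadrillion_zettabytes = max_bytes // (
    QUADRILLION_BINARY * ZETTABYTE_BINARY
)
```

Under these assumptions both quotients come out to 256, which matches "256 trillion" entries and "256 quadrillion zettabytes".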
1.2.5. ZFS Snapshots
A snapshot is a read-only copy of a file system or volume. Snapshots can be created quickly and easily. Initially, snapshots consume no additional space within the pool.
As data within the active dataset changes, the snapshot consumes space by continuing to reference the old data. As a result, the snapshot prevents the data from being freed back to the pool.
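The space behavior described above can be modeled with a small sketch, assuming a single snapshot and whole-block rewrites. The `Dataset` class and its method names are hypothetical and only illustrate which blocks a snapshot keeps from being freed.

```python
class Dataset:
    """Toy dataset: maps logical offsets to block ids."""

    def __init__(self):
        self.live = {}        # offset -> block id in the active dataset
        self.snapshots = {}   # snapshot name -> frozen offset map
        self.next_block = 0

    def write(self, offset):
        # Every write lands in a newly allocated block.
        self.live[offset] = self.next_block
        self.next_block += 1

    def snapshot(self, name):
        # A snapshot only records references; no data is copied.
        self.snapshots[name] = dict(self.live)

    def held_only_by_snapshot(self, name):
        """Blocks the snapshot references that the live dataset no longer uses."""
        return set(self.snapshots[name].values()) - set(self.live.values())


ds = Dataset()
for offset in range(4):
    ds.write(offset)
ds.snapshot("snapA")
held_initially = len(ds.held_only_by_snapshot("snapA"))
ds.write(1)
ds.write(2)
held_after_changes = len(ds.held_only_by_snapshot("snapA"))
```

Right after creation the snapshot holds no blocks of its own; once two blocks are rewritten, the snapshot is the only thing still referencing the two old blocks.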
1.2.6. Simplified Administration
Most importantly, ZFS provides a greatly simplified administration model. Through the use of hierarchical file system layout, property inheritance, and automanagement of mount points and NFS share semantics, ZFS makes it easy to create and manage file systems without needing multiple commands or editing configuration files. You can easily set quotas or reservations, turn compression on or off, or manage mount points for numerous file systems with a single command. Devices can be examined or repaired without having to understand a separate set of volume manager commands. You can take an unlimited number of instantaneous snapshots of file systems. You can back up and restore individual file systems.
ZFS manages file systems through a hierarchy that allows for this simplified management of properties such as quotas, reservations, compression, and mount points. In this model, file systems become the central point of control. File systems themselves are very cheap (equivalent to a new directory), so you are encouraged to create a file system for each user, project, workspace, and so on. This design allows you to define fine-grained management points.
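Property inheritance through the hierarchy can be sketched as a lookup that walks toward the root until it finds a locally set value. The `FileSystem` class and the dataset names below are hypothetical, and the sketch covers only a simple inheritable property such as compression.

```python
class FileSystem:
    """Toy file system node with locally set and inherited properties."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.local = {}

    def set(self, prop, value):
        self.local[prop] = value

    def get(self, prop, default=None):
        # Use the nearest ancestor (or self) that sets the property.
        node = self
        while node is not None:
            if prop in node.local:
                return node.local[prop]
            node = node.parent
        return default


home = FileSystem("tank/home")
alice = FileSystem("tank/home/alice", parent=home)
bob = FileSystem("tank/home/bob", parent=home)

home.set("compression", "on")     # one setting covers every descendant
bob.set("compression", "off")     # a local value overrides the inherited one

alice_compression = alice.get("compression")
bob_compression = bob.get("compression")
```

Setting the property once on the parent is what lets a single command manage many file systems, while each file system remains its own control point.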
1.3. ZFS Terminology
This section describes the basic terminology used throughout this book:
- checksum
A 256-bit hash of the data in a file system block. The checksum capability can range from the simple and fast fletcher2 (the default) to cryptographically strong hashes such as SHA256.
- clone
A file system whose initial contents are identical to the contents of a snapshot.
For information about clones, see Overview of ZFS Clones.
- dataset
A generic name for the following ZFS entities: clones, file systems, snapshots, or volumes.
Each dataset is identified by a unique name in the ZFS namespace. Datasets are identified using the following format:
pool/path[@snapshot]
pool: Identifies the name of the storage pool that contains the dataset
path: Is a slash-delimited path name for the dataset object
snapshot: Is an optional component that identifies a snapshot of a dataset
For more information about datasets, see Managing ZFS File Systems.
- file system
A dataset that contains a standard POSIX file system.
For more information about file systems, see Managing ZFS File Systems.
- mirror
A virtual device that stores identical copies of data on two or more disks. If any disk in a mirror fails, any other disk in that mirror can provide the same data.
- pool
A logical group of devices describing the layout and physical characteristics of the available storage. Space for datasets is allocated from a pool.
For more information about storage pools, see Managing ZFS Storage Pools.
- RAID-Z
A virtual device that stores data and parity on multiple disks, similar to RAID-5. For more information about RAID-Z, see RAID-Z Storage Pool Configuration.
- resilvering
The process of transferring data from one device to another device is known as resilvering. For example, if a mirror component is replaced or taken offline, the data from the up-to-date mirror component is copied to the newly restored mirror component. This process is referred to as mirror resynchronization in traditional volume management products.
For more information about ZFS resilvering, see Viewing Resilvering Status.
- snapshot
A read-only image of a file system or volume at a given point in time.
For more information about snapshots, see Overview of ZFS Snapshots.
- virtual device
A logical device in a pool, which can be a physical device, a file, or a collection of devices.
For more information about virtual devices, see Identifying Virtual Devices in a Storage Pool.
- volume
A dataset used to emulate a physical device. For example, you can create a ZFS volume as a swap device.
For more information about ZFS volumes, see ZFS Volumes.
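The dataset naming format from the terminology above (a pool name, a slash-delimited path, and an optional snapshot component) can be split apart with a short Python sketch. The `parse_dataset_name` helper is hypothetical, and the sketch assumes the snapshot component is introduced by an at sign, as in the pool/fs@snapA names used elsewhere in this chapter.

```python
from typing import NamedTuple, Optional


class DatasetName(NamedTuple):
    pool: str
    path: str
    snapshot: Optional[str]


def parse_dataset_name(name: str) -> DatasetName:
    # Split off the optional snapshot component first.
    base, at_sign, snapshot = name.partition("@")
    # The first slash-delimited component is the pool name.
    pool, _, path = base.partition("/")
    return DatasetName(pool, path, snapshot if at_sign else None)


snap = parse_dataset_name("pool/fs@snapA")
nested = parse_dataset_name("tank/home/alice")
```

The sketch performs no validation; it only shows how the three components relate to the full name.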
1.4. ZFS Component Naming Requirements
Each ZFS component must be named according to the following rules:
Empty components are not allowed.
Each component can contain only alphanumeric characters in addition to the following four special characters:
Underscore (_)
Hyphen (-)
Colon (:)
Period (.)
Pool names must begin with a letter, with the following restrictions:
The beginning sequence c[0-9] is not allowed.
A name that begins with mirror, raidz, or spare is not allowed because these names are reserved.
In addition, pool names must not contain a percent sign (%).
Dataset names must begin with an alphanumeric character. Dataset names must not contain a percent sign (%).
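The naming rules above can be expressed as a small validator. This is a sketch under stated assumptions, not the check that ZFS itself performs: the four special characters are taken to be underscore, hyphen, colon, and period; the reserved prefixes are taken to be mirror, raidz, and spare; and the function names are hypothetical.

```python
import re

# Assumed character set: alphanumerics plus underscore, hyphen, colon, period.
COMPONENT = re.compile(r"[A-Za-z0-9_.:-]+")
# Assumed reserved prefixes for pool names.
RESERVED_PREFIXES = ("mirror", "raidz", "spare")


def valid_component(component: str) -> bool:
    # An empty string fails, and so does any character outside the set,
    # which includes the percent sign.
    return COMPONENT.fullmatch(component) is not None


def valid_pool_name(name: str) -> bool:
    if not valid_component(name):
        return False
    if not name[0].isalpha():
        return False                      # must begin with a letter
    if re.match(r"c[0-9]", name):
        return False                      # beginning sequence c[0-9]
    return not name.startswith(RESERVED_PREFIXES)


def valid_dataset_name(name: str) -> bool:
    base, at_sign, snapshot = name.partition("@")
    if at_sign and not valid_component(snapshot):
        return False
    pool, *rest = base.split("/")
    # The pool check also guarantees an alphanumeric first character.
    return valid_pool_name(pool) and all(valid_component(c) for c in rest)
```

For example, under these assumptions `valid_pool_name("tank")` is accepted, while `valid_pool_name("c0t0d0")` and `valid_dataset_name("tank//alice")` are rejected.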