Friday, May 28, 2010

DTrace TCP and UDP providers

Last night two new DTrace providers were integrated.
They should be available in a build 142 of Open Solaris.

PSARC 2010/106 DTrace TCP and UDP providers
"This case adds DTrace 'tcp' and 'udp' providers with probes
for send and receive events. These providers cover the TCP
and UDP protocol implementations in OpenSolaris respectively. In
addition the tcp provider contains probes for TCP state machine
transitions and significant events in connection processing
(connection request, acceptance, refusal etc). The udp provider
also contains probes which fire when a UDP socket is opened/closed.
This is intended for use by customers for network observability and
troubleshooting, and this work represents the second and third
components of a suite of planned providers for the network stack. The
first was described in PSARC/2008/302 DTrace IP Provider."

The tcp provider is described here:

...and the udp provider is described here:

Tuesday, May 04, 2010

ZFS - synchronous vs. asynchronous IO

Sometimes it is very useful to be able to disable a synchronous behavior of a filesystem. Unfortunately not all applications provide such functionality. With UFS many used fastfs from time to time, however the problem is that it can potentially lead to a filesystem corruption. In case of ZFS many people have been using an undocumented zil_disable tunable. While it can cause a data corruption from an application point of view it doesn't impact ZFS on-disk consistency. This is good as it makes the feature very useful, with a much smaller risk but can greatly improve a performance in some cases like database imports, nfs servers, etc. The problem with the tunable is that it is unsupported, has a server-wide impact and affects only newly mounted zfs filesystems while has an instant effect on zvols.

From time to time there were requests here and there to get it implemented properly in a fully supported way. I thought it might be a good opportunity to re-fresh my understanding of Open Solaris and ZFS internals so a couple of months ago I decided to implement it under: 6280630 zil synchronicity.
And it was a fun - I really enjoyed it. I spent most of the time trying to understand the interactions between ZIL/VNODE/VFS layers and the structure of ZFS code. I was already familiar with it to some extend as I contributed a code to ZFS in the past and I also do read the code from time to time when I do some performance tuning, etc. Once I understood what's going on there it was really easy to do the actual coding. Once I got a basic functionality working and I asked for a sponsor so it gets integrated. Tim Haley offered to sponsor me and help me to get it integrated. Couple of moths later, after a PSARC case, code reviews, email exchanges, testing it got finally integrated and should appear in build 140.

I would like to thank Tim Haley, Mark Musante and Neil Perin for all their comments, code reviews, testing, PSARC case handling, etc. It was a real pleasure to work with you.

PSARC/2010/108 zil synchronicity

ZFS datasets now have a new 'sync' property to control synchronous behavior.
The zil_disable tunable to turn synchronous requests into asynchronous requests (disable the ZIL) has been removed. For systems that use that switch on upgrade you will now see a message on booting:

sorry, variable 'zil_disable' is not defined in the 'zfs' module

Please update your system to use the new sync property.
Here is a summary of the property:


The options and semantics for the zfs sync property:

This is the default option. Synchronous file system transactions
(fsync, O_DSYNC, O_SYNC, etc) are written out (to the intent log)
and then secondly all devices written are flushed to ensure
the data is stable (not cached by device controllers).

For the ultra-cautious, every file system transaction is
written and flushed to stable storage by a system call return.
This obviously has a big performance penalty.

Synchronous requests are disabled. File system transactions
only commit to stable storage on the next DMU transaction group
commit which can be many seconds. This option gives the
highest performance. However, it is very dangerous as ZFS
is ignoring the synchronous transaction demands of
applications such as databases or NFS.
Setting sync=disabled on the currently active root or /var
file system may result in out-of-spec behavior, application data
loss and increased vulnerability to replay attacks.
This option does *NOT* affect ZFS on-disk consistency.
Administrators should only use this when these risks are understood.

The property can be set when the dataset is created, or dynamically, and will take effect immediately. To change the property, an administrator can use the standard 'zfs' command. For example:

# zfs create -o sync=disabled whirlpool/milek
# zfs set sync=always whirlpool/perrin

Have a fun!