Thursday, September 21, 2006

ZFS in High Availability Environments

I see that many people are asking about a ZFS + Sun Cluster solution. Sun Cluster 3.2 should be released soon, and it does support ZFS (among many other new features). Solaris 10 is now free, and so is Sun Cluster. Additionally, installing Sun Cluster is just a few clicks in the GUI installer and voila! A few more commands and we have a ZFS pool under Sun Cluster management.
Below is an example (using the new SC32 commands; the old ones are still available for backward compatibility) of how to configure a 2-node HA-NFS cluster with ZFS - as you can see, it's really quick & easy.



Nodes: nfs-1 nfs-2
ZFS pool: files

# clresourcegroup create -n nfs-1,nfs-2 -p Pathprefix=/files/conf/ nfs-files
# clreslogicalhostname create -g nfs-files -h nfs-1 nfs-files-net
# clresourcetype register SUNW.HAStoragePlus
# clresource create -g nfs-files -t SUNW.HAStoragePlus -x Zpools=files nfs-files-hastp
# clresourcegroup online -e -m -M nfs-files
# mkdir /files/conf/SUNW.nfs
# vi /files/conf/SUNW.nfs/dfstab.nfs-files-shares
[put nfs shares here related to pool files]
# clresourcetype register SUNW.nfs
# clresource create -g nfs-files -t SUNW.nfs -p Resource_dependencies=nfs-files-hastp nfs-files-shares
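
As an illustration of the dfstab step above, a single entry sharing a hypothetical /files/export filesystem from the pool could look like this (the path and options here are placeholders, not part of the actual setup):

```
# /files/conf/SUNW.nfs/dfstab.nfs-files-shares
share -F nfs -o rw -d "files pool export" /files/export
```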


ps. right now it's available as Sun Cluster 3.2 beta - I already have two SC32 beta clusters running with ZFS and must say it just works. There were some minor problems at the beginning, but the developers from the Sun Cluster team helped so fast that I'm still impressed - thank you guys! Right now it works perfectly.

Wednesday, September 06, 2006

Tuesday, September 05, 2006

How much memory does ZFS consume?

When ZFS is in use, standard tools give inaccurate values for free memory, as ZFS doesn't use the normal page cache and instead allocates kernel memory directly. When a low-memory condition occurs, ZFS should free its buffer memory. So how do you find out how much additional memory could be freed?


bash-3.00# mdb -k
Loading modules: [ unix krtld genunix specfs dtrace cpu.AuthenticAMD.15 ufs md ip sctp usba fcp fctl lofs zfs random nfs crypto fcip cpc logindmux ptm ipc ]
> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     859062              3355   41%
Anon                       675625              2639   32%
Exec and libs                7994                31    0%
Page cache                  39319               153    2%
Free (cachelist)           110881               433    5%
Free (freelist)            385592              1506   19%

Total                     2078473              8119
Physical                  2049122              8004
>

bash-3.00# echo "::kmastat"|mdb -k|grep zio_buf|awk 'BEGIN {c=0} {c=c+$5} END {print c}'
2923298816


So the kernel consumes about 3.3GB of memory, and about 2.7GB of that is allocated to ZFS buffers and should basically be treated as free memory. The approximate free memory on this host is: Free (cachelist) + Free (freelist) + 2923298816 bytes.

I guess a small script which does all the calculations automatically would be useful.
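A minimal sketch of such a script, under some stated assumptions: the kmastat lines below are sample data standing in for live `echo "::kmastat" | mdb -k` output, and the page counts and 4KB page size are hard-coded from the ::memstat output above - on a real host you would read all of these from mdb and pagesize(1) instead.

```shell
#!/bin/sh
# Sum bytes held in ZFS zio_buf kmem caches (column 5 of ::kmastat).
# Sample lines stand in for live 'echo "::kmastat" | mdb -k' output.
zio_bytes=$(awk '/zio_buf/ {c += $5} END {print c}' <<'EOF'
zio_buf_512       512    100   200   1048576   5000   0
zio_buf_1024     1024     50   100   2097152   2000   0
streams_mblk       64     10    20    524288    100   0
EOF
)

# Page counts hard-coded from the ::memstat output above; 4KB pages on x86.
cachelist_pages=110881
freelist_pages=385592
pagesize=4096

# Potentially free memory = cachelist + freelist + ZFS buffers.
free_bytes=$(( (cachelist_pages + freelist_pages) * pagesize + zio_bytes ))
echo "zio buffers:  $zio_bytes bytes"
echo "approx. free: $free_bytes bytes"
```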

Wednesday, August 23, 2006

Sun gains market share

Sun recoups server market share:
"The Santa Clara, Calif.-based company's server revenue rose 15.5 percent to $1.59 billion in the quarter, according to statistics from research firm IDC. The increase outpaced the overall growth of 0.6 percent to $12.29 billion worldwide, with faster gains in x86 servers, blade servers and lower-end models costing less than $25,000.

Sun's three main rivals fared worse. In contrast, IBM's revenue dropped 2.2 percent to $3.42 billion; Hewlett-Packard's dropped 1.7 percent to $3.4 billion; and Dell's dropped 1.3 percent to $1.27 billion."

Wednesday, August 16, 2006

New servers from Sun

New HW from Sun:

  • US IV+ 1.8GHz
    • available in the V490 and up
    • looks like it beats IBM's latest POWER5+ CPUs
  • X2100 M2 server
    • compared to the standard X2100 server it has the latest 1200-series Opterons, DDR2-667, 4x GbE
  • X2200 M2 server
    • 2x 2000-series Opterons (dual-core), 64GB memory supported, 4x GbE, LOM, 2x HDD
  • Ultra 20 M2 workstation
    • compared to the U20 it has the latest Opterons, 2x GbE, DDR2-667, better video

Sun's new servers page.
Official Sun announcement.
Related story.

Tuesday, August 08, 2006

HW RAID vs. ZFS software RAID - part II

This time I tested RAID-5 performance, using the same hardware as in the last RAID-10 benchmark.
I created a RAID-5 volume consisting of 6 disks on a 3510 head unit with 2 controllers, using random optimization. I also created a software RAID-5 (aka RAID-Z) group using ZFS on 6 identical disks in a 3510 JBOD. Both HW and SW RAIDs were connected to the same host (a V440). The results below were produced with filebench's varmail test.

These tests show that software RAID-5 in ZFS can not only be as fast as hardware RAID-5, it can even be faster. The same holds for RAID-10 - ZFS software RAID-10 was faster than hardware RAID-10.

Please note that I tested HW RAID on a 3510 FC array, not on some junky PCI RAID card.


1. ZFS on HW RAID5 with 6 disks, atime=off
IO Summary: 444386 ops 7341.7 ops/s, (1129/1130 r/w) 36.1mb/s, 297us cpu/op, 6.6ms latency
IO Summary: 438649 ops 7247.0 ops/s, (1115/1115 r/w) 35.5mb/s, 293us cpu/op, 6.7ms latency

2. ZFS with software RAID-Z with 6 disks, atime=off
IO Summary: 457505 ops 7567.3 ops/s, (1164/1164 r/w) 37.2mb/s, 340us cpu/op, 6.4ms latency
IO Summary: 457767 ops 7567.8 ops/s, (1164/1165 r/w) 36.9mb/s, 340us cpu/op, 6.4ms latency

3. there's some problem in snv_44 with UFS, so the UFS test was run on S10U2 in test #4
4. UFS on HW RAID5 with 6 disks, noatime, S10U2 + patches (the same filesystem mounted as in 3)
IO Summary: 393167 ops 6503.1 ops/s, (1000/1001 r/w) 32.4mb/s, 405us cpu/op, 7.5ms latency
IO Summary: 394525 ops 6521.2 ops/s, (1003/1003 r/w) 32.0mb/s, 407us cpu/op, 7.7ms latency

5. ZFS with software RAID-Z with 6 disks, atime=off, S10U2 + patches (the same disks as in test #2)
IO Summary: 461708 ops 7635.5 ops/s, (1175/1175 r/w) 37.4mb/s, 330us cpu/op, 6.4ms latency
IO Summary: 457649 ops 7562.1 ops/s, (1163/1164 r/w) 37.0mb/s, 328us cpu/op, 6.5ms latency
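
To put a number on the gap, here is a quick comparison of the averaged ops/s figures from tests #1 and #2 above - nothing ZFS-specific, just arithmetic over the reported values:

```shell
# Average ops/s of the two runs for each config (figures from the results above).
raidz=$(awk 'BEGIN { print (7567.3 + 7567.8) / 2 }')    # software RAID-Z, test #2
hwraid=$(awk 'BEGIN { print (7341.7 + 7247.0) / 2 }')   # ZFS on HW RAID-5, test #1
awk -v a="$raidz" -v b="$hwraid" \
    'BEGIN { printf "RAID-Z: %.1f ops/s, HW RAID-5: %.1f ops/s (%.1f%% faster)\n", a, b, (a-b)/b*100 }'
```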


See my post on zfs-discuss@opensolaris.org list for more details.


I have also found some benchmarks comparing ZFS, UFS, ReiserFS and EXT3 - ZFS was of course the fastest one on the same x86 hardware. See here and here.

DTrace in Mac OS

Thanks to Alan Coopersmith I've just learned that DTrace will be part of MacOS X Leopard.

Mac OS Leopard Xcode:

Track down problems

When you need a bit more help in debugging, Xcode 3.0 offers an extraordinary new program, Xray. Taking its interface cues from timeline editors such as GarageBand, now you can visualize application performance like nothing you’ve seen before. Add different instruments so you can instantly see the results of code analyzers. Truly track read/write actions, UI events, and CPU load at the same time, so you can more easily determine relationships between them. Many such Xray instruments leverage the open source DTrace, now built into Mac OS X Leopard. Xray. Because it’s 2006.


btw: such a GUI tool would be useful for many Solaris admins too

Monday, August 07, 2006

HW RAID vs. ZFS software RAID

I used a 3510 head unit with 73GB 15K disks, with RAID-10 made of 12 disks in one enclosure.
On another server (same server specs) I used 3510 JBODs with the same disk models.

I used filebench to generate workloads. The "varmail" workload was run for 60s, with two runs for each config.


1. ZFS filesystem on HW lun with atime=off:

IO Summary: 499078 ops 8248.0 ops/s, (1269/1269 r/w) 40.6mb/s, 314us cpu/op, 6.0ms latency
IO Summary: 503112 ops 8320.2 ops/s, (1280/1280 r/w) 41.0mb/s, 296us cpu/op, 5.9ms latency

2. UFS filesystem on HW lun with maxcontig=24 and noatime:

IO Summary: 401671 ops 6638.2 ops/s, (1021/1021 r/w) 32.7mb/s, 404us cpu/op, 7.5ms latency
IO Summary: 403194 ops 6664.5 ops/s, (1025/1025 r/w) 32.5mb/s, 406us cpu/op, 7.5ms latency

3. ZFS filesystem with atime=off with ZFS raid-10 using 12 disks from one enclosure:
IO Summary: 558331 ops 9244.1 ops/s, (1422/1422 r/w) 45.2mb/s, 312us cpu/op, 5.2ms latency
IO Summary: 537542 ops 8899.9 ops/s, (1369/1369 r/w) 43.5mb/s, 307us cpu/op, 5.4ms latency


In other tests HW and ZFS software RAID showed about the same performance.
So it looks like, at least with some workloads, ZFS software RAID can be faster than HW RAID.
Also please notice that the HW RAID was done on a real HW array, not some crappy PCI RAID card.

For more details see my post on ZFS discuss list.

Thursday, August 03, 2006

Solaris Internals

Finally, both books of the new Solaris Internals are available. It's a must-buy for everyone seriously using Solaris.
See here and here.

Thursday, July 27, 2006

Home made Thumper?

Or rather not? See this blog entry and learn what is so different about Thumper. I can't wait to get one for testing. It could be just a great architecture for NFS servers.

Saturday, July 22, 2006

New workstation from Sun?

Looks like we can expect new workstation from Sun soon. Look at BugID 6444550: "Next month, Munich workstation will be shipped."

Friday, July 21, 2006

UNIX DAYS - Gdansk 2006

My ZFS presentation and my Open Solaris presentation from the last Unix Days. Both presentations are in English. You can also download the other Unix Days presentations there, although some of them are in Polish.

Thursday, July 20, 2006

ZFS would have saved a day

Ehhh... sometimes everything just crashes and then all you can do is wait MANY hours for fsck, then again for fsck... well, ZFS probably would have helped here - or maybe not, as it's a new technology and other problems could have arisen. Anyway, we'll put it to the test as we're using ZFS more and more, and someday we'll know :) This time the famous 'FSCK YOU' hit me :(

Monday, July 17, 2006

Xen dom0 on Open Solaris

Open Solaris gets Xen dom0 support.
I haven't played with it yet, but it looks like 32/64-bit is supported, MP (up to 32-way) is supported, domU for Open Solaris/Linux, live migration - well, lots of work in a short time. More details at the Open Solaris Xen page. There's also a behind-the-scenes blog entry.