Friday, January 05, 2007

ZFS vs VxFS

Dominic Kay has posted some nice benchmarks comparing ZFS and VxFS (1 2). He also showed the difference between managing VxVM/VxFS and ZFS. Believe me, ZFS is the easiest enterprise volume manager plus file system on the market. When you have to create a few dozen file systems on an array, or you have to create some file systems but you're really not sure how much space to assign to each one, then ZFS is your only friend.
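To illustrate the point, here is a sketch with hypothetical pool, device, and file system names (not taken from Dominic's benchmarks): with ZFS all file systems draw space from a common pool, so nothing has to be pre-sized.

```shell
# One pool over the whole array (device names are made up)
zpool create tank c1t0d0 c1t1d0 c1t2d0 c1t3d0

# A handful of file systems, created in seconds, none of them pre-sized -
# each simply draws from the shared pool as it grows
for fs in home mail web db logs build scratch; do
        zfs create tank/$fs
done

# If one of them needs a cap later, set a quota on the fly
zfs set quota=50g tank/scratch
```

With VxVM/VxFS you would have to decide each volume's size up front and grow or shrink volumes later; here the only decision is the quota, and even that is optional and reversible.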


PS. I don't know which Solaris version was used, and that can make a difference when it comes to ZFS.

Wednesday, December 20, 2006

Solaris 10 users worldwide

See this map - cool. From Jonathan's blog entry:
Each pink dot represents a connected Solaris 10 user - not a downloader, but an individual or machine (independent of who made the server) that connects back to Sun's free update service for revisions and patches - applied to an individual machine, or a global datacenter. This doesn't yet account for anywhere near all Solaris 10 downloads, as most administrators still choose to manage their updates through legacy, non-connected tools. But it's directionally interesting - and shows the value of leveraging the internet to meet customers (new and old).

Sun Cluster 3.2

Finally, Sun Cluster 3.2 is out. You can download it here for free. Just to highlight it: SC 3.2 supports ZFS, so you can, for example, build HA-NFS on ZFS, and it works like a charm - I've been running such configurations for months now (with the SC 3.2 beta). Also, I know people generally don't like to learn new CLIs, but in the case of the new SC it's worth it - IMHO it's much nicer. Additionally, thanks to the Quorum Server it's now possible to set up a cluster without shared storage, which can be useful at times. Documentation is available here.
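For the curious, the ZFS side of such an HA-NFS setup is tiny. This is a sketch with made-up pool and device names; registering the pool and share as cluster resources with the SC agents is a separate step I'm not showing here.

```shell
# Mirrored pool on shared storage, visible to both cluster nodes
zpool create tank mirror c2t0d0 c3t0d0
zfs create tank/export

# ZFS manages the NFS share itself - no /etc/dfs/dfstab editing;
# on failover the other node imports the pool and the share comes with it
zfs set sharenfs=rw tank/export
```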

NEW FEATURES

Ease of Use
* New Command Line Interfaces
* Oracle 10g improved integration and administration
* Agent configuration wizards
* Flexible IP address scheme

Higher Availability
* Cluster support for SMF services
* Quorum server
* Extended flexibility for fencing protocol

Greater Flexibility
* Expanded support for Solaris Containers
* HA ZFS - agent support for Sun's new file system
* Extended support for Veritas software components

Better Operations and Administration
* Dual-partition software update
* Live upgrade
* Optional GUI installation

With Solaris Cluster Geographic Edition, new features include:
* Support for x64 platforms
* Support for EMC SRDF replication software

Solaris Cluster is supported on Solaris 9 9/05 and Solaris 10 11/06.

Tuesday, December 19, 2006

Sun Download Manager


Recently I noticed in the Sun Download Center that I can either download files the old way, using "save as" in a browser, or launch Sun Download Manager directly from the web page via Java Web Start. I tried it, and I must say I really like it: you just check which files you want, start SDM from the page, and the files begin downloading immediately. It offers retries, resuming of partially downloaded files, automatic unzipping of zipped files, and proxy server support. All of it is configurable, of course.

However, I do have a wish list for SDM:

  • ability to download files in parallel (with a configurable number of streams)
  • ability not only to unzip files but also to automatically merge split archives (great for Solaris and/or SX downloads)
  • an option to ask for the download directory when new downloads are added

Saturday, December 16, 2006

LISA - follow up

It was my first time at the LISA conference and I must say I really enjoyed it. There were a lot of people (over 1,100), and almost all the sessions I attended were really good. Not all of them were strictly technical, but those were both humorous and informative. I also had the opportunity to talk to admins from other large data centers, which is always great: you can see what other smart people are doing in their environments, often much larger than yours, and compare it to what you are doing. I hope I'll go to LISA next year :)

So after my short vacation and attending LISA I'm full of energy :) Andrzej and I have decided to start thinking about the next Unix Days. I guess I'll write more about it later.

Availability Suite goes into Open Solaris

I was going through several OpenSolaris mailing lists and spotted some really great news on the storage-discuss list - the entire Availability Suite is going to be integrated into OpenSolaris next month! That means it will be free of charge, the source will be available, and so on. For most people this means a mature solution for block-level remote replication (synchronous and asynchronous). The post is quoted below:

"[...]
As the Availability Suite Project & Technical Lead, I will take this
opportunity to say that in January '07, all of the Sun StorageTech
Availability Suite (AVS) software is going into OpenSolaris!

This will include both the Remote Mirror (SNDR) and Point-in-Time Copy
(II) software, which runs on OpenSolaris supported hardware platforms of
SPARC, x86 and x64.

AVS, being both file system and storage agnostic, makes AVS very capable
of replicating and/or taking snapshots of UFS, QFS, VxFS, ZFS, Solaris
support databases (Oracle, Sybase, etc.), contained on any of the
following types of storage: LUNs, SVM & VxVM volumes, lofi devices, even
ZFS's zvols. [...]"

"[...]
The SNDR portion of Availability Suite, is very capable of replicating
ZFS. Due to the nature of ZFS itself, the unit of replication or
snapshot is a ZFS storage pool, not a ZFS file system. The relationship
between the number of file systems in each storage pools is left to the
discretion of the system administrator, being 1-to-1 (like older file
systems), or many-to-1 (as is now possible with ZFS).

SNDR can replicate any number of ZFS storage pools, where each of the
vdevs in the storage pool (zpool status ), must be configured
under a single SNDR I/O consistency group. Once configured, the
replication of ZFS, like all other Solaris supported file systems, works
with both synchronous and asynchronous replication, the latter using
either memory queues or disks queues.

This product set is well documented and can seen at
http://docs.sun.com/app/docs?p=coll%2FAVS4.0
The current release notes for AVS 4.0 are located at
http://docs.sun.com/source/819-6152-10/AVS_40_Release_Notes.html

More details will be forthcoming in January, so please keep a look out
for Sun StorageTech Availability Suite in 2007![...]"


The entire thread is here.
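As a rough illustration of the quoted advice (hypothetical host names, device paths, and group name - check the AVS documentation linked above for the exact syntax): each vdev of the pool gets its own SNDR set, and all sets share one I/O consistency group so writes stay ordered across the whole pool.

```shell
# Enable async replication of a zpool's two vdevs from primhost to sechost;
# each set needs a local bitmap device, and both sets join the same
# consistency group ("zpool-group") to keep the pool write-order consistent
sndradm -e primhost /dev/rdsk/c1t0d0s0 /dev/rdsk/c1t8d0s0 \
           sechost  /dev/rdsk/c1t0d0s0 /dev/rdsk/c1t8d0s0 \
           ip async g zpool-group
sndradm -e primhost /dev/rdsk/c1t1d0s0 /dev/rdsk/c1t9d0s0 \
           sechost  /dev/rdsk/c1t1d0s0 /dev/rdsk/c1t9d0s0 \
           ip async g zpool-group
```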

Friday, November 17, 2006

Vacation

I'm finally leaving for vacation :) Then directly from my vacation I go to LISA Tech Days, so see you there.

Thursday, November 16, 2006

ZFS RAID-Z2 Performance

While ZFS's RAID-Z2 can actually offer worse random read performance than HW RAID-5, it should offer much better write performance, especially when you are doing random writes or writing to a lot of different files concurrently. After some tests I was happy to find it works exactly as expected. The hard question was: would RAID-Z2 perform well enough in an actual production environment? There's no simple answer, as in production we see a mix of reads and writes. With HW RAID-5, once your write throughput is large enough its write cache can't help much, and random-write performance drops dramatically. Also, one write I/O to the array is converted into several disk I/Os - so fewer I/Os are left for reads. ZFS RAID-Z and RAID-Z2 don't behave that way: they give you excellent write performance whether the workload is random or not, and should generate fewer write I/Os per disk than HW RAID-5. So the real question is: will that offset be enough to get better overall performance in production?

After some testing I wasn't really any closer to answering that question - so I settled on a pool configuration and other details and put it into production. For comparison: I need at least two HW RAID-5 arrays to carry our production traffic. One array just can't do it, and the main problem is writes. Yet a single x4500 with RAID-Z2 seems to handle the same environment without any problems - at least so far. It'll be interesting to see how it behaves with more and more data on it (only a few TBs right now), as that will also mean more reads. But from what I've seen so far, I'm optimistic.
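The "fewer write I/Os per disk" claim can be sketched with simple arithmetic. This is an illustration, not a benchmark: it ignores caches, write aggregation, and RAID-Z's variable stripe width.

```python
# Small random writes: classic RAID-5 does a read-modify-write cycle,
# while RAID-Z/RAID-Z2 writes a fresh full stripe (copy-on-write), so it
# never has to read old data or old parity first.

def raid5_ios_per_block():
    # read old data + read old parity + write new data + write new parity
    return 4

def raidz2_ios_per_block(data_disks):
    # one write per data disk plus two parity writes, amortized over the
    # data_disks blocks of new data in the stripe
    return (data_disks + 2) / data_disks

print(raid5_ios_per_block())       # 4 I/Os for one block
print(raidz2_ios_per_block(4))     # 1.5 I/Os per block written
```

The gap widens with stripe width: with 8 data disks RAID-Z2 costs 1.25 I/Os per block against RAID-5's constant 4.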


Tuesday, November 14, 2006

Caiman

If you install Solaris on servers using JumpStart, you never actually see the Solaris interactive installer. But more and more people are using Solaris on their desktops and laptops, and the installer is often their first contact with Solaris. I must admit it's not a good one. Fortunately Sun realizes that, and some time ago the Caiman project was started to address this problem. See the Caiman Architecture document and the Install Strategy document. Also see early mockups of the Caiman GUI installer.

Friday, November 10, 2006

ZFS tuning

Recently, fix 6472021 was integrated. If you want to tune ZFS, here you can get a list of tunables. Default values for some tunables, with short comments, can be found here, here, and here.

St Paul Blade - Niagara blade from Sun in Q1/07

I was looking through the latest changes to OpenSolaris and found this:


Date: Mon, 30 Oct 2006 19:45:33 -0800
From: Venkat Kondaveeti
To: onnv-gate at onnv dot sfbay dot sun dot com, on-all at sun dot com
Subject: Heads-up:St Paul platform support in Nevada

Today's putback for the following
PSARC 2006/575 St Paul Platform Software Support
6472061 Solaris support for St Paul platform

provides the St Paul Blade platform support in Nevada.
uname -i O/P for St Paul platform is SUNW,Sun-Blade-T6300.

The CRs aganist Solaris for St Paul Blade platform support
should be filed under platform-sw/stpaul/solaris-kernel in bugster.

If you're changing sun4v or Fire code, you'll want to test on St Paul.
You can get hold of one by contacting stpaul_sw at sun dot com alias with
"Subject: Need St Paul System Access" and blades
will be delivered to ON PIT and ON Dev on or about Feb'8th,2007.

St Paul eng team will provide the technical support.
Please send email to stpaul_sw at sun dot com if any issues.

FYI, StPaul is a Niagara-1 based, 1P, blade server designed exclusively
for use in the Constellation chassis (C-10). The blades are comprised of
an enclosed motherboard that hosts 1 system processor, 1 FIRE ASIC, 8
DIMMS,
4 disks, 2 10/100/1000Mbps Ethernet ports, 2 USB 2.0 ports and a Service
processor. Power supplies, fans and IO slots do not reside on the blade,
but instead exist as part of the C-10 chassis. Much of the blade design is
highly leveraged from the Ontario platform. St Paul RR date per plan is
03/2007.

Thanks

St Paul SW Development Team

Wednesday, November 08, 2006

ZFS saved our data

Recently we migrated a Linux NFS server to a Solaris 10 NFS server with Sun Cluster 3.2 and ZFS. The system has two SCSI JBODs attached, each node has two SCSI adapters, and a RAID-10 spanning the JBODs and adapters was created using ZFS. We used rsync to migrate the data. During the migration we noticed in the system logs that one of the SCSI adapters reported warnings from time to time, then more serious warnings about bad firmware or a broken adapter - but data kept being written. When we ran rsync again, ZFS reported checksum errors, but only on the disks connected to the bad adapter. I ran a scrub on the entire pool, and ZFS reported and corrected thousands of checksum errors - all of them on the bad controller. We removed the bad controller, reconnected its JBOD to the good one, and ran scrub again - this time, no errors. Then we completed the data migration. So far everything works fine and no checksum errors are reported by ZFS.

The important thing here is that ZFS detected that the bad SCSI adapter was actually corrupting data, and was able to correct it on the fly, so we didn't have to start the migration over. With a classic file system we probably wouldn't even have noticed our data was corrupted until a system panic or a forced fsck. And with so many errors, fsck probably couldn't have restored file system consistency anyway - not to mention that it wouldn't correct the bad data at all.
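A toy sketch of that self-healing behaviour (this is an illustration, not ZFS code): every block's checksum is stored apart from the data, so on a two-way mirror a read that fails verification on one side can be satisfied, and repaired, from the other side.

```python
import hashlib

def checksum(data):
    # stand-in for ZFS's block checksum (ZFS uses fletcher/sha256)
    return hashlib.sha256(data).digest()

def read_block(copy_a, copy_b, expected):
    # each copy is a one-element list standing in for a disk block
    for side, other in ((copy_a, copy_b), (copy_b, copy_a)):
        if checksum(side[0]) == expected:
            if checksum(other[0]) != expected:
                other[0] = side[0]          # self-heal the corrupted copy
            return side[0]
    raise IOError("both copies failed checksum verification")

good = b"application data"
copy_a = [b"garbage written by the bad SCSI adapter"]
copy_b = [good]

assert read_block(copy_a, copy_b, checksum(good)) == good
assert copy_a[0] == good                    # the bad copy was repaired
print("self-heal ok")
```

A volume manager without checksums would have happily returned whichever copy it read first; here the corruption is detected against the expected checksum and silently fixed, which is essentially what the scrub did for us thousands of times.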

Friday, November 03, 2006

Thumper throughput

For some testing I'm currently creating 8 RAID-5 devices under SVM with a 128k interleave size. It's really amazing how much throughput the x4500 server can deliver. Right now all those RAID-5 volumes together are sustaining over 2GB/s of write throughput! Woooha! It can write more data to disk than most (all?) Intel servers can read or write to memory :))))


bash-3.00# metainit d101 -r c0t0d0s0 c1t0d0s0 c4t0d0s0 c6t0d0s0 c7t0d0s0 -i 128k
d101: RAID is setup
bash-3.00# metainit d102 -r c0t1d0s0 c1t1d0s0 c5t1d0s0 c6t1d0s0 c7t1d0s0 -i 128k
d102: RAID is setup
bash-3.00# metainit d103 -r c0t2d0s0 c1t2d0s0 c5t2d0s0 c6t2d0s0 c7t2d0s0 -i 128k
d103: RAID is setup
bash-3.00# metainit d104 -r c0t4d0s0 c1t4d0s0 c4t4d0s0 c6t4d0s0 c7t4d0s0 -i 128k
d104: RAID is setup
bash-3.00# metainit d105 -r c0t3d0s0 c1t3d0s0 c4t3d0s0 c5t3d0s0 c6t3d0s0 c7t3d0s0 -i 128k
d105: RAID is setup
bash-3.00# metainit d106 -r c0t5d0s0 c1t5d0s0 c4t5d0s0 c5t5d0s0 c6t5d0s0 c7t5d0s0 -i 128k
d106: RAID is setup
bash-3.00# metainit d107 -r c0t6d0s0 c1t6d0s0 c4t6d0s0 c5t6d0s0 c6t6d0s0 c7t6d0s0 -i 128k
d107: RAID is setup
bash-3.00# metainit d108 -r c0t7d0s0 c1t7d0s0 c4t7d0s0 c5t7d0s0 c6t7d0s0 c7t7d0s0 -i 128k
d108: RAID is setup
bash-3.00#


bash-3.00# iostat -xnzCM 1 | egrep "device| c[0-7]$"
[omitted first output as it's an average since reboot]
extended device statistics
r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
0.0 367.5 0.0 367.5 0.0 8.0 0.0 21.7 0 798 c0
0.0 389.5 0.0 389.5 0.0 8.0 0.0 20.5 0 798 c1
0.0 276.4 0.0 276.4 0.0 6.0 0.0 21.7 0 599 c4
5.0 258.4 0.0 258.4 0.0 6.0 0.0 22.9 0 602 c5
0.0 394.5 0.0 394.5 0.0 8.0 0.0 20.2 0 798 c6
0.0 396.5 0.0 396.5 0.0 8.0 0.0 20.1 0 798 c7
extended device statistics
r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
0.0 376.0 0.0 376.0 0.0 8.0 0.0 21.2 0 798 c0
0.0 390.0 0.0 390.0 0.0 8.0 0.0 20.5 0 798 c1
0.0 281.0 0.0 281.0 0.0 6.0 0.0 21.3 0 599 c4
0.0 250.0 0.0 250.0 0.0 6.0 0.0 24.0 0 599 c5
0.0 392.0 0.0 392.0 0.0 8.0 0.0 20.4 0 798 c6
0.0 386.0 0.0 386.0 0.0 8.0 0.0 20.7 0 798 c7
extended device statistics
r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
0.0 375.0 0.0 375.0 0.0 8.0 0.0 21.3 0 798 c0
0.0 407.0 0.0 407.0 0.0 8.0 0.0 19.6 0 798 c1
0.0 275.0 0.0 275.0 0.0 6.0 0.0 21.8 0 599 c4
0.0 247.0 0.0 247.0 0.0 6.0 0.0 24.2 0 599 c5
0.0 388.0 0.0 388.0 0.0 8.0 0.0 20.6 0 798 c6
0.0 382.0 0.0 382.0 0.0 8.0 0.0 20.9 0 798 c7
^C
bash-3.00# bc
376.0+390.0+281.0+250.0+392.0+386.0
2075.0