UPDATE: ZFS dedup finally integrated!
With the integration of this RFE we are (hopefully) closer to built-in ZFS de-duplication. Read more on Eric's blog.
Once the ZFS re-writer and de-duplication are done, in theory one should be able to do a zpool upgrade of an existing pool and de-dup all the data which is already there... we will see :)
Eric mentioned on his blog that in reality we should use sha256 or stronger. I would go even further and offer two modes: one where you depend entirely on the block checksum, and another where you actually compare the given blocks byte-by-byte to be 100% sure they are the same. The second is slower for some workloads, but safe.
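To make the two modes concrete, here is a minimal sketch of a toy in-memory block store. It is not how ZFS implements (or will implement) de-dup; the `DedupStore` class, its `verify` flag and its method names are all made up for illustration. It only shows the difference between trusting the sha256 checksum and additionally comparing the blocks byte-by-byte.

```python
import hashlib


class DedupStore:
    """Toy block store (hypothetical, for illustration only)."""

    def __init__(self, verify=False):
        # verify=False: trust the checksum alone.
        # verify=True: also compare block contents byte-by-byte.
        self.verify = verify
        self.blocks = {}    # digest -> list of distinct blocks with that digest
        self.refcount = {}  # (digest, index) -> number of references

    def write(self, data: bytes):
        """Store a block and return a key identifying the stored copy."""
        digest = hashlib.sha256(data).digest()
        candidates = self.blocks.setdefault(digest, [])

        if candidates and not self.verify:
            # Checksum-only mode: the same digest is assumed to mean same data.
            key = (digest, 0)
            self.refcount[key] += 1
            return key

        if self.verify:
            # Verify mode: only share a block if the bytes really match.
            for index, existing in enumerate(candidates):
                if existing == data:
                    key = (digest, index)
                    self.refcount[key] += 1
                    return key

        # New data (or, in verify mode, a genuine checksum collision).
        candidates.append(data)
        key = (digest, len(candidates) - 1)
        self.refcount[key] = 1
        return key

    def stored_blocks(self) -> int:
        """Number of blocks physically kept after de-duplication."""
        return sum(len(v) for v in self.blocks.values())


if __name__ == "__main__":
    store = DedupStore(verify=True)
    a1 = store.write(b"a" * 4096)
    a2 = store.write(b"a" * 4096)  # duplicate of the first block
    b1 = store.write(b"b" * 4096)
    print(store.stored_blocks(), store.refcount[a1])
```

In verify mode every duplicate write costs an extra read and compare of the existing block, which is where the "slower for some workloads" part comes from.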
Now, de-dup which "understands" your data would be even better (analyzing file contents such as emails and de-duplicating at the attachment level, etc.); nevertheless, block-level de-dup would be a good start.
Monday, March 31, 2008
Thursday, March 20, 2008
Solaris 8 in a Zone
Well, what if you are stuck with Solaris 8 for many reasons but need to get it onto modern SPARC HW and, better yet, get it clustered? You also need MPxIO working with the latest arrays, and if you could go with SC3.2 for free to limit costs, that would be ideal...
Playground: 2x T5220, 1x 2530 SAS array, Solaris 10 U4, MPxIO, IPMP, Sun Cluster 3.2 with Zones agent, patch 126020-02 applied (support for Etude). All of the software is free.
Then you install Etude - just two packages. Export your Solaris 8 root file system over nfs or create a flar archive. Now you create a Solaris Branded Zone with Solaris 8 emulation providing exported Solaris 8 or flar archive as a source and a moment later you have a working copy of your Solaris 8 system in a Solaris 10 zone - cool!
Now you configure that Zone under a cluster (couple of commands) and you got it clustered so you can switch that zone between systems.
So far so good. Next week: more functional tests and some basic application testing. If it goes well, we will switch production to it.
Last phase? Create another Zone - this time Solaris 10 zone (standard one), put it under a cluster and migrate one by one applications between zones doing some cleaning at the same time.
In the meantime we will provide better reliability due to clustering, and better performance due to faster storage, more RAM and more CPU power.
Not only does it allow you to use recent HW and migrate rapidly, but since it's running on Solaris 10 you also benefit from technologies like ZFS (yes, an Etude zone can be on ZFS), DTrace, resource management, etc.
How hard is it to set up? Actually very easy, way easier than you think.
ZFS Encryption
At yesterday's LOSUG Darren Moffat, Sun Senior Staff Engineer, presented the current status of ZFS encryption. It was a really interesting presentation. He even managed to panic the system :)
The good thing is it's going to be very easy to use and is going to be integrated relatively soon - IIRC about build 92. It was also nice to be able to talk to him after his presentation and share some thoughts.
If you are from the London area I think it would be worthwhile to pop in to a LOSUG meeting - you can always learn something new or meet new people.
Tuesday, March 18, 2008
S10 & ZFS - important patch
If you are using ZFS on Solaris 10 and experiencing problems, you should be interested in patches 127729-07 (x86) and 127728-06 (SPARC). Fixes introduced in the last revision:
Problem Description:
6355623 zfs rename to valid dataset name, but if snapshot name becomes too long, panics system
6393769 client panic with mutex_enter: bad mutex, at get_lock_list
6513209 destroying pools under stress causes hang in arc_flush
6523336 panic dr->dt.dl.dr_override_state == DR_NOT_OVERRIDDEN, file: ../../common/fs/zfs/dbuf.c line: 2195
6533813 recursive snapshotting resulted in bad stack overflow
6535160 lock contention on zl_lock from zil_commit
6544140 assertion failed: err == 0 (0x11 == 0x0), file: ../../common/fs/zfs/zfs_znode.c, line: 555
6549634 dn_dbfs_mtx should be held when calling list_link_active() in dbuf_destroy()
6557767 assertion failed: error == 17 || lr->lr_length <= zp->z_blksz
6565044 small race condition between zfs_umount() and ZFS_ENTER()
6565574 zvol read perf problem
6569719 panic dangling dbufs (dn=ffffffff28814d30, dbuf=ffffffff20756008)
6573361 panic turnstile_block, unowned mutex
6577156 zfs_putapage discards pages too easily
6581978 assertion failed: koff <= filesz, file: ../../common/fs/zfs/zfs_vnops.c, line: 2834
6585265 need bonus resize interface
6586422 deadlock occurs when nfsv4 recover thread calls nfs4_start_fop
6587723 BAD TRAP: type=e (#pf Page fault) occurred in module "zfs" due to NULL pointer dereference
6589799 dangling dbuf after zinject
6594025 panic: dangling dbufs during shutdown
6596239 stop issuing IOs to vdev that is going to be removed
6617844 seems bug 4901380 has not been fixed in Solaris 10
6618868 ASSERT: di->dr_txg == tx->tx_txg (0x148 == 0x147), dbuf.c, line 1088
6620864 BAD TRAP panic in vn_invalid() called through znode_pageout_func()
6637030 kernel heap corruption detected during stress
SAM-QFS Open Sourced
I'm really impressed with Sun when it comes to Open Source - Solaris, Sun Cluster, DTrace, ZFS, ... and now SAM-QFS. Read More.
I really wonder if such a business model will prove itself - I guess we have to wait another couple of years to see.
Nevertheless, it is not only that all these products are open sourced; they are also entirely free, which in many cases is disruptive. I've been installing clusters where it did not make sense before, for example: 2x x86 servers + Solaris (free) + Sun Cluster (free) + MySQL (free). Or instead of spending too much money on NetApp you just go with 2x x86 + Solaris + Sun Cluster + ZFS + Comstar - you will save a lot of money. However, the NetApp story is perhaps worth a separate blog entry...
Friday, March 14, 2008
Interview question
During interviews one of the questions I ask is: "What is the basic difference between a 32-bit and a 64-bit application?". The obvious one is the address space available to the application. The other one is performance - more details here.
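For the address space part, the arithmetic is simple enough to sketch. The helper names below are made up for illustration, and the numbers are only the theoretical limit implied by the pointer width; what an application can actually use is smaller and depends on the OS and hardware.

```python
def address_space_bytes(pointer_bits: int) -> int:
    """Theoretical number of addressable bytes for a given pointer width."""
    return 2 ** pointer_bits


def human(n: int) -> str:
    """Format a byte count using binary units (exact for powers of two)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    i = 0
    while n >= 1024 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n} {units[i]}"


if __name__ == "__main__":
    for bits in (32, 64):
        print(f"{bits}-bit: {human(address_space_bytes(bits))}")
    # prints:
    # 32-bit: 4 GiB
    # 64-bit: 16 EiB
```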
btw: it is really scary how few sys admins know the answer to the question, even if just the address space part...
Tuesday, March 11, 2008
Sun's support
Remember my last post about Sun's support? Well, not about support itself but rather about the tools Sun provides to users, which are crap. OSC is still saying it will be upgraded in March 2008 - there are just a couple of weeks left, so we will see.
Now I have several new boxes and I want to register them using sconadm - with or without a support contract - and it doesn't work, which means I can't use smpatch. Raising a ticket has not helped so far either.
Sun should definitely put more love into its patching infrastructure - I don't care if it is old, we know Indiana is going to change it all, etc. But in the meantime they should ensure that we (customers) have a working solution.
Going back to patchdiag tool... feeling disappointed at Sun.
Or rather I will finally try PCA - so far so good and it works!
Update: for whatever reason PCA made patching unusable - patch* commands started to core dump.
I got smpatch working, thanks to Sun's support (thanks Ben!). The problem is bug id 6643363, which makes smpatch fail on systems with 64 or more CPUs.
Wednesday, March 05, 2008
16-core Intel System vs. Niagara-2
Some sysbench results for a Niagara-2 (blade), a v440 and a 16-core Intel box.
For the CPU test, the Niagara-2 and the 16-core Intel box deliver about the same performance with 32 threads, while the Niagara-2 is slower with fewer threads.
The sysbench memory test shows up to 17GB/s for the Niagara-2 blade, but only up to 4GB/s for the 16-core Intel box.
Sysbench threads test - again, the more threads, the better the Niagara-2 performs. From 32 threads upwards the Niagara-2 box is the fastest one.
The quick conclusion is that a Niagara-2 box can rival 4-CPU 4-core (16 cores in total) Intel boxes if your application can utilize all these cores. Sometimes it's much faster - especially if you need to access lots of memory. Of course it's just sysbench...
Niagara-2 - 1x 1.4GHz 8-core UltraSparc-T2, 32GB RAM
v440 - 4x 1GHz, 8GB RAM
16 core Intel - 4x 4-core Intel CPU (16 cores in total) 2.13GHz, 16GB RAM