Thursday, April 30, 2009

Solaris 10 5/09

Solaris 10 5/09, aka Update 7, is out and available for download. Below are some of the new features from the What's New document that I found interesting:

Support for Zone Cloning
If the source and target zonepaths reside on ZFS and both are in the same pool, zoneadm clone takes a snapshot of the source zonepath and uses ZFS cloning to create the new zone.

SunSSH With OpenSSL PKCS#11 Engine Support
This feature enables the SunSSH server and client to use the Solaris Cryptographic Framework through the OpenSSL PKCS#11 engine. SunSSH uses the framework for hardware acceleration of symmetric crypto algorithms, which matters for data transfer speed. This feature is aimed at UltraSPARC® T2 processor platforms with the n2cp(7D) crypto driver.

There are also several bug fixes and improvements; see What's New for more details.

Solaris Power Aware Dispatcher and Deep C-State Support

■ Event-driven CPU power management – On systems where Solaris supports Dynamic Voltage and Frequency Scaling (DVFS), the kernel scheduler (dispatcher) schedules threads across the system's CPUs in a manner that coalesces load and frees up other CPUs to be deeply power managed. CPU power state changes are triggered when the dispatcher recognizes that utilization across a group of power-manageable CPUs has changed significantly. This eliminates the need to periodically poll CPU utilization across the system, and enables the system to save more power when CPUs are idle while still delivering performance when they are busy. Event-driven CPU power management is enabled by default on systems that support DVFS. It can be disabled, or the legacy polling-based CPU power management can be selected, through the cpupm keyword in power.conf(4).

■ Support for Deep Idle CPU Power Management (deep C-state support) on Intel Nehalem-based systems – The project also adds Solaris support for deep C-states on Intel Nehalem-based systems. This enables unused CPU resources to be dynamically placed in a state where they consume only a fraction of the power they consume in their normal operating state. It provides Solaris support for this power-saving feature, as well as the policy that decides when idle CPUs should request deep idle mode. The feature is enabled by default where supported, and can be disabled through the cpu-deep-idle keyword in power.conf(4).

■ Observability for Intel's Turbo Mode feature – Intel Nehalem-based systems can raise the operating frequency of a subset of the available cores when there is enough thermal headroom to do so. This temporarily boosts performance, but it is controlled by the hardware and is transparent to software. Starting with the Solaris 10 5/09 release, a new kstat module reports when the system enters turbo mode and at which frequency it operates.

Wednesday, April 29, 2009

Some perspective on Sun's Q3 FY2009 numbers

Upgrading Open Storage 7410

Adam blogged about a new software release for Open Storage. Because I'm testing the 7410 model, I went through an upgrade this morning. First I downloaded the new image and was impressed: I got 9-10 MB/s, and it looks like the bottleneck was my 100 Mb/s link to the office network. I've never had such download rates from Sun before, excellent!
Then, with just one mouse click, I uploaded the new image onto the 7410, and after a short while it was listed as ready to be installed. I clicked on it, and about 25 minutes later the upgrade finished. Of course, I could still use the appliance during the upgrade.

I really like the end-user experience: a couple of mouse clicks and you're done. That's the way it should be.

I did some testing with filebench before and after the upgrade, and I'm really happy to report about a 33% performance improvement for the varmail workload with the new build. While your mileage may vary, I think Sun should have highlighted the performance improvements in the Release Notes.

What would be really useful for testing purposes is the ability to create a pool without L2ARC or SLOG devices. This would make life a little easier when comparing configurations with and without SSDs. The default behaviour is excellent, as it picks up the SSDs and proposes the optimal use of them, so the end user doesn't even have to understand how they work or how to configure them properly. Still, I would like an option to not configure L2ARC or SLOG without having to physically pull the SSDs out.

Monday, April 20, 2009

Oracle Agrees to Acquire Sun Microsystems

This is a big surprise!

From Oracle's document on the acquisition:
• Protects and extends customers’ investment in Sun technologies
• Accelerate growth of Java as an open industry standard development platform
• Sustain Solaris as an industry standard OS for Oracle software
• Continue Open Storage and Systems focus and innovation
• Ensure continued innovation and investment in Java technology
• Optimize Solaris and Oracle for better performance, reliability, and manageability
• Protects massive customer investment in SPARC
• Open Storage built with industry standard servers and components

Sun's Official Announcement
Wall Street Journal

Monday, April 06, 2009

truss(1M) vs. dtrace(1M)

One of the many benefits of DTrace over truss is that dtrace should induce much smaller overhead when tracing applications, especially multi-threaded applications running on multi-core/multi-CPU servers. Let's put it to a quick test.

I quickly wrote a small C program which spawns N threads, each of which calls stat("/tmp") X times. Then I measured how long it takes to execute 1 million stat() calls in total with no tracing at all, under truss, and under dtrace.

Server with one two-core AMD CPU:
# ptime ./threads-2 1 1000000

real 2.662809885
user 0.223471401
sys 2.435895135

# ptime ./threads-2 2 500000

real 1.649542016
user 0.226104849
sys 3.045784378

# ptime truss -t xstat -c ./threads-2 2 500000

syscall               seconds   calls  errors
xstat                   6.966 1000000
stat64                   .000       3      1
                     --------  ------   ----
sys totals:             6.966 1000003      1
usr time:                .776
elapsed:               18.520

real 18.533000528
user 5.677239771
sys 16.069020190

# dtrace -n 'syscall::xstat:entry{@=count();}' -c 'ptime ./threads-2 2 500000'
dtrace: description 'syscall::xstat:entry' matched 1 probe

real 1.888294217
user 0.225676973
sys 3.506004575
dtrace: pid 8526 has exited


Under truss the program took about 11x longer to run, while under dtrace it took only about 14% longer.

Niagara server:

# ptime ./threads-2 1 1000000

real 10.873
user 1.881
sys 8.992

# ptime ./threads-2 10 100000

real 1.467
user 1.962
sys 12.121

# ptime truss -t xstat -c ./threads-2 1 1000000

syscall               seconds   calls  errors
stat                   26.958 1000004      1
                     --------  ------   ----
sys totals:            26.958 1000004      1
usr time:               2.758
elapsed:              214.600

real 3:34.613
user 30.900
sys 2:28.182

# ptime truss -t xstat -c ./threads-2 10 100000

syscall               seconds   calls  errors
stat                   37.259 1000004      1
                     --------  ------   ----
sys totals:            37.259 1000004      1
usr time:               3.178
elapsed:              168.010

real 2:48.063
user 1:05.709
sys 3:35.813

# dtrace -n 'syscall::stat:entry{@=count();}' -c 'ptime ./threads-2 1 1000000'
dtrace: description 'syscall::stat:entry' matched 1 probe

real 14.028
user 1.957
sys 12.069
dtrace: pid 12920 has exited


# dtrace -n 'syscall::stat:entry{@=count();}' -c 'ptime ./threads-2 10 100000'
dtrace: description 'syscall::stat:entry' matched 1 probe

real 1.858
user 2.142
sys 15.632
dtrace: pid 11679 has exited


truss made the program run about 20x longer in the single-threaded case and about 115x longer in the multi-threaded one, while dtrace added no more than 30% to the execution time regardless of whether the application was running one or many threads. This shows that one has to be especially careful when using truss on a multi-threaded application running on a multi-CPU/multi-core system. Notice also that under truss the 10-thread run is only slightly faster than the single-threaded one (2:48 vs. 3:34), while without tracing it is more than 7x faster (1.467s vs. 10.873s). This exposes an ugly feature of truss: it serializes a multi-threaded application.

Of course, this benchmark is a worst-case scenario, and in real life you shouldn't see that much overhead from either tool. Still, in some cases truss could introduce too much overhead on a production server, while dtrace would still be perfectly acceptable, allowing you to continue your investigation.

BTW: the DTraceToolkit provides a script called dtruss, a truss-like tool built on DTrace.

# cat threads-2.c

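#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_THREADS 255

/* each thread calls stat("/tmp") *arg times; the result is ignored */
void *thread_func(void *arg)
{
    int *N = arg;
    int i;
    struct stat buf;

    for (i = 0; i < *N; i++)
        stat("/tmp", &buf);

    return (NULL);
}

int main(int argc, char **argv)
{
    int N, iter;
    int i;
    int rc;
    pthread_t tid[MAX_THREADS];

    if (argc != 3) {
        printf("%s number_of_threads number_of_iterations_per_thread\n", argv[0]);
        return (1);
    }

    N = atoi(argv[1]);
    iter = atoi(argv[2]);
    if (N < 1 || N > MAX_THREADS) {
        printf("number_of_threads must be between 1 and %d\n", MAX_THREADS);
        return (1);
    }

    /* start N threads, all sharing the same (read-only) iteration count */
    for (i = 0; i < N; i++) {
        if ((rc = pthread_create(&tid[i], NULL, thread_func, &iter)) != 0) {
            printf("Thread #%d creation failed [%d]\n", i, rc);
            N = i;  /* only join the threads that were actually created */
            break;
        }
    }

    /* wait for all threads to complete */
    for (i = 0; i < N; i++)
        pthread_join(tid[i], NULL);

    return (0);
}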