Wednesday, February 22, 2006

T2000 beats E6500

We put a T2000 (8 cores, 1GHz) into production in place of an E6500 with 12x US-II 400MHz CPUs. Our applications here are heavily multithreaded. The server handles quite a lot of small NFS transactions and some basic data processing. We didn't recompile the applications for T1 - we used the same binaries as on US-II. The applications rarely use the FPU.

The T2000 gives us about 5-7x the performance of the E6500 in that environment.

Well, "quite" good I would say :)


PS. We can probably squeeze even more out of the T2000. Right now, for lack of time, we're staying with 5-7x.

Monday, February 20, 2006

T2000 real web performance


We ran real production benchmarks on different servers. Each server was put into production behind load balancers, and the load-balancer weights were then adjusted until the server reached its highest number of dynamic PHP requests per second. The server had to sustain that rate for some time, with no dropped requests and no request queueing allowed. With static requests the numbers for the Opteron and T2000 servers were even better, but we are mostly interested in dynamic pages.

T2000 is over 4x faster than IBM dual Xeon 2.8GHz!

Except for the x335, which was running Linux, all the servers were running Solaris 10. Our web server is developed on Linux, so that is the platform it is best tuned for. After fixing some minor problems, the web server was recompiled on Solaris 10 update 1 (both SPARC and x86). No special tuning was done to the application, and only basic tuning was done on Solaris 10 (increased backlog, application in the FX class). The web server was running in Solaris Zones. On the x4100 and T2000 servers two instances of the web server were run because of application scalability problems. On the smaller servers this wasn't needed, as the CPU was fully utilized anyway. Minimal I/O was issued to disks (only logs). Putting the application into the FX class helped a little.

Perhaps putting the application in the global zone, doing some tuning of Solaris and of the application itself, and tweaking compiler options could get us even better results.

For more details on the T2000, visit the CoolThreads servers web page.
You can also see the SPECweb2005 results, which include the T2000.

Servers configuration:

1. IBM x335, 2x Xeon 2.8GHz (single core, 2 cores total)
2. Sun x2100, 1x Opteron 175 2.2GHz (dual core, 2 cores total)
3. Sun x4100, 2x Opteron 280 2.4GHz (dual core, 4 cores total)
4. Sun T2000, 1x UltraSparc T1 1GHz (8 cores, 8 cores total)
5. Sun T2000 6x, 1x UltraSparc T1 1GHz (8 cores) - two cores (8 logical CPUs) were switched off using psradm(1M), leaving 6 cores active.

Saturday, February 18, 2006

Linux kernel boots on Niagara

From OSNews:
The Linux kernel has booted on top of the sun4v hypervisor on Sun's new Niagara processor (it's just the kernel, there was no root filesystem).

SX 2/06 is out

Please note that Solaris Express 2/06 is better tested than Solaris Express Community Edition. SX 2/06 is based on build 31 of Open Solaris (SX CE is based on b33 right now). There are a lot of changes this time - see Dan Price's What's New.

Friday, February 17, 2006

FMA support for Opteron

While looking at the latest changes to Open Solaris I found some interesting integrations in the current changelog:
  • BUG/RFE:6359264 Provide FMA support for AMD64 processors
  • BUG/RFE:6348407 Enable EFI partitions and device ID supports for hotpluggable devices
  • BUG/RFE:6377034 setting physmem in /etc/system does not have desired effect on x86
The first one is the most interesting - I hope someone from Sun will write a blog entry about it with more details.
Looking quickly at some of the files, I can see that memory scrubbing for x86 was added (or maybe it was already there on x86?). It also looks like page retirement on x86 is implemented.


These should be in Solaris Express build 34.

Wednesday, February 15, 2006

Open Solaris on Xen

Open Solaris on Xen - finally something real to touch. If you are a developer you can join and help make this project real.

Xen on Open Solaris Opening Day Page:
Today, we're making the first source code snapshot of our OpenSolaris on Xen project available to the OpenSolaris developer community.

There are many bugs still in waiting, many puzzles to be solved, many things left to do. A true work in progress. Why are we doing this now? Because we don't believe the developer community only wants finished projects to test. We believe that some developers want to participate during the core development process, not after, and now this project opens its doors to that kind of participation.

We have a snapshot of our development tree for OpenSolaris on Xen, synced up with Nevada build 31. That code snapshot should be able to boot and run on all the hardware that build 31 can today, plus it can boot as a diskless unprivileged domain on Xen 3.0.

Running on Xen, OpenSolaris is reasonably stable, but it's still very much "pre-alpha" compared with our usual finished code quality. Installing and configuring a client is do-able, but not for the faint of heart. The current instructions can be found here.


Goals of the Project

This project aims to fully support OpenSolaris on Xen. Here's our top-level technology goals:

  • x86 and x64 paravirtualized guest kernels supporting dom0, domU, and driver domains
  • All reasonable combinations of Solaris, Linux, *BSD, and other paravirtualized OSes should interoperate.
  • Live migration, whole OS checkpoint/resume
  • MP limits and scale to match Xen's capabilities
  • Maximal portability to enable Solaris-on-Xen ports to other architectures.
  • Observability and debugging to enable performance work, RAS, system management, and sustaining.
  • Support fully virtualized guests [though this is mostly a Xen capability, rather than an OpenSolaris capability per se.]
  • Explore trusted platform capabilities.


SX b33 is out

It's been a while since the last SX release due to some legal problems. Finally, Solaris Express Community Edition based on build 33 is available (DVD or CD). Some new features since b30:


  • RealPlayer 10 included (x86/x64, SPARC)
  • Xorg 6.9.0 final version
  • Some ZFS fixes (improved CLI performance, whole root zone support, ... - a number of performance fixes are in b34 and b35)
  • New SATA framework

See also Dan Price's What's New for SX 2/06 based on b31 (so in b33 there's even more).

UltraSPARC T1 in blades

Sun:
Sun Microsystems Announces Plans to Bring Breakthrough Efficiency of UltraSPARC T1 Processor to Upcoming Netra AdvancedTCA Blades and Carrier-Grade Rack Server Line.

UltraSPARC T1 specs under GPL

The Register:
Sun on Tuesday released the specifications for the UltraSPARC processor architecture 2005 and its HyperVisor API under the General Public License (GPL) 2.0. The architecture is available at www.opensparc.net.

Saturday, February 11, 2006

Improving Open Solaris IPv4 forwarding scalability

"Surya project aims to improve IPv4 forwarding path scalability. Improving forwarding scalability enables a Solaris machine to forward a higher number of packets per second to a greater number of destinations described in the forwarding table."

FreeBSD's radix tree implementation was chosen.
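
To make the forwarding-table idea concrete, here is a minimal sketch of longest-prefix matching for IPv4 routes. It is not the FreeBSD radix code and not anything from Surya: it uses a plain, uncompressed binary trie (one node per prefix bit), which is the simplest relative of a radix tree. All names here (trie_node, trie_new, trie_insert, trie_lookup, the integer next_hop) are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One node per prefix bit; has_route marks a node where a prefix ends. */
struct trie_node {
    struct trie_node *child[2];
    int has_route;
    int next_hop;               /* hypothetical next-hop identifier */
};

/* Allocate an empty node (also used for the root). NULL on failure. */
struct trie_node *trie_new(void)
{
    return calloc(1, sizeof (struct trie_node));
}

/*
 * Add the route prefix/len -> next_hop. The prefix is in host byte order.
 * Returns 0 on success, -1 if a node could not be allocated.
 */
int trie_insert(struct trie_node *root, uint32_t prefix, int len, int next_hop)
{
    struct trie_node *n = root;

    assert(root != NULL && len >= 0 && len <= 32);
    for (int i = 0; i < len; i++) {
        int bit = (prefix >> (31 - i)) & 1;   /* walk from the top bit down */

        if (n->child[bit] == NULL) {
            n->child[bit] = trie_new();
            if (n->child[bit] == NULL)
                return -1;
        }
        n = n->child[bit];
    }
    n->has_route = 1;
    n->next_hop = next_hop;
    return 0;
}

/*
 * Return the next hop of the longest prefix that matches addr,
 * or -1 if no route (not even a default route) matches.
 */
int trie_lookup(const struct trie_node *root, uint32_t addr)
{
    const struct trie_node *n = root;
    int best = -1;

    for (int i = 0; n != NULL; i++) {
        if (n->has_route)
            best = n->next_hop;     /* remember the deepest match so far */
        if (i == 32)
            break;                  /* all 32 address bits consumed */
        n = n->child[(addr >> (31 - i)) & 1];
    }
    return best;
}
```

With 10.0.0.0/8 and 10.1.0.0/16 inserted, a lookup of 10.1.2.3 follows the trie past the /8 node and returns the /16 route, while 10.2.3.4 falls back to the /8. A real radix tree compresses chains of single-child nodes, so it needs far fewer nodes and memory accesses per lookup; the sketch also never frees its nodes.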

The Surya project is ready for code review. You can take part in the discussion on networking-discuss.

The design doc is available at http://www.opensolaris.org/os/community/networking/surya-design.pdf.

Friday, February 03, 2006

Hidden in Plain Sight

Hidden in Plain Sight:
To hunt cow in its native habitat, the focus of observability infrastructure must make two profound shifts: from development to production, and from programs to systems. These shifts have several important implications. First, the shift from development to production implies that observability infrastructure must have zero disabled probe effect: The mere ability to observe the system must not make the delivered system any slower. This constraint allows only one real solution: Software must be optimized when it ships, and—when one wishes to observe it—the software must be dynamically modified. Further, the shift from programs to systems demands that the entire stack must be able to be dynamically instrumented in this way, from the depths of the operating system, through the system libraries, and into the vaulted heights of higher-level languages and environments. There must be no dependency on compile-time options, having source code, restarting components, etc.; it must be assumed that the first time a body of software is to be observed, that software is already running in production.

... and DTrace was born.

Wednesday, February 01, 2006

ZFS - recovering destroyed pools

If you created a ZFS pool with filesystems on it, it can happen that you destroy that pool without really meaning to. Fortunately, when you destroy a pool, all ZFS does is mark the pool as 'destroyed'; no data or configuration on the disks is erased. With some really simple changes to the ZFS utilities (and to the ZFS filesystem itself) you can easily recover previously destroyed pools with all the filesystems and data on them.
When you run 'zpool import' it lists the pools available for import on the system, skipping destroyed pools. After applying the changes below you will also see destroyed pools, which are marked '(DESTROYED)' in the state property. Using 'zpool import -f' you can import such a pool again.


bash-3.00# zpool create backup c2t0d0p0
bash-3.00# zfs create backup/d1
bash-3.00# zfs create backup/d2
bash-3.00# zfs list
NAME USED AVAIL REFER MOUNTPOINT
backup 376K 37.0G 100K /backup
backup/d1 98.5K 37.0G 98.5K /backup/d1
backup/d2 98.5K 37.0G 98.5K /backup/d2
bash-3.00#

bash-3.00# df -h -F zfs
Filesystem size used avail capacity Mounted on
backup 37G 99K 37G 1% /backup
backup/d1 37G 98K 37G 1% /backup/d1
backup/d2 37G 98K 37G 1% /backup/d2
bash-3.00#

bash-3.00# cp -rp /usr/kernel/ /backup/d1/
bash-3.00# cp -rp /usr/platform/ /backup/d2/
bash-3.00#
bash-3.00# df -h -F zfs
Filesystem size used avail capacity Mounted on
backup 37G 99K 37G 1% /backup
backup/d1 37G 3.4M 37G 1% /backup/d1
backup/d2 37G 849K 37G 1% /backup/d2
bash-3.00#


bash-3.00# zpool destroy backup
bash-3.00# zpool list
no pools available
bash-3.00#

bash-3.00# zpool import
pool: backup
id: 6753094033596765985
state: ONLINE (DESTROYED)
action: The pool can be imported using its name or numeric identifier. The
pool was destroyed, but can be imported using the '-f' flag.
config:

backup ONLINE
c2t0d0p0 ONLINE
bash-3.00#

bash-3.00# zpool import -f backup
bash-3.00#
bash-3.00# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
backup 37.2G 4.45M 37.2G 0% ONLINE -
bash-3.00# zfs list
NAME USED AVAIL REFER MOUNTPOINT
backup 4.42M 37.0G 100K /backup
backup/d1 3.38M 37.0G 3.38M /backup/d1
backup/d2 849K 37.0G 849K /backup/d2
bash-3.00# df -h -F zfs
Filesystem size used avail capacity Mounted on
backup 37G 99K 37G 1% /backup
backup/d1 37G 3.4M 37G 1% /backup/d1
backup/d2 37G 849K 37G 1% /backup/d2
bash-3.00#


############################
#Changes to snv_29 source tree.#
############################

bash-3.00$ diff -u lib/libzfs/common/libzfs_import.c.orig lib/libzfs/common/libzfs_import.c
--- lib/libzfs/common/libzfs_import.c.orig Tue Jan 31 23:51:06 2006
+++ lib/libzfs/common/libzfs_import.c Tue Jan 31 23:51:58 2006
@@ -547,7 +547,7 @@
         }

         if (nvlist_lookup_uint64(config, ZPOOL_CONFIG_POOL_STATE,
-            &state) != 0 || state > POOL_STATE_EXPORTED) {
+            &state) != 0 || state > POOL_STATE_DESTROYED) {
                 nvlist_free(config);
                 continue;
         }
bash-3.00$


bash-3.00$ diff -u uts/common/fs/zfs/vdev.c.orig uts/common/fs/zfs/vdev.c
--- uts/common/fs/zfs/vdev.c.orig Sat Dec 17 06:15:08 2005
+++ uts/common/fs/zfs/vdev.c Wed Feb 1 01:24:58 2006
@@ -1147,7 +1147,7 @@
         }

         if (state != POOL_STATE_ACTIVE &&
-            (!import || state != POOL_STATE_EXPORTED)) {
+            (!import || (state != POOL_STATE_EXPORTED && state != POOL_STATE_DESTROYED))) {
                 dprintf("pool state not active (%llu)\n", state);
                 nvlist_free(label);
                 return (EBADF);
bash-3.00$


bash-3.00$ diff -u cmd/zpool/zpool_main.c.orig cmd/zpool/zpool_main.c
--- cmd/zpool/zpool_main.c.orig Sat Dec 17 06:16:00 2005
+++ cmd/zpool/zpool_main.c Wed Feb 1 01:52:58 2006
@@ -799,7 +799,10 @@

         (void) printf(" pool: %s\n", name);
         (void) printf(" id: %llu\n", guid);
-        (void) printf(" state: %s\n", health);
+        (void) printf(" state: %s", health);
+        if (pool_state == POOL_STATE_DESTROYED)
+                (void) printf(" (DESTROYED)");
+        (void) printf("\n");

         switch (reason) {
         case ZPOOL_STATUS_MISSING_DEV_R:
@@ -832,7 +835,10 @@
         if (strcmp(health, gettext("ONLINE")) == 0) {
                 (void) printf(gettext("action: The pool can be imported"
                     " using its name or numeric identifier."));
-                if (pool_state != POOL_STATE_EXPORTED)
+                if (pool_state == POOL_STATE_DESTROYED)
+                        (void) printf(gettext(" The\n\tpool was destroyed, "
+                            "but can be imported using the '-f' flag.\n"));
+                else if (pool_state != POOL_STATE_EXPORTED)
                         (void) printf(gettext(" The\n\tpool may be active on "
                             "on another system, but can be imported using\n\t"
                             "the '-f' flag.\n"));
@@ -842,7 +848,10 @@
                 (void) printf(gettext("action: The pool can be imported "
                     "despite missing or damaged devices. The\n\tfault "
                     "tolerance of the pool may be compromised if imported."));
-                if (pool_state != POOL_STATE_EXPORTED)
+                if (pool_state == POOL_STATE_DESTROYED)
+                        (void) printf(gettext(" The\n\tpool was destroyed, "
+                            "but can be imported using the '-f' flag.\n"));
+                else if (pool_state != POOL_STATE_EXPORTED)
                         (void) printf(gettext(" The\n\tpool may be active on "
                             "on another system, but can be imported using\n\t"
                             "the '-f' flag.\n"));
bash-3.00$
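
The effect of the libzfs_import.c and vdev.c changes above can be sketched as two small predicates. This is a standalone illustration, not the ZFS source: only the POOL_STATE_* names and the shape of the conditions come from the diffs. The numeric values of the states, the function names (listed_for_import, label_accepted, self_check) and the 'patched' switch are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Pool states named in the diffs. The numeric order is an ASSUMPTION of
 * this sketch; the patch only makes sense if DESTROYED sorts after EXPORTED.
 */
enum {
    POOL_STATE_ACTIVE = 0,
    POOL_STATE_EXPORTED,
    POOL_STATE_DESTROYED,
    POOL_STATE_UNINITIALIZED
};

/*
 * libzfs_import.c check: is a pool in this state listed by 'zpool import'?
 * The code in the diff skips a pool when state > limit; the patch raises
 * the limit from EXPORTED to DESTROYED.
 */
bool listed_for_import(uint64_t state, bool patched)
{
    uint64_t limit = patched ? POOL_STATE_DESTROYED : POOL_STATE_EXPORTED;

    return state <= limit;
}

/*
 * vdev.c check: is the on-disk label accepted (i.e. EBADF is NOT returned)?
 * An active pool is always accepted; during an import an exported pool is
 * accepted too, and with the patch so is a destroyed one.
 */
bool label_accepted(uint64_t state, bool import, bool patched)
{
    if (state == POOL_STATE_ACTIVE)
        return true;
    if (!import)
        return false;
    if (state == POOL_STATE_EXPORTED)
        return true;
    return patched && state == POOL_STATE_DESTROYED;
}

/* Walk through the cases that matter for a destroyed pool. */
void self_check(void)
{
    /* Unpatched: a destroyed pool is neither listed nor importable. */
    assert(!listed_for_import(POOL_STATE_DESTROYED, false));
    assert(!label_accepted(POOL_STATE_DESTROYED, true, false));

    /* Patched: it is listed, and accepted when importing. */
    assert(listed_for_import(POOL_STATE_DESTROYED, true));
    assert(label_accepted(POOL_STATE_DESTROYED, true, true));

    /* Exported pools behave the same before and after the patch. */
    assert(listed_for_import(POOL_STATE_EXPORTED, false));
    assert(label_accepted(POOL_STATE_EXPORTED, true, true));
}
```

Calling self_check() from a small main() exercises the cases. Note that both checks have to change: relaxing only the libzfs one would list the destroyed pool, but the kernel-side label check would still refuse to open it during the import.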