Friday, December 23, 2005

Thursday, December 22, 2005

Solaris 10 Update 1

Solaris 10 Update 1 is available. There are many changes compared to S10 03/05, such as:

  • New Boot Architecture on x86/x64 (GRUB, faster system startups, easier network installs,...)
  • iSCSI client support
  • GLDv3 (VLANs, Link Aggregation, improved performance - man dladm)
  • Performance improvements (MPO, Large Pages, ...)
  • fcinfo command
  • new drivers
  • new ACPI (+virtual keyboard&mouse)
Check What's New.

PS: of course, it is available for free.

Wednesday, December 21, 2005

Writing Zeros to ZFS files

I looked into some of the ZFS code and found a nice little feature. In the function zio_compress_data there is:
/*
 * If the data is all zeroes, we don't even need to allocate
 * a block for it. We indicate this by setting *destsizep = 0.
 */
allzero = 1;
word = src;
word_end = (uint64_t *)(uintptr_t)((uintptr_t)word + srcsize);
while (word < word_end) {
    if (*word++ != 0) {
        allzero = 0;
        break;
    }
}
if (allzero) {
    *destp = NULL;
    *destsizep = 0;
    *destbufsizep = 0;
    return (1);
}



So if you set compression=on (or any other available compression method - right now only lzjb is available) and a block contains only zeros, the actual compression method is not called (less CPU consumed) and no I/O is generated for these zeros. I did a small test: I created two filesystems on a RAID-Z pool, one with compression set to on and the other with it set to off. Then I ran 'dd if=/dev/zero of=/test/fs1/q1 bs=1024k count=1024', which writes 1GB of data (only zeros). With compression set to off, iostat shows a lot of I/O to the underlying disks. With compression set to on, only about 600KB is written to the underlying disks (not to mention that the whole operation took about 2s on a 1.5GHz USIIIi). Well, this is a really clever small thing (not the most clever in ZFS, of course).

Now I wonder whether it would be beneficial to do the same check even when there's no compression (basically moving this code up, so that regardless of whether compression is on or off, a block that is all zeros is not written).
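
If you want to play with the idea outside the kernel, here is a minimal user-space sketch of the same all-zero scan. It is not the ZFS code: the function name block_is_all_zero is made up for illustration, and unlike the kernel snippet it also accepts buffers whose size is not a multiple of 8 bytes (the tail is checked byte by byte).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of ZFS: returns 1 if the first `size`
 * bytes of `buf` are all zero, 0 otherwise.
 */
int
block_is_all_zero(const void *buf, size_t size)
{
    const unsigned char *p = buf;
    size_t i = 0;

    /* Scan 64 bits at a time; memcpy avoids any alignment assumption. */
    for (; i + sizeof (uint64_t) <= size; i += sizeof (uint64_t)) {
        uint64_t word;

        memcpy(&word, p + i, sizeof (word));
        if (word != 0)
            return (0);
    }

    /* Check whatever is left over, one byte at a time. */
    for (; i < size; i++) {
        if (p[i] != 0)
            return (0);
    }
    return (1);
}
```

A caller could run this on each block before deciding whether to write it at all, which is roughly the "move the check up" idea from the paragraph above.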

LDAP hints

If you are interested in Directory Server and in some hints and tuning information, then you should look at this blog. There you can find some information on why using ZFS with LDAP is a good idea.


btw: in case you didn't know already - Directory Server is free to use with no limits as part of Solaris Enterprise System.

Wednesday, December 14, 2005

Linux Applications on Solaris

Want to run Linux applications (or a whole CentOS or Red Hat installation) in a zone on Solaris? If so, please check the project's official page. Well, even DTrace works with Linux applications.

I tested JANUS (it was called Project JANUS back then) some time ago and basically it just worked - mozilla, yum, other applications. I tried it with CentOS.

From the announcement:

BrandZ is a technology that extends the zones infrastructure to allow for the creation of "non-native" zones. Non-native is a deliberately ambiguous term, as we are trying not to let our preconceived notions of the technology limit its usefulness.

The first brand we are developing under the BrandZ umbrella is 'lx', a brand that supports the execution of 32-bit x86 Linux applications on a x86/x64 machine running Solaris Nevada. Specifically, the lx brand allows the user to install a complete CentOS or Red Hat Enterprise Linux 3.x distribution in a zone. When the zone is booted it will still be running the Solaris kernel under the hood, but the userspace environment will include nothing but Linux software from init(1M) on up.

In theory the BrandZ infrastructure could also be used to create other types of zones. One such example would be a GNU Solaris brand, which runs Solaris binaries but has the standard utilities replaced by their GNU equivalents. Other possible uses would be the creation of zones for running FreeBSD or Darwin x86 environments. Because this technology is being made available via the OpenSolaris community, you as a community member will be able to help create these or other brands if so inspired.



If you are interested in background information about BrandZ, read it.

Monday, December 12, 2005

T1000/T2000 - Benchmark Records

SystemNews reports:

* Sun Fire T2000 server using Sun Java(TM) System Web Server 6.1 SP5 achieved a world record SPECweb2005 and demonstrated 1.7 times the performance advantage over the four-way IBM eServer p5 550 with 4.3 times higher performance per watt while occupying half the space. In addition, the Sun Fire T2000 server delivers more than three times the performance of the two-way 3.8 GHz Xeon-based IBM eServer xSeries x346 while delivering over 4.1 times higher performance per watt.

* Sun Fire T2000 server using BEA Weblogic Server was 1.3 times faster than the performance of a four-way 1.6 GHz Itanium two-based HP rx4600 server on the dual node SPECjAppServer2004. The Sun Fire T2000 achieved the overall performance world record on all two node results.

* Sun Fire T2000 server using Sun Java(TM) System Application Server 8.2 Performance Edition achieved world record price/performance on the application tier beating a four-way 1.6GHz Itanium two-based HP rx4600 server on the dual node SPECjAppServer2004.

* Sun Fire T2000 server, equipped with the UltraSPARC T1 processor achieved overall price/performance leadership on the Lotus R6iNotes Domino 7 benchmark. On IBM's own benchmark the Sun Fire T2000 beats the price/performance of the POWER5+ based IBM p5 p550Q by 27 percent. In addition, Sun has more than twice the price/performance advantage and a nine percent performance advantage over the POWER5 based IBM p5 570 server.

* In a demonstration addressing portal server workload, the Sun Fire T2000 server beats the performance of the 2GHz Xeon-based Dell 6650 server by running on the new Sun Java(TM) System Portal Server 7, with 6 times more logins per second while providing 33 percent capacity headroom on Sun Fire T2000 Server versus zero percent on Dell. This new release of the Java System Portal Server allows users to easily create interactive communities of users and services, building "community" portals populated with collaborative content including RSS feeds, blogs and wikis. Additional information on the Java System Portal Server 7 is expected from Sun in the near future.

* Sun Fire T1000 beats the performance of the Dell SC1425 by over two times, while consuming half the power. In comparison to the IBM p520 two-way Power 5+ server, the Sun Fire T1000 server delivered 1.5 times higher performance in four times less space and at 3.7 times performance per watt.

* Sun Fire T2000 server beats the performance of the 1.9 GHz POWER5+ based IBM p5 550 four-way on the SPECjbb2005 and was more than 1.6 times faster than the 2.8 GHz dual-core IBM x346. HP and Fujitsu have not published results on this new benchmark that demonstrates Java(TM) server performance.

Friday, December 09, 2005

Measuring NFS traffic with DTrace

Below you can find a simple DTrace script which shows the distribution of data written to and read from NFS-mounted filesystems, in Mb per second (averaged over a given time interval). The output is refreshed continuously, once per interval.

The script expects four arguments: low high step interval. The first two are the low and high ends of the range, in Mb/s, that you want in the output; step is the resolution of the output; and interval is the resolution of the measurement (the time interval over which Mb/s is calculated). In the example below I wanted to see output in the range 50-150Mb/s with a 5Mb/s step, calculated over 10s intervals, with results printed every 10s. As you can see, most of the time I have on average 90-120Mb/s going to/from the NFS servers.
Note that this is only an estimate.


bash-3.00# cat io-nfs-throughput.d
#!/usr/sbin/dtrace -qs

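BEGIN
{
    /* bytes seen in the current interval */
    bsum = 0;
}

/* add up the size of every NFS I/O request as it starts */
io:nfs::start
{
    bsum += args[0]->b_bcount;
}

/* once per interval: convert bytes to Mb/s, record it, reset the counter */
tick-$4
{
    @a["Mbs"] = lquantize(bsum*8/($4*1024*1024),$1,$2,$3);
    printa(@a);
    bsum=0;
}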

bash-3.00# ./io-nfs-throughput.d 50 150 5 10s
[...] after some time
  Mbs
           value  ------------- Distribution ------------- count
              55 |                                         0
              60 |                                         1
              65 |                                         2
              70 |                                         2
              75 |@                                        3
              80 |@                                        8
              85 |@@@                                      17
              90 |@@@@@                                    32
              95 |@@@                                      16
             100 |@@@@                                     23
             105 |@@@@                                     22
             110 |@@@@                                     21
             115 |@@@@                                     21
             120 |@@@                                      19
             125 |@@@                                      17
             130 |@@                                       14
             135 |@@                                       11
             140 |@                                        7
             145 |                                         0
          >= 150 |@                                        4
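
To make the arithmetic in the script explicit, here is a small C sketch of the two steps: turning a byte count into an average Mb/s figure, and picking the linear bucket a value falls into. The function names mbits_per_sec and linear_bucket are made up for this example, and the bucket numbering (0 for "below low", then one bucket per step, then a final "high or above" bucket) is only my reading of how lquantize() groups values, so treat it as an approximation rather than DTrace's actual implementation.

```c
#include <stdint.h>

/* Average rate in Mb/s for `bytes` transferred over `secs` seconds. */
uint64_t
mbits_per_sec(uint64_t bytes, uint64_t secs)
{
    return (bytes * 8 / (secs * 1024 * 1024));
}

/*
 * Index of the linear bucket that `value` falls into:
 *   0        value is below `low`
 *   1..n     value is in [low + (i - 1) * step, low + i * step)
 *   n + 1    value is `high` or above
 * where n = (high - low) / step.
 */
int64_t
linear_bucket(int64_t value, int64_t low, int64_t high, int64_t step)
{
    int64_t nbuckets = (high - low) / step;

    if (value < low)
        return (0);
    if (value >= high)
        return (nbuckets + 1);
    return ((value - low) / step + 1);
}
```

For example, 131072000 bytes seen in a 10s interval works out to 100Mb/s, which with low=50, high=150 and step=5 lands in the bucket labelled 100 in the output above.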

Simulating Data Center

This is a cool data center simulator :))) You can easily compare T1000/T2000 with IBM's POWER5 servers (or other servers, if you define them yourself) - not only performance but also power usage, space usage and costs. Have fun!

Thursday, December 08, 2005

snv_b28 available

Solaris Express snv_b28 is available as both CD ISOs and DVD ISOs. Some interesting changes (compared to b27) from the putback logs:
  • 64-bit glm driver on x86
  • Interactive volume creation and deletion for raidctl
  • several ZFS bug-fixes and performance improvements
  • Kernel-level SSL proxy
I haven't blogged about it yet (I was on vacation), but starting with build 27 ZFS is available to the public! The ZFS sources are also available at opensolaris.org. So if you want to play with ZFS (of course you do!), go and download SX (ZFS is of course also in the latest bits - b28). Other OpenSolaris-based distributions seem to include ZFS already too (http://www.gnusolaris.org). If you have any questions about ZFS, subscribe to the ZFS mailing list.

Network Computing 05Q4

During yesterday's Network Computing event Sun announced new servers: T1000 and T2000, both with Niagara processors (16-32 threads per CPU). Below are some excerpts copied from the announcements.

http://www.sun.com/smi/Press/sunflash/2005-12/sunflash.20051206.3.html
Continuing to build on their longstanding collaboration, today Sun and Oracle are also offering customers a special opportunity to try Oracle on the Sun Fire T1000 and T2000 systems. As part of a special promotion, customers using Oracle products with CPU-based licenses on Sun Fire T1000 and T2000 systems will be able to count cores as .25 percent of a processor versus alternative methods.
http://www.sun.com/smi/Press/sunflash/2005-12/sunflash.20051206.4.html
Today, Sun also announced plans to publish specifications for the UltraSPARC-based chip, including the source of the design expressed in Verilog, a verification suite and simulation models, instruction set architecture specification (UltraSPARC Architecture 2005) and a Solaris OS port. The goal is to enable community members to build on proven technology at a markedly lower cost and to innovate freely. The source code will be released under an Open Source Initiative (OSI)-approved open source license.
OpenSparc Project Page


http://www.sun.com/smi/Press/sunflash/2005-12/sunflash.20051206.2.html

Sun Fire T2000 server using Sun's Web Server 6.1 SP5 achieved a world record SPECweb2005 and demonstrated a 1.7x performance advantage over the 4-way IBM eServer p5 550 with 4.3x higher performance per watt while occupying half the space. In addition, the Sun Fire T2000 delivers more than 3x the performance of the 2-way 3.8GHz Xeon-based IBM eServer xSeries x346 while delivering 4.1X higher performance per watt.(1)

Sun Fire T2000 server using BEA Weblogic Server was 1.3X faster than the performance of a 4-way 1.6GHz Itanium2-based HP rx4600 server on the dual node SPECjAppServer2004. The Sun Fire T2000 achieved the overall performance world record on all two node results.(2)

Sun Fire T2000 server using Sun Java System Application Server 8.2 Performance Edition (AS 8.2 PE) achieved world record price/performance on the application tier beating a 4-way 1.6GHz Itanium2-based HP rx4600 server on the dual node SPECjAppServer2004. Sun Java System AS 8.2PE is free for development and deployment.

Sun Fire T2000 server, equipped with the UltraSPARC T1 processor achieved overall price/performance leadership on the Lotus R6iNotes Domino 7 benchmark. On IBM's own benchmark the Sun Fire T2000 beats the price/performance of the POWER5+ based IBM p5 p550Q by 27 percent. In addition, Sun has more than twice the price/performance advantage and a nine percent performance advantage over the POWER5 based IBM p5 570 server. (3)

In a demonstration addressing the all-important and ubiquitous portal server workload, the Sun Fire T2000 server beats the performance of the 2GHz Xeon-based Dell 6650 server by running on the new Sun Java System Portal Server 7, with 6x more logins per second while providing 33 percent capacity headroom on Sun Fire T2000 Server versus zero percent on Dell. This new release of the Sun Java System Portal Server allows users to easily create interactive communities of users and services, building "community" portals populated with collaborative content including RSS feeds, Blogs and Wikis. Additional information on the Sun Java System Portal Server 7 will be communicated in the coming days.

Sun Fire T1000 beats the performance of the Dell SC1425 by over 2x, while consuming half the power. In comparison to the IBM p520 2-way Power 5+ server, the Sun Fire T1000 server delivered 1.5x higher performance in 4x less space and at 3.7x superior performance per watt.

Sun Fire T2000 server beats the performance of the 1.9GHz POWER5+ based IBM p5 550 4-way on the SPECjbb2005 and was more than 1.6x faster than the 2.8GHz dual-core IBM x346. (4) Surprisingly, HP and Fujitsu have not published results on this important new benchmark that demonstrates Java server performance.

Tuesday, December 06, 2005

T2000

SystemNews is reporting a new T2000 SPARC server from Sun based on the Niagara chip, UltraSPARC T1. You can also find more info on Sun's pages. It looks like this server is going to be presented at tomorrow's NC event. The chassis looks similar to the x4200 - probably the same. From the documentation:

  • CPU - 4, 6, 8 cores (each with 4 threads, so 16, 24 or 32 virtual CPUs are seen in a system) UltraSPARC T1 (Niagara)
  • RAM - 16 slots (with 2GB DDR2 - 32GB)
  • Ethernet - 4x 100/1000 on-board
  • DISKS - 1-4 2.5" SFF SAS 73GB
  • 1x DVD
  • I/O - 3x PCI-E, 2x PCI-X (64bit 133MHz)
  • Redundant power and cooling
  • OBP/ALOM
  • Hardware-assisted cryptography (RSA & DSA on-chip)
  • 350W nominal power consumption (400W MAX)

Now I wonder about the price of the server and what actual web-serving performance it can achieve.

btw: it looks like a smaller version named T1000 is going to be presented too.

update: another article on Niagara with some performance benchmarks.
From the article:

• On the SPECjbb2005 test of Java server software, the T2000 scored 53,378 business operations per second compared with 61,789 for an IBM p5-550 with two dual-core Power5 chips and 24,208 for a Dell PowerEdge SC1425 with dual single-core Xeon processors.

• On the SPECweb2005 test of Web server performance, the T2000 scored 14,001, compared with 7,881 for an IBM p5-550 with two dual-core Power5 processors, 4,850 for a Dell PowerEdge 2850 with two dual-core Xeon processors, and 4,348 for an IBM x345 with dual single-core Xeon processors.

• On the NotesBench test of Lotus Notes performance, a T2000 accommodated 19,000 users at $4.35 per user and got a NotesMark score of 16,061. In comparison, an eight-processor IBM p5-570 had 17,400 users, a cost of $10.19 per user, and a NotesMark score of 14,740. But the average response time of the IBM system was 270 microseconds compared with the slower 400 microseconds for the T2000, demonstrating the relatively slow single-thread performance of the Sun system.


Looks like web performance is really good.

Monday, December 05, 2005

NexentaOS


I've just installed NexentaOS on my laptop - this is a GNU distribution based on Debian and Open Solaris. I must admit that it works really well - it automatically detects my network, X, etc. - quite impressive. It looks like DTrace, ZFS and SMF work out of the box. The package administration tools also work - I tried to upgrade some packages and it just worked! Good job. I think this could become a great distribution for people who prefer GNU utilities but want Solaris stability and features like DTrace, ZFS, etc. It's definitely worth a look. Debian users will probably feel quite comfortable with this distribution. Keep in mind that this is an alpha release (still, I'm impressed by how much has been achieved in such a short time - Open Solaris went public in June this year).

Solaris Networking internals

Part I and Part II of Solaris Networking - the Magic Revealed are going to be part of the new (upcoming) Solaris Internals book.

Friday, December 02, 2005

Project Red October

Project Red October - Sun is giving away JES, N1 and developer tools for free! And it looks like all of the software will be open sourced. This means that, for example, Sun Cluster, Messaging Server, the C/C++ compilers, etc. are now free on all available platforms.

From the official announcement:
MENLO PARK, CALIF; November 30, 2005 - Sun Microsystems, Inc. (NASDAQ: SUNW) today announced two landmark moves in the battle to create the software platform of choice for the next-generation of the Internet. First, having seen tremendous momentum with the Solaris Operating System (OS) as free and open source software, Sun is making the Java Enterprise System, Sun N1 Management software, and Sun developer tools available at no cost for both development and deployment and further, is reaffirming its commitment to open source this software. Second, Sun is announcing that it is integrating all of this software along with the Solaris OS into the Solaris Enterprise System, the only comprehensive and open infrastructure software platform available today.
All of Sun's server software is delivered in one pack and is called Solaris Enterprise System.

Monday, November 28, 2005

Oracle development on Solaris 10

SANTA CLARA, CALIF and REDWOOD SHORES, CALIF. 15-NOV-2005 Oracle and Sun Microsystems, Inc. (NASDAQ: SUNW) today announced that Oracle has chosen the Solaris (TM) 10 Operating System (OS), Sun's multi-platform, open source OS, as its preferred development and deployment platform for most x64 architectures, including x64 (x86, 64-bit) AMD Opteron and Intel Xeon processor-based systems and Sun's UltraSPARC(R)-based systems. The Solaris 10 OS will be used throughout Oracle's development organization. Oracle also plans to release and ship 64-bit versions of all Oracle products on the Solaris OS prior to or simultaneous with the release of its products on other operating systems.

With the selection of the Solaris 10 OS, as well as full access to source code and support for more than 440 x86/x64 systems, Oracle will have access to key features including Dynamic Tracing (DTrace), Solaris Containers and TCP/IP performance enhancements. Today's announcement also helps assure customers that Oracle technologies and applications will take full advantage of the advanced features of the Solaris OS.

"Oracle has long viewed the Solaris OS as an important foundation for Oracle applications, but this announcement takes that one step further. With Solaris 10 Sun has delivered an open source, cross-platform OS. And, it's impossible to ignore the significant market opportunity created by the incredible growth of Solaris 10 along with Sun's industry-standard x64 and UltraSPARC-based systems. Solaris was the clear choice for our development platform," said Larry Ellison, CEO, Oracle.

In less than one year, Sun has distributed more than 3 million Solaris OS licenses - free of charge - and the Solaris OS currently supports more than 539 platforms, providing customers with the ability to take advantage of Solaris 10 on the broadest choice of hardware in the industry.

"For more than 20 years, Sun and Oracle have worked together to deliver unparalleled value through joint OS and application tuning and optimization," said Scott McNealy, chairman and CEO, Sun Microsystems, Inc. "Our working together is a major opportunity to continue growing our joint customer base in the x64 and UltraSPARC markets and helps assure customers that our collaboration provides them virtually seamless integration between Sun and Oracle technologies."
Official announcements by Oracle and Sun.

Back from vacation

Vacation is over - back to blogging and work :)

Friday, October 28, 2005

Tuning MySQL server

We've got many instances of MySQL on the same server and we have been running some of them on Solaris x64 for the last few days. Well, it's Solaris - I couldn't resist and looked around a little on this server. Below you can find some examples of what you can observe using DTrace and how easy it is. These examples aren't exactly the ones I used in production - they are similar but simplified a little. Anyway, they're still useful. This time let's try the IO provider.

I noticed using iostat that /var is being written to a lot. Probably /var/tmp or some logs - let's check it!

bash-3.00# dtrace -n io:::start'/args[2]->fi_mount == "/var"/{trace(args[2]->fi_pathname);}'
dtrace: description 'io:::start' matched 6 probes
CPU ID FUNCTION:NAME
0 94 bdev_strategy:start /var/tmp/#sql_e91_4.MYD
0 94 bdev_strategy:start /var/tmp/#sql_e91_4.MYD
0 94 bdev_strategy:start /var/tmp/#sql_e91_4.MYD
0 94 bdev_strategy:start /var/tmp/#sql_e91_2.MYD
[...]
^C

OK, so it's /var/tmp after all. It's probably a good idea to make /var/tmp a tmpfs filesystem.

Now let's see which instance of MySQL is doing most of the IOs.

bash-3.00# dtrace -n io:::start'{@a[execname]=count();}'
dtrace: description 'io:::start' matched 6 probes
^C

sched 6
mysqld2 7
fsflush 225
mysqld1 244
mysqld3 2993
bash-3.00#



How many bytes are actually transferred during IOs by each instance?

bash-3.00# dtrace -n io:::start'{@a[execname]=sum(args[0]->b_bcount);}'
dtrace: description 'io:::start' matched 6 probes
^C

sched 286208
fsflush 2939904
mysqld2 5169152
mysqld1 21300224
mysqld3 51502592
bash-3.00#
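
If the @a[execname]=sum(...) notation looks like magic, conceptually it is just a table keyed by a string, with a running total per key. Here is a tiny C sketch of that idea. This is only an illustration: the names agg_table, agg_sum and agg_get and the fixed table size are made up, and DTrace's real aggregations work quite differently inside the kernel (per-CPU buffers and so on).

```c
#include <stdint.h>
#include <string.h>

#define MAX_KEYS    16  /* illustration only: small fixed-size table */
#define KEY_LEN     32  /* keys longer than this are truncated */

struct agg_entry {
    char key[KEY_LEN];
    uint64_t sum;
};

struct agg_table {
    struct agg_entry entries[MAX_KEYS];
    int count;
};

/* Add `value` to the running sum for `key`; returns -1 if the table is full. */
int
agg_sum(struct agg_table *t, const char *key, uint64_t value)
{
    int i;

    for (i = 0; i < t->count; i++) {
        if (strncmp(t->entries[i].key, key, KEY_LEN - 1) == 0) {
            t->entries[i].sum += value;
            return (0);
        }
    }
    if (t->count == MAX_KEYS)
        return (-1);

    /* First time we see this key: start a new entry. */
    strncpy(t->entries[t->count].key, key, KEY_LEN - 1);
    t->entries[t->count].key[KEY_LEN - 1] = '\0';
    t->entries[t->count].sum = value;
    t->count++;
    return (0);
}

/* Current sum for `key`, or 0 if the key has never been seen. */
uint64_t
agg_get(const struct agg_table *t, const char *key)
{
    int i;

    for (i = 0; i < t->count; i++) {
        if (strncmp(t->entries[i].key, key, KEY_LEN - 1) == 0)
            return (t->entries[i].sum);
    }
    return (0);
}
```

Calling agg_sum(&t, "mysqld3", nbytes) once per I/O and printing the table at the end gives the same kind of per-process totals as the output above.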


Maybe some of a given instance's IOs are mostly cached by the filesystem, so let's see how many bytes are transferred only when physical IO is done.

bash-3.00# dtrace -n io:::start'/args[0]->b_flags & B_PHYS/{@a[execname]=sum(args[0]->b_bcount);}'
dtrace: description 'io:::start' matched 6 probes
^C

sched 165888
mysqld2 397312
fsflush 1048576
mysqld1 6002176
mysqld3 85908480
bash-3.00#

Let's say we want to know which files are accessed the most.

bash-3.00# dtrace -n io:::start'{@a[args[2]->fi_pathname]=sum(args[0]->b_bcount);}'
dtrace: description 'io:::start' matched 6 probes
^C

/opt/mysql1/master.info 3072
/opt/mysql2/kawiarenki_dir/MiniCzaty.MYD 3072
/opt/mysql3/ib_logfile0 12288
/opt/mysql2/opteron-slow.log 12288
/opt/mysql2/dzieci2/DAYRATE.MYD 12288
/opt/mysql2/kawiarenki_dir/Pokoje.TMD 12288
/opt/mysql2/hosting/hp_pages.TMD 12288
/opt/mysql1/ib_logfile0 12288
/opt/mysql2/opteron-bin.012 24576
/opt/mysql1/opteron-bin.006 24576
/opt/mysql1/opteron-relay-bin.008 86016
/opt/mysql1/ibdata6 147456
/opt/mysql3/opteron-bin.017 221184
/opt/mysql1/ibdata2 393216
/opt/mysql2/users/users.TMM 643072
/opt/mysql1/ibdata3 753664
/opt/mysql1/ib_logfile1 819200
/opt/mysql1/ibdata5 2457600
/var/tmp/#sql_49fc_0.MYD 3784704
/opt/mysql3/ibdata1 3833856
/opt/mysql1/ibdata4 4030464
/opt/mysql3/ibdata3 4317184
/var/tmp/#sql_49fc_1.MYD 5382144
/opt/mysql2/users/users.TMD 5455872
/opt/mysql3/ibdata5 5603328
/opt/mysql3/ib_logfile2 5664768
/opt/mysql1/ibdata1 8970240
/opt/mysql3/ibdata4 9183232
/opt/mysql3/ibdata2 20111360
bash-3.00#


What can you get from all of this? Well, first, it's probably worth either linking /var/tmp to /tmp or making /var/tmp a tmpfs filesystem. You also know exactly which MySQL instance makes the most use of the disks - so if you have a problem with storage performance, you know which instance to move to another server or to give separate (probably faster) storage. And if some subset of files is accessed much more than the rest, you can spread these files across separate storage.

And so on... all of this on a production box, in a safe manner. All you need is the imagination to know what to ask, and DTrace gives you an answer :) (but you've got to know how to ask :))))

Thursday, October 27, 2005

IBM supports Solaris

Looks like IBM is going to officially support Solaris on its Bladecenter systems.

I'm pleased to announce we've signed up our first tier 1 systems vendor as a Solaris supporter: it's IBM, and their decision to provide comprehensive support for Solaris on Bladecenter definitely puts them ahead of the other blade vendors in offering a truly OS neutral product.
As a result of our agreement, IBM will be adding value to BladeCenter, optimizing Solaris for IBM hardware offerings, adding volume to the Solaris community, and proving that the best choice for customers is, in fact, real choice. It sends a clear message to IBM accounts that Solaris is now a top tier option for BladeCenter deployments.

Read more

UPDATE: According to an article in The Register - "IBM has agreed to sell Sun's operating system with its BladeCenter servers in "the coming months," according to an IBM spokesman."

3mln Solaris licenses!

It hasn't even been a year since Solaris 10 was released, and yesterday Solaris 10 hit the milestone of 3,000,000 registered licenses. Official announcement here.

Saturday, October 22, 2005

Winter is coming...

Winter is coming, and if you do not like cold nights then go and buy Intel CPUs. However, if you've already got enough heat in your datacenter, then it looks like AMD is the way to go. Some benchmarks comparing the new Intel dual-core CPUs to AMD are here. The power consumption benchmark is on page 3.

btw: and despite the fact that Opterons produce much less heat, they are faster too! :)))

Wednesday, October 19, 2005

DTrace - virtualized consumer

One of the really nice features of DTrace is its virtualized consumer. It means that you can run different actions for the same probes at the same time on the same system. This is really useful when many sysadmins or developers are looking at the system and applications. Thanks to this feature they don't have to care that someone else is doing their work at the same time. This is especially useful for scripts which run for many hours, as you can still do other work with DTrace in the meantime.

SX 10/05 is out

A new Solaris Express release is out - 10/05 is based on Nevada 23. SX Community Edition build 24 is also available.

And as usual thanks to Dan Price you can read his What's New.

btw: if you put SX on a computer with Windows or a diagnostic partition, it should detect it automatically and add the appropriate GRUB menu entries - for me it just works :) Small but nice.

Tuesday, October 18, 2005

Interview with Andy Bechtolsheim

Interview with Andy Bechtolsheim - the creator of Galaxy servers.

Monday, October 17, 2005

OpenWeekend in Prague

I attended the OpenWeekend conference in Prague this weekend as a speaker. I gave a presentation about DTrace, and other people talked about OpenSolaris. During the conference the first Czech Open Solaris User Group (CZOSUG) meeting took place, which I was lucky to attend. Big thanks to Katarina Machalkova for some translation - without her help I wouldn't have understood a word (and they say Polish is similar to Czech - no, it isn't). The conference was well organized and I think there were over 100 attendees.

During the presentation I wanted to give a live demo of DTrace, but unfortunately there was a problem getting video out of Xorg and there was no time to figure it out. Anyway, after my talk I gave a small demo on my laptop to a few folks - I think people were interested in DTrace.

After the official conference the speakers had a meeting and I talked with the other fellows about Linux, Open Solaris and DTrace - it was quite a heated conversation lasting many hours :)

This is me giving the presentation :)


Here you can find more pictures.

Wednesday, October 12, 2005

DTrace updates with new kernel

Recently a new kernel patch level became available for Solaris 10 (SPARC). If you use DTrace, I believe you will be interested in this patch. The patch number is 118222-19 and it is available for free (no support contract is needed). What's new in revision -19? A lot of DTrace fixes, as you can see - some of the problems/RFEs are mine :)

update: patch 118844-19 is available for free, this is x86/x64 version of the kernel

4923208 Sb150 systems hang -WARNING: ata_controller - Can not reset Primary channel
5029967 dtrace should provide an option to show probe argument types, stability
5108961 vestigial kadb turds left in dtrace
6213962 dtrace -G doesn't work on sparcv8+ object files
6214615 Conversion of bufinfo_t's b_resid is not defined in the io.d translator
6217821 dtrace cmd fails assertion in dt_proc_lock()
6218854 USDT and the jstack() action don't work on processes on a zone
6219195 lockstat under Solaris 10 unacceptably degrades performance
6220843 dt_pid should look for PR_OBJ_LDSO rather than ld.so
6221490 plockstat(1M) needs options to set aggsize and aggrate
6221495 plockstat(1M) needs a verbose option to report when tracing has started
6221496 plockstat(1M) should have an option to exit after a time limit expires
6221498 plockstat(1M) should have an option to limit number of entries in output
6223379 lockstat fails to report one stack frame
6223603 the pid provider is willing to instrument things it shouldn't
6225650 D compiler can't resolve past implicit forward declarations
6226263 usdt probes will fail to instantiate if pid probes are specified first
6226302 must allow enablings to be retained and rematched
6226320 must allow enablings to be duplicated after tracing is enabled
6226345 dtrace_consume() can call record callbacks with incorrect argument
6228044 the pid provider can miss some function returns
6229159 dtrace should be able to trace dynamically loaded objects
6230315 pid123::ioctl:return finds the wrong instruction
6231207 libdtrace is not able to properly resolve some probe argument types
6232748 pid provider can miss recursive function returns
6234004 libctf should support interfaces for client data in ctf_file_t's
6234033 ctf_type_name() should return NULL if input fp is NULL
6234037 D strchr2esc() incorrectly includes sign extension bit
6234063 D compiler support for USDT translators (part 1)
6234072 pid provider mishandles recursive returns
6234449 ctf_lookup_by_name() fails when typedef is a qualifier substring
6236617 D compiler support for restrict keyword
6236726 ustack() at pid provider return probes can be confusing
6238322 fasttrap::fasttrap:fasttrap args are broken on amd64
6239626 helpers aren't backward compatible with S10
6250382 ctf_type_name() wrong when type order conflicts with lexical precedence
6250386 ctf should not require callers to hardcode type name buffer size
6253027 bufpolicy of "fill" or "ring" causes dtrace(1M) to consume 100% of CPU
6253028 dtrace_probe()/dtrace_state_go() race can induce D data corruption
6253030 adding an action to an ECB takes quadratic time
6253031 dtrace_consume() can (still) call record callbacks with bad argument
6253033 aggregations should be sorted on key as well as value
6254258 dtrace doesn't pick up usdt probes in dlopened objects
6254741 usdt generation can do the wrong thing with tail calls
6258738 fbt refuses to instrument functions starting with branches
6264469 fbt is confused by jump tables in code
6264473 fbt is confused by the return instruction
6265086 DTrace has anemic string handling facilities
6265087 clause-local D variables can only be scalars
6265088 storing NULL to by-reference static variables induces an error
6265090 need mechanism to redirect stdout from within a D script
6265094 copyinstr() should take optional limit parameter
6267670 dt_type_pointer() should report reason for failure
6267671 ctf_add_type() fails when definition added after forward declaration
6267680 D compiler should not permit void parameters to have names
6267682 D compiler is not checking array redeclarations properly
6267693 D compiler support for inline associative array references
6267695 DTrace should provide a fds[] array for file descriptor information
6275414 unary operator * doesn't work properly when applied to args[] elements
6282291 D compiler core dumps in dt_node_dynamic() for inline parameter
6303053 pid provider panic under low memory conditions
6303188 some dtrace scripts with speculations fail to load
6304654 predicates containing args[] references may be incorrectly cached
6209411 truss -u can make a target hang up
6210881 When there is memory pressure dnlc not setting dca_dircache to DC_RET_LOW_MEM
6213074 kphysm_add_memory_dynamic calls dump_resize late
6265027 rpc destroys a CV with waiters
6233615 Fatal System Bus Error during suspend/resume cycle for DR
6235086 divide by zero panic in lgrp_move_thread() during network boot on v40z
6249712 unconfigure memory hangs lgrp_mem_init()
6232864 panic in lgrp_mem_choose() during dr testing
6251625 missed change to prototype in lgrpplat causes build failures for OEM customers
6244519 dead code to suspend kernel threads on OS quiesce should be removed
6271688 chdir'ing in /proc blows up.
6271759 pwdx lets you see other users' processes working directories
6272865 race condition between SIGKILL and /proc PCAGENT
6240456 Need topo enumeration for PCI Express
6288246 amd64 kernel needs to detect AMD Opteron erratum 131
6290459 SIGVTALRM signal delivery delayed under Solaris 10 due to t_astflag not being set
6302751 add ptl1_gregs[MAXGL + 1], change PTL1_MAXTL to 2 and rename MMU fields in ptl1_regs structure
6312753 workaround required for PLX erratum 34
6313403 disabling CPUs on console can hang console interface
6313788 Glvc driver need to enable interrupt on virtual channel
6313837 additional safe measure in px required to make cpr to work
6313842 incorrect checking causes non fatal imu_rbne panic
6317693 Chicago needs to use ebus RTC instead of southbridge RTC

Friday, October 07, 2005

Where are we in a file?

Let's say you are tar+gzip'ing a large file and wonder how far along it is. Well, with DTrace that's really simple. First check the file size, then check every 10s where we currently are in the file. Additionally, print how many bytes were read since the last check.

bash-3.00# ls -l
total 9223488
-rw-r--r-- 1 nobody other 2 Oct 4 11:30 bounds
drwxrwxrwx 2 nobody other 2560 Oct 7 08:13 core
-rw-r--r-- 1 nobody other 1206160 Oct 4 11:21 unix.0
-rw-r--r-- 1 nobody other 4718870528 Oct 4 11:30 vmcore.0

bash-3.00# dtrace -n io:::start'/args[2]->fi_pathname == "/mnt/vmcore.0"/{fs=args[2]->fi_offset;}' -n BEGIN'{last=0;}' -n tick-10s'{trace(fs);trace(fs-last);last=fs;}'
dtrace: description 'io:::start' matched 6 probes
dtrace: description 'BEGIN' matched 1 probe
dtrace: description 'tick-10s' matched 1 probe
CPU ID FUNCTION:NAME
0 47663 :tick-10s 3758587904 3758587904
0 47663 :tick-10s 3792044032 33456128
0 47663 :tick-10s 3838902272 46858240
0 47663 :tick-10s 3890118656 51216384
0 47663 :tick-10s 3936387072 46268416
0 47663 :tick-10s 3977871360 41484288
0 47663 :tick-10s 4020830208 42958848
0 47663 :tick-10s 4069294080 48463872
0 47663 :tick-10s 4120510464 51216384
^C

bash-3.00#
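
The same numbers can be turned into a percentage and a rough time estimate. The sketch below is only an illustration in Python, using the file size from ls -l and the samples printed above. The function names are made up for this example, and it assumes the offset seen by the io provider is a fair approximation of how far the reader has got in the file.

```python
# Rough progress estimate from the DTrace samples above.
# Assumptions: the offset seen by io:::start approximates the read position,
# and the helper names below are hypothetical, not part of any tool.

FILE_SIZE = 4718870528      # size of vmcore.0 from ls -l, in bytes
INTERVAL = 10               # seconds between tick-10s firings


def percent_done(offset, size=FILE_SIZE):
    """Return how far into the file we are, as a percentage."""
    return 100.0 * offset / size


def seconds_left(offset, delta, size=FILE_SIZE, interval=INTERVAL):
    """Estimate remaining time from the bytes read in the last interval."""
    rate = delta / interval          # bytes per second
    return (size - offset) / rate


if __name__ == "__main__":
    # Last sample printed before ^C: offset and bytes since previous tick.
    offset, delta = 4120510464, 51216384
    print("%.1f%% done, about %.0f s left at the current rate"
          % (percent_done(offset), seconds_left(offset, delta)))
```

The estimate is only as good as the last 10s window, so it will jump around if the read rate is not steady.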

New laptop community

A new laptop community has been created on Open Solaris. A Solaris wireless driver for Atheros AR52xx 802.11b/g chipset-based cards has been posted with source. A new wificonfig tool is also available.
If you've got a laptop and have any problems putting (Open) Solaris on it, this new community is the place to go.

Wednesday, October 05, 2005

StarOffice 8 for free

"Academic and Research institutions, including Primary and Secondary (K12) Schools, 2-and 4-year Colleges, and Universities, are eligible for a no-cost license of StarOffice 8. All you have to do to obtain the StarOffice 8 Software License and Entitlement is purchase at least one StarOffice 8 Enterprise Media Kit or download the software and click thru the license agreement on the Sun Software Download Center as described below."

"If you are a student, researcher, staff, or faculty member you can download StarOffice 8 for free from Sun's Software Download Center."

I've just installed it on my workstation with Solaris 10 - works great. Some MS Office documents which weren't imported properly using SO7 are working now!

Official page here.

Sun and Google partnership

Sun and Google announced a partnership. Not much was revealed except that Sun will bundle Google Toolbar with Java. Google will buy more Sun hardware - "We're already a Sun systems customer, and we're going to extend that quite significantly," said Google CEO Eric Schmidt.

"There's a huge alignment strategy with research and development, (involving) OpenDocument format, OpenOffice and OpenSolaris," said McNealy.

You can find a lot of media coverage of this announcement - the one I took the quotes from is here.

Well, this is good news. If Google really gets seriously involved with Open Solaris that would be great. And Google buying more hardware from Sun (Opteron?) is more good news.

Monday, October 03, 2005

Another Open Solaris distribution

A LiveCD distribution based on Open Solaris named BeleniX - more info here.
So right now there are 3 distributions based on Open Solaris: Solaris Express, SchilliX and BeleniX. Of course SX is the most mature and complete.

Friday, September 30, 2005

Sun Trunking for free

Thanks to SunHelp for pointing out that SunTrunking is free now. I guess this is probably due to the fact that link trunking and aggregation are already part of Solaris Express and are going to be integrated into Solaris 10.

Key features
  • Full IEEE 802.3ad compliance with support for the Link Aggregation Control Protocol (LACP).
  • Dynamically add and remove trunk members for on-demand bandwidth.
  • Bundled Gigabit Ethernet and Fast Ethernet links.
  • Load Balancing.
  • Automatic link level failover.


SunTrunking download page.
SunTrunking page.
SunTrunking documentation.

Thursday, September 29, 2005

3GB free email account

We've just updated our email system at http://poczta.wp.pl. We provide a 3GB free email account and 6GB for paid customers. I believe this is the largest(1) free email account you can get worldwide so far. We are also #1 in Poland with over 3 million active email accounts!

Additionally we've launched thumbnail pictures - so if you get an email with graphics attachments you can see small versions of these pictures under your mail. You can also switch a given folder from the normal view with a list of emails to a view with thumbnail pictures from all emails inside that folder.

There are also many smaller enhancements like saving selected attachments in one zip archive, showing all selected pictures on one page, etc.

More new features are coming! :))

Unfortunately we do not offer an English interface (yet).

Below are some screenshots.







(1) at least when you consider big email providers like Yahoo, GMAIL and others

SX b23 is out

Solaris Express - Community Edition build 23 is ready to download.

update: changelog

Wednesday, September 28, 2005

Opteron vs. Xeon

We ran some tests benchmarking an Opteron box against a Xeon box on our MTA system.
Servers: Sun v20z with 2x Opteron 270 (dual-core 2.0GHz), IBM x345 with 2x Xeon 2.8GHz with HT.

We switched off the second cores in the v20z, so we were benchmarking 2x 2GHz Opteron vs. 2x 2.8GHz Xeon. Both systems were equipped with 2GB RAM. The Opteron box had 2x300GB 10K internal disks in a mirror, the Xeon box had 6x33GB 15K internal disks in RAID 10.

System: Solaris 10 was used on both servers. Applications were compiled on the Xeon and the same binaries were used on the Opteron, so no special Opteron optimizations were used. The application is an MTA system with antivirus software, spam filters, and many more modules. As the Opteron box has slower disks we anticipated it could be slower, but on the other hand all these applications really need a lot of CPU power. We used slamd for benchmarking from several clients.

Result: the Opteron server is 1.6x faster than the Xeon server. With the additional cores turned on, the Opteron server is 2.62x faster than the Xeon server.
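
To put these two ratios in perspective, here is a small Python sketch that rescales them by clock speed and by core count. It only does arithmetic on the figures quoted above. The function names are hypothetical, and comparing clock-for-clock is a crude yardstick at best, since the two CPUs differ in much more than frequency.

```python
# Normalising the benchmark ratios quoted above.
# Assumption: a clock-for-clock comparison is only a crude yardstick,
# since the two CPUs differ in far more than frequency.

OPTERON_GHZ = 2.0
XEON_GHZ = 2.8

SPEEDUP_2_CORES = 1.6    # Opteron (second cores off) vs. Xeon
SPEEDUP_4_CORES = 2.62   # Opteron (all four cores on) vs. Xeon


def per_ghz_advantage(speedup, fast_ghz=OPTERON_GHZ, slow_ghz=XEON_GHZ):
    """Speedup rescaled as if both CPUs ran at the same clock."""
    return speedup * slow_ghz / fast_ghz


def core_scaling(speedup_all, speedup_half):
    """How much throughput grew when the core count was doubled."""
    return speedup_all / speedup_half


if __name__ == "__main__":
    print("per-GHz advantage: %.2fx" % per_ghz_advantage(SPEEDUP_2_CORES))
    print("gain from doubling cores: %.2fx"
          % core_scaling(SPEEDUP_4_CORES, SPEEDUP_2_CORES))
```

The core-scaling figure should be read as a lower bound if the load generators, rather than the server, were the limit in the four-core run.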

Well, we're going to test our web servers next - I wonder about the results.

btw: with all cores turned on we couldn't saturate all CPUs on the Opteron server - more client servers were needed and at that time I couldn't spare more (10 client servers were used). The Xeon server's CPU was saturated all the time. Additionally the Opteron had much slower disks, so we could probably get even better results with the Opteron.

Using NAT with Zones

Sometimes it would be useful to set up NAT for local zones. Here is a description of how to do it using Solaris 10 Zones and IPFilter (which is part of Solaris 10).

Wednesday, September 21, 2005

SX 9/05

Solaris Express 9/05 is available. As usual thanks to Dan for his What's New. This release is based on build 21. If you want to stay on the edge of the edge then go and download Solaris Express - Community Edition based on build 22.

UltraSparc IV+

Finally, new SPARCs are available. Looks like you can get the same server with new CPUs for the same price.

And official announcement.

Wednesday, September 14, 2005

X4100 & x4200 architecture

If you want to find out more about x4100 & x4200 server architectures - here is the right document.

LSI MegaRAID

Here you can download "Driver ITU floppy image for Solaris 10GA. It contains lsimega driver which will support LSI MegaRAID 320-2x, LSI MegaRAID 320-2e, Dell PERC 4e/si(ROMB controller in Dell PE1850) and Dell PERC 4e/di(ROMB controller in Dell PE2850) RAID controllers".

Censored Sun Ads

Top business publications refused to run our bold ad concepts because the headlines were thought too controversial. At Sun, we're the radical engineers that build "ass-whoopin" technology - we're not Miss Manners and we never want to be.

Censored Ads

Flying Ads

Tuesday, September 13, 2005

ZFS is coming

Eric Schrock has posted:

We're currently in the "end game" for ZFS development - pushing full
speed towards integration (I won't make claims about an exact time
frame). We considered a more accessible beta program (as well as
opensolaris.org hosting), but it was not an efficient use of team
resources at this point in time, and would likely delay the eventual
integration into Nevada. We promise we're doing everything possible to
get ZFS putback ASAP.

So it looks like it won't be long before ZFS is publicly available. I can only add that you will really love its simplicity and features.

Monday, September 12, 2005

New x64 servers from Sun

Three new x64 servers from Sun: x2100 x4100 x4200.


There's already review about x2100 server.

My article about Open Solaris

I wrote an article about Open Solaris for the Polish Linux+ magazine and it looks like it was translated into German and has been put online. As there wasn't enough time it got published after editorial corrections without my final approval - as a result there's at least one mistake(1). I hope that the translation hasn't introduced any more mistakes...

(1) - LAE is not part of Open Solaris, at least not right now.

ps. some things have already changed as article was written in June.

Future Networking in Solaris

If you are interested in what's going on with Networking in Solaris check this post. You can also download an interesting document which explains some of the plans - IPMP Rearchitecture: High-Level Design Specification. Below is a quote from the post:

As some of you may be aware, the Solaris Approachability team has a project underway called "Clearview", whose charter is to rationalize, unify, and enhance the way network interfaces are handled in Solaris at the programmatic and administrative levels. Under the Clearview umbrella, there are currently four components under development:

IPMP Rearchitecture
IP Tunnel Device
Vanity Naming and Nemo Unification
IP-Level Observability Devices

Friday, September 09, 2005

New cluster

I've just installed and set up a new NFS cluster using Sun Cluster 3.1 8/05 on Solaris 10. All with IPMP and MPxIO. The installation was easy and the configuration is identical to older releases. It just works :)

3D Desktop

I've just tried Project Looking Glass on my new workstation. It's a 3D desktop running on Solaris or Linux (I did on Solaris). Well it's definitely eye-catching!
You can find video demo here.

Tuesday, September 06, 2005

SX b21 is out

Although b22 is available internally to Sun, b21 is available to download.

Disk sets in SVM

Using SVM disk sets even with a single (non-clustered) host does make sense for externally connected storage. If for some reason you want to reinstall the system or install a new one but want to preserve all the RAID volumes made with SVM, then disk sets make it really easy for you. Disk sets are also really useful if you want to move your external storage from one system to another. Below is an example of what I did today - the system was reinstalled but I wanted to preserve all SVM volumes on an external JBOD. And if there's only one host using these disks then you probably want to set auto-take on.


bash-3.00# metaimport -r
Drives in regular diskset including disk c4t0d0:
c4t0d0
c4t1d0
c4t2d0
c4t3d0
c4t4d0
c4t5d0
c4t6d0
c4t7d0
c4t8d0
c4t9d0
c4t10d0
c2t16d0
c2t17d0
c2t18d0
c2t19d0
c2t20d0
c2t21d0
c2t22d0
c2t23d0
c2t24d0
c2t25d0
c2t26d0
More info:
metaimport -r -v c4t0d0
Import:
metaimport -s <newsetname> c4t0d0

bash-3.00# metaimport -s E3-0 c4t0d0
Drives in regular diskset including disk c4t0d0:
c4t0d0
c4t1d0
c4t2d0
c4t3d0
c4t4d0
c4t5d0
c4t6d0
c4t7d0
c4t8d0
c4t9d0
c4t10d0
c2t16d0
c2t17d0
c2t18d0
c2t19d0
c2t20d0
c2t21d0
c2t22d0
c2t23d0
c2t24d0
c2t25d0
c2t26d0
More info:
metaimport -r -v c4t0d0

Disk movement detected
Updating device names in Solaris Volume Manager
bash-3.00#
bash-3.00# metaset -s E3-0 -A enable
bash-3.00# metaset |head -5

Set name = E3-0, Set number = 1

Host Owner
bolek.db.srv Yes (auto)
bash-3.00#

New patches

If you want to have a clean zones installation after applying all patches to S10 GA make sure that you apply these patches: 119255-06 (x86/x64), 119254-06 (SPARC), preferably as the first patches on a system. With these patches, after doing 'smpatch update' zones installation should be clean - not a single warning.

Wednesday, August 31, 2005

W1100z

I've got a new workstation on my office desk! Actually it arrived two weeks ago but I haven't had much time to completely migrate my environment from the old workstation (Sun Blade 150) to this new one - W1100z. This one has an AMD Opteron 150, 2GB RAM, two internal disks (mirrored), a 19" Sun LCD monitor and a dual-layer DVD burner :)))

I've been (well, I was...) working on my old workstation using Solaris 9 with CDE - it's been a really STABLE environment and I mean it. I've been using it every day for my job (and a notebook for WWW) and it has an uptime of over 660 days! I haven't even logged out from the X session (CDE) for the same time period. Additionally some users were working remotely on this workstation.


bash-2.05$ uptime
8:35am up 663 day(s), 22:13, 5 users, load average: 0.25, 0.25, 0.30

bash-2.05$ ps -ef -o comm,etime|grep Xsun
/usr/openwin/bin/Xsun 662-21:48:03


I know it's silly but I feel sorry I have to turn it off - such a lovely uptime for a workstation. Anyway there are people waiting for this one so I'm gonna migrate completely to the new one in the next few days.

On my new workstation I put Solaris 10 with NVIDIA drivers (thanks to NVIDIA for HW OpenGL) and I'm going to try JDS instead of CDE. Well, I'm not expecting the same stability as with CDE but maybe it's good enough. And CDE is getting obsolete anyway. It's been working quite well for the last two weeks (except for some FireFox crashes everything else is working properly). I tried mplayer from BlastWave - works ok, but I decided to compile it myself so I get xv and CPU extensions - works just great. Xine is working too, so is Xmms. Playing four divX movies at the same time using mplayer is cool :) Burning DVDs also works.

So far so good.

btw: I really like the way disks or DVD/CD are mounted in this system - no screwdrivers - for disks you get special bracket, for DVD/CD there's quite simple and clever solution - anyway no screwdrivers.

This workstation is really fast - I love it!

Monday, August 29, 2005

SMBIOS & Solaris

Which means that prtdiag on x86 is coming :) Blog entry here.

Dynamic Reconfiguration & Solaris 10

I tried DR yesterday on Solaris 10 - and wow, I'm impressed. With older Solaris releases you couldn't unconfigure&detach the CPU/MEM board on which the kernel resided - usually that meant the lowest CPU/MEM board in a system. I tried to unconfigure such a board on Solaris 10 - and wow! it works!
However there's an issue - the system freezes for a moment (2-4 minutes) and then continues to run. This doesn't happen when you unconfigure a board without the kernel on it. Still, it's much better than in previous releases of Solaris.



bash-3.00# prtconf | grep Memory
Memory size: 32768 Megabytes
bash-3.00# psrinfo |wc -l
16
bash-3.00#

bash-3.00# cfgadm -alv|grep -i permane
N0.SB2::memory connected configured ok base address 0x0, 16777216 KBytes total, 1447408 KBytes permanent

bash-3.00# cfgadm -c unconfigure N0.SB0
bash-3.00# prtconf | grep Memory
Memory size: 16384 Megabytes
bash-3.00# psrinfo |wc -l
8

bash-3.00# cfgadm -c configure N0.SB0

bash-3.00# cfgadm -c unconfigure N0.SB2
System may be temporarily suspended, proceed (yes/no)? yes

bash-3.00# prtconf | grep Memory
Memory size: 16384 Megabytes
bash-3.00# psrinfo |wc -l
8
bash-3.00#
bash-3.00# cfgadm -alv|grep -i permane
N0.SB0::memory connected configured ok base address 0x0, 16777216 KBytes total, 1447408 KBytes permanent
bash-3.00#

Thursday, August 18, 2005

Monday, August 15, 2005

Sun Cluster 3.1 8/05

New release of Sun Cluster is out! - just in time, as I'm gonna setup new NFS cluster in coming days.

Some selected What's New:
  • support for NAS (NFS v3/v4)
  • support for tagged Virtual Local Area Networks (VLANs) to share an adapter between the private interconnect and the public network
  • HA for Solaris Containers (Zones)
  • support for SMF services
  • support for AMD 64-bit platform
  • support for Oracle 10g (+RAC)
Sounds good. Although there's a problem - HA-NFS doesn't support NFS v4 :(((

Sun Cluster 3.1 8/05 Documentation
Sun Cluster 3.1 8/05 BASE
Sun Cluster 3.1 8/05 Agents

SX b20 is out

Solaris Express - Community Release build 20 is out.

Friday, August 12, 2005

v20z and large internal disks

If you want to make use of all available space on some larger disks (like 300GB) in a v20z you need to apply 119375-03 on Solaris 10. Now what if you want to install the system over a network and you're using JumpStart? Well, this is something I had to do today.

First, you have to patch the net install image on the install server. To do this just run:

# patchadd -C /install/s10-GA-x86/Solaris_10/Tools/Boot 119375-03

This allows the installer to use the entire disk. But then you need to apply the patch again just after the system is installed and before the first boot. The best place for it is the finish script. All you have to do is to add something like:

inst_arch=`uname -i`
# string comparison needs '='; '-eq' only compares integers
if [ "$inst_arch" = "i86pc" ]; then
echo ' - adding patch 119375-03 for big disks'
patchadd -R /a/ ${SI_CONFIG_DIR}/patches/119375-03
fi

Compiling Open Solaris for fun

I've just compiled opensolaris-src-20050720 on an E2900 in less than an hour! (snv_19, SOS, DEBUG).


bash-3.00# time nightly ./opensolaris.sh

real 51m48.739s
user 199m27.220s
sys 58m57.491s


I tried again but this time put all the files to compile on a tmpfs filesystem and there's a slight improvement.


bash-3.00# time nightly ./opensolaris.sh

real 46m9.597s
user 197m54.354s
sys 56m52.978s
bash-3.00#
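
To quantify "slight", the two time outputs can be compared directly. The Python sketch below parses the values as printed above and works out the wall-clock saving and how many CPUs the build kept busy on average. The helper names are made up for this illustration.

```python
import re


def to_seconds(stamp):
    """Convert a time(1)-style value such as '51m48.739s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError("unexpected time format: %r" % stamp)
    return int(match.group(1)) * 60 + float(match.group(2))


def improvement_percent(before, after):
    """Wall-clock time saved by the second run, as a percentage."""
    return 100.0 * (to_seconds(before) - to_seconds(after)) / to_seconds(before)


def cpus_busy(real, user, sys):
    """Average number of CPUs kept busy: (user + sys) / real."""
    return (to_seconds(user) + to_seconds(sys)) / to_seconds(real)


if __name__ == "__main__":
    # First run (disk) and second run (tmpfs), copied from the output above.
    print("tmpfs saved %.1f%% of wall-clock time"
          % improvement_percent("51m48.739s", "46m9.597s"))
    print("first run kept %.1f CPUs busy on average"
          % cpus_busy("51m48.739s", "199m27.220s", "58m57.491s"))
```

User and sys time barely change between the two runs, which suggests the saving comes from less waiting on I/O rather than less CPU work.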

Wednesday, August 10, 2005

Linux and Solaris

I've found this article from IBM which tries to compare some Solaris 10 features to those available in Linux. Well, in my opinion the article is very biased and not fair to Solaris. Here are some examples.
Dtrace uses an in-kernel interpreter whereas SystemTap uses compiled native code. Compiled native code is faster than interpreted code. Therefore, using SystemTap will not affect the performance of the system while performing performance measurements. The in-kernel interpreter has to be completely bug free, otherwise problems in the interpreter itself can cause the system to crash.
That is interesting 'coz from SystemTap reference:
When complete, the generated C code is compiled, and linked with the runtime, into a stand-alone kernel module. [...] To run the probes, the systemtap driver program simply loads the kernel module using insmod. The module will initialize itself, insert the probes, then sit back and let the probe handlers be triggered by the system to collect and pass data. It will eventually remove the probes at unload time.
Well, so probe points are executed in-kernel too. I wonder how SystemTap would protect against null pointer dereferences and so on. DTrace uses an interpreter, so it checks for safety on the fly and catches possible problems when they occur.

When it comes to security SystemTap reference says:

DProbes exposes the KProbes layer in such a way that it is not crashproof, as it does allow invalid instrumentation requests.

Well, that doesn't sound safe. And then DTrace makes use of Privileges so you can give DTrace to common (non-root) users without a security risk.

SystemTap doesn't aggregate data at the source so it has to transfer all data from kernel to user space and only then filter out what you need. This could be a performance problem. DTrace on the other hand aggregates data at the source so only relevant data are copied - this is a clear performance win.
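
The difference is easy to see with a toy model. The Python sketch below is purely conceptual: it is not how DTrace or SystemTap are implemented, and the function names are invented for the illustration. It just counts how many records have to cross from the producer to the consumer in each approach.

```python
from collections import Counter


def stream_everything(events):
    """Ship every event to the consumer, which then does the counting."""
    transferred = list(events)            # one record per event crosses over
    return Counter(transferred), len(transferred)


def aggregate_at_source(events):
    """Count where the events happen and ship only the summary."""
    summary = Counter(events)             # aggregation done in place
    return summary, len(summary)          # one record per distinct key


if __name__ == "__main__":
    # Hypothetical workload: 2000 events from three programs.
    events = ["ksh"] * 1000 + ["tar"] * 500 + ["gzip"] * 500
    _, streamed = stream_everything(events)
    _, aggregated = aggregate_at_source(events)
    print("records transferred: %d vs %d" % (streamed, aggregated))
```

Both approaches end up with the same counts; what differs is the amount of data that has to be copied to get them.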

DTrace already has a lot of useful providers which let you know what's going on in a system or application in a fast and secure manner. I mean the IO provider, which lets you measure IO characteristics of your system and applications without knowing the details of how these IOs are generated. Then there're PID, SCHED, PLOCKSTAT, ... The main advantage of using these providers is that you can measure something really useful and solve a problem without knowing the exact implementation in the kernel. Hey, we're sys admins and application developers and not kernel programmers.

And one more thing

Moreover, the interpreter is newly developed and not as mature as the compiler, hence there is a higher possibility of encountering bugs.
Well, DTrace has been tested in many PRODUCTION environments for quite some time (well over a year) and it proved itself excellent. I would say that it's SystemTap that has yet to prove itself. Not to mention that it looks like SystemTap is available for PPC64 only starting with the latest Linux kernels like 2.6.13-rc1 - which means that it's almost not tested at all and it probably isn't even included in RHEL for PPC, while DTrace is already in a stable Solaris 10 used in production, not to mention DTrace was available long before and used by many.

Then it looks like SystemTap can trace ONLY kernel functions. Well DTrace can do both: kernel and user space. And this is one of the big advantages of DTrace - you can easily correlate and follow different events from both the kernel and your applications. This is a far more complete approach and quite useful in practice. Looking at the porting status matrix of SystemTap it looks like there's no active development of user space tracing.

Then in Table 1 there's 'Ability to leverage the hardware performance counters of the CPU' and it says 'No' To DTrace and Yes to Linux - however there's footnote (4) that this feature is not currently available on Linux. I can't find footnote for (*) but there should be Yes in DTrace column as well - 'coz it's a planned feature (look at USENIX'04 paper - Future Work section).

There are some entries in the table about optimizing programs for a particular workload and about generating a flow of instructions. Tools other than SystemTap are specified, so to be fair there should be Yes in the Solaris column too (for example: Improving Code Layout Can Improve Application Performance and SOS10).

Then the last two entries in the table are: "Ability to write arbitrary locations in kernel memory" and "Ability to invoke arbitrary kernel subroutines" - of course again 'No' to DTrace and 'Yes' to Linux. Well, it's not perfect but actually you can use the system() action in DTrace to call mdb and then you can read/modify kernel memory and/or call kernel subroutines.

Well, SystemTap is lacking many of the features, is not well tested, is lacking good documentation - it looks like SystemTap is currently more a prototype than a working solution, especially on the PPC64 platform.

Then in the Multithreading Enhancements section they forget to add that there's also TLS support for Solaris x86 in GCC and not only in Sun's CC. Then there's a table comparing SPECjbb2000 on a v40z and some IBM server. I don't see the point of that comparison as they compare a 64-bit JVM on Solaris with a 32-bit JVM on Linux, and on Solaris it's 1.5.0 while on Linux it's 1.4.2. Not to mention that this is on different CPU architectures. This comparison is pointless in the context of this article (I quickly looked at some prices of these servers and it looks like the IBM solution costs much more).

And at the end of the article there's:
The conclusion is clear; Linux provides a wide range of tools and technologies that are technically comparable, or better, than those offered in Solaris 10.
Well, this is really interesting. Let's see it again - the article discusses three features: Dynamic Tracing (DTrace and SystemTap), MPO and threads. Well, SystemTap is still very much a work in progress, lacks many features of DTrace and still has to prove itself in production. DTrace meanwhile has a rich set of additional providers and has proved itself in many production environments to be not only helpful but also secure and scalable. Then I can't find anything in the article which states any possible advantage of using MPO or threads on Linux. Rather I find Solaris threads more mature as they've been used for many commercial and in-house applications for years and Solaris is well known for its thread scalability. Not to mention the entire ecosystem of additional applications both for sys admins and developers which makes Solaris threads more mature.

Don't get me wrong - Linux is great. SystemTap is interesting but looking at the documentation SystemTap is still not even close to DTrace. It's just that I find this article really biased toward Linux - but probably I shouldn't be that surprised - after all it's IBM who wrote it.

And IMHO it's good that Linux develops its own tools similar to DTrace - after all healthy competition is a good thing.

update: Casper Dik has posted some more comments on the same article. All I can say is that I completely agree with Casper.

update: James Dickens posted his own comments

And one more thing, as Casper noticed - SystemTap compiles and loads C code into the kernel as a module! While DTrace runs in the kernel only its own code. Now what is more secure? Putting arbitrary code into a kernel (SystemTap) or running only well tested code in a kernel? And it seems that I was right - just by dereferencing a null pointer you can still panic the whole of Linux using KProbes, which is not the case with DTrace.


Tuesday, August 09, 2005

PHP with DTrace

Bryan Cantrill and Wez Furlong have added support for DTrace in PHP. Check Bryan's post about it.

update: some more information and examples from Bryan.

Thursday, August 04, 2005

I/O frameworks

I was looking for something completely different and found an interesting article about different I/O architectures. It's a little bit outdated but it's still worth reading.

Top 10 DTrace scripts

Matty has posted his own Top 10 DTrace scripts. These are really useful and easy to use tools based on DTrace - if you don't know DTrace and want to know more about what's going on in your system you should check them. Even if you know DTrace you should probably look at them - there's always something to learn about.

Another collection of useful tools based on DTrace can be found here. And again - these are ready-to-use tools for every sys admin, even without understanding DTrace.

Tuesday, August 02, 2005

Friday, July 29, 2005

Installing SX over a net with customized miniroot

I use the sfe driver from Masayuki Murayama to get networking and a patched version of the ata driver to get DMA working with the internal disk on my laptop. I install new SX versions on my laptop over the network as it's much faster and I do not have to burn a new set of CDs every time a new SX is out. With the New Boot Architecture all I do is:

1. copy x86.miniroot file from /tftpboot
2. un-gzip file
3. using lofiadm mount x86.miniroot as UFS
4. install sfe driver with base_root changed to mounted miniroot
5. copy patched version of ata
6. umount file and gzip it again
7. put the gzipped file on /tftpboot with different name (x86.miniroot_laptop)
8. change menu.lst in /tftpboot/boot/grub to use new file

That way you can customize x86.miniroot in any way you want. As no more Real Mode drivers are needed with the New Boot Architecture, you can put standard drivers for RAID, SCSI, etc. adapters there and it should work.

btw: it's also good to copy driver packages to the miniroot or the install server, so that after the installation, before the system reboots (I choose to manually reboot after install), you can install these drivers on the newly installed system. In my case just before a reboot I copy the ata driver and install the sfe driver. You will find the newly installed system under the /a mountpoint.

SX b19 available

Solaris Express: Community Release build 19 is available. I've already installed it on my laptop :)

Open Solaris and Xen - alive!

That's really exciting!

"Hello World" from Solaris on Xen Last Friday we went multiuser for the first time on our Solaris-on-Xen port.

And we're happy to have other people join this project at this early stage to help us do that, or even just to experiment with the code in whatever other way they want to. To enable that, we're launching an OpenSolaris community discussion group about OpenSolaris on Xen where future postings like this will end up.

Full post

Thursday, July 28, 2005

Solaris 10 - 2 million licenses

OSNews wrote:

"Sun Microsystems has distributed more than two million registered licenses for the Solaris 10 Operating System since the software became available on January 31st."

And official announcement here.

Monday, July 25, 2005

File Descriptors in DTrace

Starting with b16 of SX a new useful feature has been added to DTrace - the fds[] array.

"[...] returns information about the file descriptors associated with the process corresponding to the current thread. The array's base type is the fileinfo_t structure already used by DTrace's I/O provider, with a new member for the open(2) flags. Here's an example of fds[] in action:

$ dtrace -q -s /dev/stdin
syscall::write:entry
/ execname == "ksh" && fds[arg0].fi_oflags & O_APPEND /
{
printf("ksh %d appending to %s\n", pid, fds[arg0].fi_pathname);
}
^D

If I run this command on my desktop and start typing commands in another shell, I see output like this:

ksh 127453 appending to /home/mws/.sh_history
ksh 127453 appending to /home/mws/.sh_history
...

"


Well, this is going to be really useful!

btw: if you have an NFS server and want to use the IO provider then I'm sure you will welcome bug id 6175304, which is an RFE I submitted some time ago and which is integrated in b17 - in practice it allows you to use the IO provider on an NFS server, so observing nfsd will be much more friendly.

Solaris Express 17 is out

Solaris Express build 17 is out. Check out official What's New. You can find unofficial What's New here.
There's Community Release of SX based on build 18 - this is latest build but less tested - works on my laptop very well. You can get b18 here.

Wednesday, July 20, 2005

Safety in DTrace

What is one of the best features of DTrace? That it's production ready. What does that mean? Well, Bryan Cantrill, one of the DTrace creators, has posted some background information about the built-in (or should I write - architected-in) safety in DTrace. From my own user perspective, I must admit that this is one of the most important features of DTrace - I can safely use it in production, and I have actually been doing so for almost 2 years. In practice using DTrace in production proves to be so safe that I let our developers use it on production servers. And why is it so important to use tools like DTrace on production systems? Well, the most pressing problems occur in production and during peak hours. If you really want to find your bug quickly, find a bottleneck or some data flow characteristic, etc. you should be doing it at the place where the problem is - production. But you don't want to risk accidentally shutting down your application or the entire system during peak hours, do you? Without all this built-in safety DTrace would still be valuable software, but I wouldn't use it in production so much, if at all. When I'm talking to people about DTrace I always start with its safety.

Tuesday, July 19, 2005

Mathematica

The well-known Mathematica is available on the Solaris 10 x86 platform with 64-bit Opteron support. This is great news for all Mathematica and Solaris x86/x64 users. All supported platforms are listed here.

Oracle and multi-core chips

"Oracle will continue to recognize each core as a separate processor; however, the processor definition has been amended as it relates to counting multi-core chips to determine the total number of processor licenses required. For the purposes of counting the number of processors that require licensing, the number of cores in a multi-core chip now shall be multiplied by a factor of .75. Previously, each core was counted as a full processor."

"Oracle Standard Edition One or Standard Edition programs for use on a single processor server containing a maximum of 2 cores shall be priced as a single processor."


Oracle's News
eWeek
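
The .75 factor from the quoted definition can be sketched as a tiny shell helper. This is only an illustration of the arithmetic: the function name core_factor_count is made up here, and the result is printed unrounded because the quoted text does not say how fractions are handled. It also ignores the Standard Edition exception from the second quote.

```shell
#!/bin/sh
# core_factor_count CORES
# Hypothetical helper: multiplies a total core count by the .75 factor
# from the quoted policy and prints the raw result with two decimals.
# No rounding rule is applied, since the quote does not give one.
core_factor_count() {
    awk -v cores="$1" 'BEGIN { printf "%.2f\n", cores * 0.75 }'
}

core_factor_count 4    # prints 3.00
core_factor_count 8    # prints 6.00
```

So, under the new definition, a server with 4 cores in total counts as 3 processors instead of 4.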

Wednesday, July 13, 2005

Solaris on x86 notebooks

I've just noticed a new article on the Open Solaris site about Solaris on x86 notebooks. It's a nice overview of the current status.

I have had Solaris dual-booted with Windows on my current (and previous) laptop for quite some time - over 3 years now. I know some other people who have Solaris on their laptops too. I agree with the article that installation is quite easy and that there are some problems with drivers, although it's getting better every month. A few months ago I had to manually tweak the Xorg configuration to get 1400x1050 - with the latest Solaris Express it works out of the box. There's a 1400x1050 background for the login screen (such a small thing, but it makes it feel better). GRUB has been there for some time (starting with b14), so booting Solaris/Windows is more user friendly and much faster now (due to the new boot architecture that GRUB is part of). Small features keep being added too, like the virtual mouse (works great!), which lets you use the touchpad and a USB/PS2 mouse simultaneously, nForce chipset support, NVIDIA drivers, battery support, and so on. One common problem is NIC support - I use the sfe driver thanks to Masayuki Murayama - you can find more drivers there for common Ethernet cards. Some manufacturers like Broadcom provide their own drivers (although Solaris comes with the bge driver, which covers quite a lot of Broadcom NICs - sometimes it's just a matter of adding a new ID). USB pendrives and disk drives work too. WiFi from Sun is coming, and so on.

While Solaris on x86 notebooks is not perfect, it's getting better literally month by month (thanks to Open Solaris and SX) and it works really well for a lot of people.

Tuesday, July 12, 2005

libumem non-documented features

If you are developing an application in C on Solaris, you should familiarize yourself with libumem and its debugging abilities. I have used it many times with good results. Here are some undocumented features of libumem.

DTracing GNOME

Bryan Cantrill has posted a nice blog entry with a quick look at what's going on while logging in to a GNOME session. It's a nice intro to debugging other applications.

Friday, July 08, 2005

Building community around Open Solaris

Now, that's good news! Another example that Sun is really serious about building a real community with external folks around Open Solaris. This is part of a post by Bryan Cantrill on the Open Solaris DTrace mailing list:


"We are pleased to announce that we have decommissioned Sun's internal
DTrace interest list, having added the members of that list to this one on
opensolaris.org. This adds over 400 people (!) to this list, and fulfills
a goal that we have long had -- to unify our internal and external DTrace
communities into one larger community. This should result in quite a bit
more traffic to this list (the internal DTrace list sees on the order of
five to ten posts per day), and it will certainly result in much more
DTrace expertise and interest looking at your question, comment or
contribution."

Wednesday, July 06, 2005

Getting LUNs for MPxIO devices

I wrote a simple C program which displays basic info about the FC HBAs in a system and lets you see the available targets on each HBA, with information like LUN number, target ID, WWPN, etc.
Why did I write it? Well, when you have a lot of devices under MPxIO and you often add new LUNs, there's no simple (I mean quick) way to figure out which MPxIO device is which. If there are several arrays attached it gets even worse. Normally you can use luxadm, but with a lot of LUNs it actually takes some time to find a given LUN.
The program compiles and works on Solaris 10. It makes use of libHBAAPI, so not all HBAs will work (it depends on the driver). QLogic HBAs with the standard Solaris drivers work. I don't know about the others. Although this program works (for me at least), I must admit that it's not clean code and I'm not a programmer; some things are done in an ugly way (static arrays, missing safety checks, etc.) - I have no time to correct this ('coz it works).

btw: it looks like there's a tool named fcinfo in SX which can give the same (and more) info (but there's none in S10).

Ok, some example output.


bash-3.00# ./hba_inq
Number of HBAs: 2


HBA(0) name: QLogic Corp.-2200-0
Manufacturer : QLogic Corp.
SerialNumber :
Model : 2200
Description : 2200
SymbolicName :
HardwareVersion :
DriverVersion : 20050104-1.58
OptionROMVersion : 1
FirmwareVersion : 2.1.142
DriverName : SunFC Qlogic FCA v20050104-1.58
NumberOfPorts : 1
NodeWWN : 20:00:00:e0:8b:05:3e:7c


HBA(1) name: QLogic Corp.-2200-1
Manufacturer : QLogic Corp.
SerialNumber :
Model : 2200
Description : 2200
SymbolicName :
HardwareVersion :
DriverVersion : 20050104-1.58
OptionROMVersion : 1
FirmwareVersion : 2.1.142
DriverName : SunFC Qlogic FCA v20050104-1.58
NumberOfPorts : 1
NodeWWN : 20:00:00:e0:8b:01:94:01
bash-3.00#

// another server with fewer LUNs

bash-3.00# ./hba_inq -l
Number of HBAs: 2


HBA(0) name: QLogic Corp.-2200-0
Manufacturer : QLogic Corp.
SerialNumber :
Model : 2200
Description : 2200
SymbolicName :
HardwareVersion :
DriverVersion : 20050104-1.58
OptionROMVersion : 1
FirmwareVersion : 2.1.142
DriverName : SunFC Qlogic FCA v20050104-1.58
NumberOfPorts : 1
NodeWWN : 20:00:00:e0:8b:01:95:01
NumberOfLUNs : 6
bus 5 target 71424 lun 56 "/dev/rdsk/c4t600A0B80001652440000003C428E0AEBd0s2"
bus 5 target 71424 lun 55 "/dev/rdsk/c4t600A0B80001652440000003B428E0ACBd0s2"
bus 5 target 71424 lun 54 "/dev/rdsk/c4t600A0B80001652440000003A428E0AA7d0s2"
bus 5 target 71424 lun 53 "/dev/rdsk/c4t600A0B800016524400000039428E0A79d0s2"
bus 5 target 71424 lun 52 "/dev/rdsk/c4t600A0B800016524400000037428E0A49d0s2"
bus 5 target 71424 lun 31 "/dev/rdsk/c5t200400A0B8169D18d31s2"


HBA(1) name: QLogic Corp.-2200-1
Manufacturer : QLogic Corp.
SerialNumber :
Model : 2200
Description : 2200
SymbolicName :
HardwareVersion :
DriverVersion : 20050104-1.58
OptionROMVersion : 1
FirmwareVersion : 2.1.142
DriverName : SunFC Qlogic FCA v20050104-1.58
NumberOfPorts : 1
NodeWWN : 20:00:00:e0:8b:05:34:7c
NumberOfLUNs : 6
bus 1 target 135168 lun 56 "/dev/rdsk/c4t600A0B80001652440000003C428E0AEBd0s2"
bus 1 target 135168 lun 55 "/dev/rdsk/c4t600A0B80001652440000003B428E0ACBd0s2"
bus 1 target 135168 lun 54 "/dev/rdsk/c4t600A0B80001652440000003A428E0AA7d0s2"
bus 1 target 135168 lun 53 "/dev/rdsk/c4t600A0B800016524400000039428E0A79d0s2"
bus 1 target 135168 lun 52 "/dev/rdsk/c4t600A0B800016524400000037428E0A49d0s2"
bus 1 target 135168 lun 31 "/dev/rdsk/c1t200500A0B8169D18d31s2"
bash-3.00#

// now another server with a lot of luns
// it's easy to filter by given WWPN
// first see what we have

bash-3.00# cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c0::dsk/c0t0d0 disk connected configured unknown
c0::dsk/c0t1d0 disk connected configured unknown
c0::dsk/c0t6d0 CD-ROM connected configured unknown
c1 fc-fabric connected configured unknown
c1::200500a0b8169d18 disk connected configured unknown
c1::5006016008065109 disk connected configured unknown
c2 fc-fabric connected configured unknown
c2::200400a0b8169d18 disk connected configured unknown
c2::5006016108065109 disk connected unconfigured unknown
c2::5006016908065109 disk connected configured unknown
c2::500604843d489c84 disk connected configured unknown
c3 scsi-bus connected configured unknown
c3::dsk/c3t8d0 disk connected configured unknown
c3::dsk/c3t9d0 disk connected configured unknown
c3::dsk/c3t10d0 disk connected configured unknown
c3::dsk/c3t11d0 disk connected configured unknown
c3::dsk/c3t12d0 disk connected configured unknown
c3::dsk/c3t13d0 disk connected configured unknown
c3::dsk/c3t14d0 disk connected configured unknown
bash-3.00#

// let's see which devices are seen on c2::200400a0b8169d18
// only the last few entries

bash-3.00# ./hba_inq -lw | grep "200400a0b8169d18" | tail
NodeWWN 200400a0b8169d17 PortWWN 200400a0b8169d18 bus 2 target 71424 lun 10 "/dev/rdsk/c4t600A0B8000169D170000001541C9C2A6d0s2"
NodeWWN 200400a0b8169d17 PortWWN 200400a0b8169d18 bus 2 target 71424 lun 9 "/dev/rdsk/c4t600A0B8000169D170000001441C9C27Ed0s2"
NodeWWN 200400a0b8169d17 PortWWN 200400a0b8169d18 bus 2 target 71424 lun 8 "/dev/rdsk/c4t600A0B8000169D170000001341C9C256d0s2"
NodeWWN 200400a0b8169d17 PortWWN 200400a0b8169d18 bus 2 target 71424 lun 7 "/dev/rdsk/c4t600A0B8000169D170000001241C9C230d0s2"
NodeWWN 200400a0b8169d17 PortWWN 200400a0b8169d18 bus 2 target 71424 lun 6 "/dev/rdsk/c4t600A0B8000169D170000000A41C9C0B8d0s2"
NodeWWN 200400a0b8169d17 PortWWN 200400a0b8169d18 bus 2 target 71424 lun 5 "/dev/rdsk/c4t600A0B8000169D170000001141C9C1FEd0s2"
NodeWWN 200400a0b8169d17 PortWWN 200400a0b8169d18 bus 2 target 71424 lun 4 "/dev/rdsk/c4t600A0B8000169D170000001041C9C1D8d0s2"
NodeWWN 200400a0b8169d17 PortWWN 200400a0b8169d18 bus 2 target 71424 lun 3 "/dev/rdsk/c4t600A0B8000169D170000000F41C9C1A2d0s2"
NodeWWN 200400a0b8169d17 PortWWN 200400a0b8169d18 bus 2 target 71424 lun 2 "/dev/rdsk/c4t600A0B8000169D170000000E41C9C17Cd0s2"
NodeWWN 200400a0b8169d17 PortWWN 200400a0b8169d18 bus 2 target 71424 lun 1 "/dev/rdsk/c4t600A0B8000169D170000000C41C9C148d0s2"
bash-3.00#


// so let's assume we are looking for LUN 55 which was just presented to the
// server and we don't know which MPxIO device it is

bash-3.00# ./hba_inq -lw|grep "200400a0b8169d18"|grep "lun 55"
NodeWWN 200400a0b8169d17 PortWWN 200400a0b8169d18 bus 2 target 71424 lun 55 "/dev/rdsk/c4t600A0B80001652440000003B428E0ACBd0s2"
bash-3.00#
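
The grep pipeline above can be wrapped in a small helper. This is a minimal sketch, not part of hba_inq: the function name find_lun_dev is made up here, and it assumes the line layout shown in the hba_inq -lw output above (PortWWN in the 4th field, LUN number in the 10th, quoted device path in the 11th). It matches the LUN number exactly, whereas a plain grep "lun 5" would also match lun 50 to 59.

```shell
#!/bin/sh
# Two sample lines copied from the hba_inq -lw output shown above.
sample_output='NodeWWN 200400a0b8169d17 PortWWN 200400a0b8169d18 bus 2 target 71424 lun 5 "/dev/rdsk/c4t600A0B8000169D170000001141C9C1FEd0s2"
NodeWWN 200400a0b8169d17 PortWWN 200400a0b8169d18 bus 2 target 71424 lun 55 "/dev/rdsk/c4t600A0B80001652440000003B428E0ACBd0s2"'

# find_lun_dev WWPN LUN
# Hypothetical helper: reads hba_inq -lw style lines on stdin and prints
# the device path for the given port WWN and LUN number.
find_lun_dev() {
    awk -v wwpn="$1" -v lun="$2" '
        # compare WWPNs as lower-case strings, LUN numbers as numbers
        $3 == "PortWWN" && tolower($4) == tolower(wwpn) &&
        $9 == "lun" && ($10 + 0) == (lun + 0) {
            gsub(/"/, "", $11)   # strip the quotes around the path
            print $11
        }'
}

# On a real system the input would come from: ./hba_inq -lw
printf '%s\n' "$sample_output" | find_lun_dev 200400a0b8169d18 55
```

With the sample lines this prints /dev/rdsk/c4t600A0B80001652440000003B428E0ACBd0s2. On Solaris 10 you would probably need nawk instead of awk, as the old /usr/bin/awk does not know -v or gsub.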



hba_inq is covered by CDDL.