Wednesday, August 10, 2005

Linux and Solaris

I've found this article from IBM which tries to compare some Solaris 10 features to those available in Linux. Well, in my opinion the article is very biased and not fair to Solaris. Here are some examples.
"Dtrace uses an in-kernel interpreter whereas SystemTap uses compiled native code. Compiled native code is faster than interpreted code. Therefore, using SystemTap will not affect the performance of the system while performing performance measurements. The in-kernel interpreter has to be completely bug free, otherwise problems in the interpreter itself can cause the system to crash."
That is interesting, 'coz the SystemTap reference says:
"When complete, the generated C code is compiled, and linked with the runtime, into a stand-alone kernel module. [...] To run the probes, the systemtap driver program simply loads the kernel module using insmod. The module will initialize itself, insert the probes, then sit back and let the probe handlers be triggered by the system to collect and pass data. It will eventually remove the probes at unload time."
Well, so the probe handlers are executed in-kernel too. I wonder how SystemTap would protect against null pointer dereferences and so on. DTrace uses an interpreter, so it checks for safe execution on the fly and catches possible problems when they occur.
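
To make the "interpreter checks on the fly" point a bit more concrete, here is a toy sketch in Python. It is NOT how DTrace is actually implemented; the opcodes, the run_probe name and the dictionary standing in for mapped memory are all made up for illustration. The only idea it tries to show is that an interpreter can validate every load before doing it, so a bad pointer turns into a reported fault for that one probe instead of taking the whole host down.

```python
def run_probe(program, memory):
    """Interpret a tiny stack program; every load is validated first.

    `memory` is a dict standing in for mapped addresses (an assumption
    made for this sketch). A bad address aborts only this probe.
    """
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "load":
            if not stack:
                return {"ok": False, "fault_addr": None}
            addr = stack.pop()
            if addr not in memory:          # the safety check
                return {"ok": False, "fault_addr": addr}
            stack.append(memory[addr])
        elif op == "add":
            if len(stack) < 2:
                return {"ok": False, "fault_addr": None}
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:                               # unknown opcode: refuse to run it
            return {"ok": False, "fault_addr": None}
    return {"ok": True, "value": stack[-1] if stack else None}


memory = {0x1000: 7, 0x1008: 5}
good = [("push", 0x1000), ("load",), ("push", 0x1008), ("load",), ("add",)]
null_deref = [("push", 0), ("load",)]

print(run_probe(good, memory))        # both loads are valid
print(run_probe(null_deref, memory))  # address 0 is not mapped: fault, no crash
```

Natively compiled code has no such checkpoint unless the translator generates one for every access, which is exactly the part I'm wondering about.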

When it comes to security, the SystemTap reference says:

"DProbes exposes the KProbes layer in such a way that it is not crashproof, as it does allow invalid instrumentation requests."

Well, that doesn't sound safe. And then DTrace makes use of privileges, so you can give DTrace to common (non-root) users without a security risk.

SystemTap doesn't aggregate data at the source, so it has to transfer all data from the kernel to user space and only then filter out what you need. This could be a performance problem. DTrace, on the other hand, aggregates data at the source, so only relevant data are copied - this is a clear performance win.
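
A rough way to picture the difference, as a sketch in Python (nothing here is DTrace or SystemTap code; the event generator, process names and function names are invented for the example): both approaches produce the same per-process totals, but the number of records that have to cross the kernel/user boundary is very different.

```python
from collections import Counter
import random


def make_events(n, seed=0):
    """Generate hypothetical (process_name, bytes) I/O events."""
    rng = random.Random(seed)
    names = ["httpd", "mysqld", "sshd"]
    return [(rng.choice(names), rng.randint(1, 4096)) for _ in range(n)]


def stream_everything(events):
    """Copy every event across the boundary, then aggregate on the user side."""
    copied = list(events)             # each record crosses kernel -> user
    totals = Counter()
    for name, nbytes in copied:
        totals[name] += nbytes
    return totals, len(copied)


def aggregate_at_source(events):
    """Aggregate where the events fire; copy out only the summary."""
    totals = Counter()
    for name, nbytes in events:       # stays on the "kernel" side
        totals[name] += nbytes
    copied = list(totals.items())     # one record per key crosses
    return Counter(dict(copied)), len(copied)


events = make_events(10000)
totals_a, copied_a = stream_everything(events)
totals_b, copied_b = aggregate_at_source(events)
print(totals_a == totals_b)           # same answer either way
print(copied_a, copied_b)             # records copied: all events vs. one per process
```

With ten thousand events and three processes, the first approach copies every event while the second copies at most three summary records.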

DTrace already has a lot of useful providers which let you see what's going on in the system or an application in a fast and secure manner. I mean the IO provider, which lets you measure the IO characteristics of your system and applications without knowing the details of how these IOs are generated. Then there are PID, SCHED, PLOCKSTAT, ... The main advantage of these providers is that you can measure something really useful and solve a problem without knowing the exact implementation in the kernel. Hey, we're sys admins and application developers, not kernel programmers.

And one more thing

"Moreover, the interpreter is newly developed and not as mature as the compiler, hence there is a higher possibility of encountering bugs."
Well, DTrace has been tested in many PRODUCTION environments for quite some time (well over a year) and it has proved itself excellent. I would say that it's SystemTap that has yet to prove itself. Not to mention that it looks like SystemTap is available for PPC64 starting with the latest Linux kernels like 2.6.13-rc1, which means that it's almost not tested at all and probably isn't even included in RHEL for PPC, while DTrace is already in the stable, production-used Solaris 10. And DTrace was available long before that and has been used by many.

Then it looks like SystemTap can trace ONLY kernel functions. Well, DTrace can do both: kernel and user space. And this is one of the big advantages of DTrace - you can easily correlate and follow different events from both the kernel and your applications. This is a far more complete approach and quite useful in practice. Looking at the porting status matrix of SystemTap, it looks like there's no active development of user space tracing.

Then in Table 1 there's 'Ability to leverage the hardware performance counters of the CPU' and it says 'No' to DTrace and 'Yes' to Linux - however there's a footnote (4) saying that this feature is not currently available on Linux. I can't find a footnote for (*), but there should be 'Yes' in the DTrace column as well - 'coz it's a planned feature (look at the USENIX'04 paper - Future Work section).

There are some entries in the table about optimizing programs for a particular workload and about generating a flow of instructions. Tools other than SystemTap are listed there, so to be fair there should be 'Yes' in the Solaris column too (for example: Improving Code Layout Can Improve Application Performance and SOS10).

Then the last two entries in the table are: "Ability to write arbitrary locations in kernel memory" and "Ability to invoke arbitrary kernel subroutines" - of course again 'No' to DTrace and 'Yes' to Linux. Well, it's not perfect, but you can actually use the system() action in DTrace to call mdb, and then you can read/modify kernel memory and/or call kernel subroutines.

Well, SystemTap is lacking many features, is not well tested, and is lacking good documentation - it looks like SystemTap is currently more a prototype than a working solution, especially on the PPC64 platform.

Then in the Multithreading Enhancements section they forget to add that there's also TLS support for Solaris x86 in GCC and not only in Sun's CC. Then there's a table comparing SPECjbb2000 on a v40z and some IBM server. I don't know what the point of that comparison is, as they compare a 64-bit JVM on Solaris with a 32-bit JVM on Linux, and on Solaris it's 1.5.0 while on Linux it's 1.4.2. Not to mention that this is on different CPU architectures. This comparison is pointless with regard to this article (I quickly looked at some prices of these servers and it looks like the IBM solution costs much more).

And at the end of the article there's:
"The conclusion is clear; Linux provides a wide range of tools and technologies that are technically comparable, or better, than those offered in Solaris 10."
Well, this is really interesting. Let's see it again - the article discusses three features: Dynamic Tracing (DTrace and SystemTap), MPO and threads. Well, SystemTap is still very much a work in progress, lacks many features of DTrace and still has to prove itself in production, while DTrace has a rich set of additional providers and has proved itself in many production environments to be not only helpful but also secure and scalable. Then I can't find anything in the article which states any possible advantage of using MPO or threads on Linux. Rather, I find Solaris threads more mature, as they've been used for many commercial and in-house applications for years and Solaris is well known for its thread scalability. Not to mention the entire ecosystem of additional applications for both sys admins and developers, which makes Solaris threads more mature.

Don't get me wrong - Linux is great. SystemTap is interesting, but looking at the documentation SystemTap is still not even close to DTrace. It's just that I find this article really biased toward Linux - but I probably shouldn't be that surprised - after all it's IBM who wrote it.

And IMHO it's good that Linux develops its own tools similar to DTrace - after all, healthy competition is a good thing.

update: Casper Dik has posted some more comments on the same article. All I can say is that I completely agree with Casper.

update: James Dickens posted his own comments

And one more thing, as Casper noticed - SystemTap compiles C code and loads it into the kernel as a module! DTrace, meanwhile, runs only its own code in the kernel. Now what is more secure? Putting arbitrary code into a kernel (SystemTap) or running only well tested code in a kernel? And it seems that I was right - just by dereferencing a null pointer you can still panic the whole Linux system using KProbes, which is not the case with DTrace.


10 comments:

PerformanceGuru said...

I think that you and Casper should read the code, participate in the project before jumping to such conclusions.
Kprobes/Jprobes/Djprobes is the feature that allows the arbitrary insertion of code into the kernel, and systemtap is built upon this. There is a huge amount of checking and safety done in the translator (systemtap takes the approach that you use a restricted awk-like language to express yourself, but rather than interpreting it in the kernel it translates it to C). Systemtap also allows you to take off the gloves and put C directly into your code (this is guru mode, which disables the safety checks and trusts you to understand what you do). Then rather than having probe providers in the kernel you write raw tapsets in C (or people who are expert in certain kernel sections can) and these can then be referenced (much the same as the dtrace providers) in user scripts. Don't get me wrong, it's in its infancy, but just because you don't understand it and prefer Sun to IBM does not make it inferior in any way.

There are a number of ideas (djprobes) that far outstrip those in dtrace - dtrace uses int3 instructions to drop to the IDT and execute your probe - systemtap is looking to use jmp instructions on as many instructions that are safe (certain atomicity concerns arise during the instruction replacements), this technology comes from Hitachi in Japan and is known as djprobes.

Now Sun and their 'markitecture' - a little history of dtrace:

1 - During the 90's IBM introduces a command called dtrace into OS/2 - it is the first command that tries to do userspace to kernel probing. (http://hobbes.nmsu.edu/pub/os2/system/patches/fixpack/warp_4/xr_m004/readme.dbg)
2 - 2000 IBM take the original dtrace guys and they start work on dprobes/kprobes for Linux (http://www.linuxshowcase.org/2000/2000papers/papers/moore/moore_html/)
3 - 2001 Bryan Cantrill, et al at Sun are consulted as Solaris kernel experts by the author of kerninst, which was another dynamic probe implementation - please see the acknowledgements in the following - recognise anyone? (http://www.cs.wisc.edu/~tamches/mydissertation.pdf)
4 - 2004 Sun appears with this fantastic tool called dtrace and claims complete originality over it, where really it was just an amalgam of a number of ideas from the Open Source world and IBM that had gone before

I would be much happier to see the employees of Sun giving credit where it is due and participating in the 'Creative Commons' that they now claim to buy into, rather than the endless bitchfests about why they are superior to everyone else.

All I seem to get in my RSS feed folders are endless complaints from Sun employees - yesterday's was a complaint about strace being auto-suggested by google when searching for dtrace - get over it - it's an extremely useful tool that is an amalgam of what has gone before!

John Levon said...

Might I suggest that it's you, performanceguru, who needs to do some reading. Namely the dtrace USENIX paper, where you will find (contrary to your claims) a comparison of dtrace to these other tools, notably dprobes and kerninst; including a clear summary of the original work present in dtrace over and above what already existed. Nobody at Sun has ever claimed dtrace invented dynamic instrumentation; just a little research would have clarified for you what is novel about dtrace.

I think you should also be careful talking about what's actually present in Systemtap currently and what isn't. Most of Systemtap's putative features don't even exist yet. Last I saw in the djprobes discussion, they were talking about basic block analysis to use djprobes "safely". No comment...

And Adam's strace post- it's a joke. You're meant to smirk, or laugh, or something. Surprised I have to educate a fellow Brit in dry wit.

Anonymous said...

Solaris has always adopted the slower path over the fastest path which Linux folks adopt. Two different philosophies I guess. Linux has benefited from it in that Sun hasn't been able to fix the speed differences between Linux and Solaris since 1999. I would argue that if you want to instrument a production system you should first test the code in your preproduction system thoroughly and only then start instrumenting on the production box - in the interest of complete safety. In this case systemtap, being a kernel module executed natively, is an advantage over the slower dtrace path.

Not to say dtrace isn't cool - it is definitely good, but I wouldn't mind lesser capabilities executing at good speed. I am not the kind doing all sorts of instrumentation on a production system - there are test systems for that.

John Levon said...

What happens when you can't reproduce the problem on your development system? What if it's an intermittent problem, and taking the time to build a replica isn't an option?

Anonymous, you didn't specify what performance issues you've had with dtrace. It's more than fine almost all of the time. I must wonder if you've actually used it much. How exactly do you think dropping features will improve performance? I'd like to hear more.

And please remember: there is no actual evidence that Systemtap is faster, because it doesn't really exist.

PerformanceGuru said...

Hi John,

I read the paper some time ago and I do appreciate the unique features of dtrace. My post was in defense of Systemtap. I don't think it's quite what IBM were making out, but it also has little to do with the comments that were in the original post that I responded to.

As I stated, it is a project in its infancy, but it has promise and it's not architecturally flawed because it verifies its code in userspace rather than in the kernel. This design decision, as you know from your own contributions, is probably more to do with the strictness around code entering the kernel than anything else.

Basic probes are in place, and the djprobes stuff seems to be settling down/the limitations are understood. It has all the hallmarks of an interesting piece of work. There are huge chunks missing, but I think they will be delivered as the project expands and gains traction.

Yes - I was too damning of Sun (and of Adam in particular, he was trying to be funny but he makes a better Engineer than comedian 8^), it was just transference from my (immature?) annoyance at the slant of the original post, I suspect.

I apologise to anyone I maligned in the heat of my post.

Kind regards,

Ken.

Anonymous said...

You are right - forget using DTrace, I have yet to succeed in installing Solaris on a. my laptop b. my dev workstation c. my work machine d. Qemu, Virtual PC, all of which are x86 based. Solaris simply doesn't cut it when it comes to hardware support and speed. I have grown to hate the slowness of Solaris on SPARC boxes, dual, quad, 8way - Solaris feels equally slow on each one of them. Being a big fan of performance and speed, and given the fact that Linux actually works on all of my machines, I have always preferred it.

Now when it comes to your point of being able to debug / instrument on production machine being a big 'feature' of dtrace - I will again disagree. That's simply too broken for words. That means you install crap in production without testing all scenarios and without anticipating user actions. You wouldn't get performance problems that late in the game if you banged your application with say LoadRunner and executed all the scenarios and fixed them right there in pre-production. If you are talking about bugs so subtle, then the dtracing can even hide them in production (changed timing, execution paths etc.) - what will you do then? And no thanks, we don't need an interpreter in kernel to debug/instrument User space apps. There are other better, proven ways to do it.

PerformanceGuru said...

I have to disagree with the Anonymous viewpoint. I work in Production environments and dtrace (and hopefully Systemtap when it matures) are a godsend. Finally you can look at threads of execution, with no regard to user or kernel space, rather than trying to extrapolate from performance killing tools such as truss/strace and a bunch of aggregated stats about the O/S.

The speed at which modern businesses do business and the sheer speed at which apps come and go precludes exhaustive testing cycles, particularly ones that involve gobs of performance testing.

I wish it were different, but the first real performance test most applications get where I work coincides with the GA release, and the ability to put lightweight probes in the kernel and user space execution paths to pick up performance issues, or even worse subtle bugs, is the most exciting thing to happen in a while (maybe I should get out more?).

If you have a look through the Sun folks' blogs and see what these guys have actually used the tool for, you will instantly see the value of tools that can be run in Production. Why should good debug and performance monitoring tools be exclusive to testing and development environments? (The ability to capture worthwhile debug is invaluable to Developers, particularly if a bug snuck into Production.) More and more Developers develop and test on VMWare/Xen/etc for utilisation reasons; gone are the 2N days of Dev/Prod, or 3N days of Dev/Prod/Staging like-for-like hardware purchases where I work - this makes full app testing prior to release even more of a challenge, as the platforms are just too dissimilar these days.

Solaris is not as hardware pervasive in the x86 world as Linux, but Solaris 10 on x86 is pretty quick off the mark and is definitely comparable to Linux since its dtrace tune-up. I agree with the SPARC comments, but I think it is much more for large n-way boxes where overall throughput rules (think in terms of buses and memory, not just CPUs). But even here, to be honest, I have not tried Solaris 10 on SPARC hardware; maybe it's as snappy as x86 (definitely not as cheap though 8^)...

Anonymous said...

Solaris, snappy? I respect your opinion but facts are something different.

While I agree that there are good uses of tracing I don't agree you have to do it in production. No. I haven't yet in my career heard about any business dropping code directly from dev to prod or developing in production. Sane people will never do it.

You are selling comb to bald guys (==guys who ensure production has no issues well before the code hits production). Won't work!! :)

milek said...

performanceguru wrote:
"Kprobes/Jprobes/Djprobes is the feature that allows the arbitrary insertion of code into the kernel and systemtap is built upon this. There is a huge amount of checking and safety done in the translator [...]"


I know that. But still, once the kernel module is loaded it's almost on its own. And what if a new compiler version screws something up and you load such a module? What about NULL pointer dereferences? I think the DTrace approach is just safer and is immune to such issues.


performanceguru wrote:
"I would be much happier to see the employees of Sun giving credit where it is due and participating in the 'Creative Commons' that they now claim to buy into, rather than the endless bitchfests about why they are superior to everyone else."


You claim that you have read the USENIX paper - really? Actually they give credit where it's due. And they have openly acknowledged what's new and what's "borrowed".


performanceguru wrote:
"I have read the paper some time ago and I do appreciate the unique features of dtrace. My post was in defense of Systemtap, I don't think its quite what IBM were making out but it also has little to do with the comments that were in the original post that I responded to."


Of course SystemTap is not (yet) what IBM claims it to be.


performanceguru wrote:
"As I stated, it is a project in its infancy but it has promise and its not architecturally flawed"


I've never said SystemTap is architecturally flawed. It's just that I do not agree with IBM on their ridiculous conclusions. I think that SystemTap is going to be a good tool, it just needs more time to get the rest of the features and to prove itself on the market.


"The conclusion is clear; Linux provides a wide range of tools and technologies that are technically comparable, or better, than those offered in Solaris 10."


They showed nothing in the article which would prove their claims. Not to mention that the author of the article didn't try to be fair, even just a little bit. The problem with the article is that it's written so that it looks like a technical article, while in reality it is just marketing crap.

And to anonymous - well, many performance benchmarks show that Solaris 10 provides at least similar performance to Linux on the same hardware. In one of our production environments we have actually migrated from Linux to Solaris, and one of the reasons was a 10-15% performance gain on Solaris (other reasons: the NFS client works on Solaris while Linux has problems under heavy load with EMC Celerra, a resource manager is included in Solaris, and observability is much better on Solaris).

Anonymous said...

- well many performance benchmarks show that Solaris 10 provides at least similar performance as Linux does on the same hardware. In one of our production environments we have actually migrated from Linux to Solaris and one of the reasons was 10-15% performance gain on Solaris

Well, I don't want to argue, but I will take Sun's own words for it. Have you ever checked the opensolaris bug database - just search for "Linux" "performance" or "slow" and you will know what I mean. But that's not the only point - people need hardware support, without which dtrace is going to be limited in its use to the few Sun customers who bought Sun hardware.
I never disagreed that dtrace is a good thing - I was just venting my frustration over Solaris h/w support and slowness. I would love to get it working decently and reliably on my hardware. Lack of h/w support and generally acceptable performance mean dtrace will never get mainstream use. Sun needs to fix these things first before they declare dtrace as the god. Wrong forum I guess...!