Monday, April 06, 2009

truss(1M) vs. dtrace(1M)

One of the many benefits of DTrace vs. truss is that dtrace should induce much smaller overhead for tracing applications especially for multi-threaded applications running on multi core/cpu servers. Lets put it to a quick test.

I quickly wrote a small C program which spawns N threads and each thread does stat("/tmp") X times. Then I measured how much time it takes to execute it for 1mln stat()'s in total while running with no tracing at all, running under truss and running under dtrace.


One two-core AMD CPU
# ptime ./threads-2 1 1000000

real 2.662809885
user 0.223471401
sys 2.435895135

# ptime ./threads-2 2 500000

real 1.649542016
user 0.226104849
sys 3.045784378

# ptime truss -t xstat -c ./threads-2 2 500000

syscall seconds calls errors
xstat 6.966 1000000
stat64 .000 3 1
-------- ------ ----
sys totals: 6.966 1000003 1
usr time: .776
elapsed: 18.520

real 18.533000528
user 5.677239771
sys 16.069020190

# dtrace -n 'syscall::xstat:entry{@=count();}' -c 'ptime ./threads-2 2 500000'
dtrace: description 'syscall::xstat:entry' matched 1 probe

real 1.888294217
user 0.225676973
sys 3.506004575
dtrace: pid 8526 has exited

1000000

truss made the program to execute about 11x longer while dtrace made program to execute for about 14% longer.


Niagara server:

# ptime ./threads-2 1 1000000

real 10.873
user 1.881
sys 8.992

# ptime ./threads-2 10 100000

real 1.467
user 1.962
sys 12.121

# ptime truss -t xstat -c ./threads-2 1 1000000

syscall seconds calls errors
stat 26.958 1000004 1
-------- ------ ----
sys totals: 26.958 1000004 1
usr time: 2.758
elapsed: 214.600

real 3:34.613
user 30.900
sys 2:28.182

# ptime truss -t xstat -c ./threads-2 10 100000

syscall seconds calls errors
stat 37.259 1000004 1
-------- ------ ----
sys totals: 37.259 1000004 1
usr time: 3.178
elapsed: 168.010

real 2:48.063
user 1:05.709
sys 3:35.813

# dtrace -n 'syscall::stat:entry{@=count();}' -c 'ptime ./threads-2 1 1000000'
dtrace: description 'syscall::stat:entry' matched 1 probe

real 14.028
user 1.957
sys 12.069
dtrace: pid 12920 has exited

1000939

# dtrace -n 'syscall::stat:entry{@=count();}' -c 'ptime ./threads-2 10 100000'
dtrace: description 'syscall::stat:entry' matched 1 probe

real 1.858
user 2.142
sys 15.632
dtrace: pid 11679 has exited

1000083

truss made the program to execute about 20x longer in the single thread case and 115x longer for the multi threaded one while dtrace added no more than 30% to the execution time regardless if the application was running with one or many executing threads. This shows that one has to be especially careful when using truss on a multi CPU/core system on a multi-threaded application. Notice that the performance difference between multi-threaded and single-threaded example for truss shows not that much difference comparing to execution times with no tracing at all which shows the ugly feature of truss - it serializes a multi-threaded application.

Of course the benchmark is the worst-case scenario and in real life you shouldn't get that much overhead from both tools. Still truss in some cases could introduce too much overhead on a production server while dtrace would still be perfectly acceptable allowing you to continue with your investigation.

btw: DTraceToolkit provides a script called dtruss - it's a tool similar to truss but it is using DTrace.



cat threads-2.c


#include <thread.h>
#include <stdlib.h>
#include <pthread.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

void *thread_func(void *arg)
{
int *N=arg;
int i;
struct stat buf;

for (i=0; i<*N; i++)
stat("/tmp", &buf);

return(0);
}

int main(int argc, char **argv)
{
int N, iter;
int i;
int rc;
pthread_t tid[255];

if (argc != 3)
{
printf("%s number_of_threads number_of_iterations_per_thread\n", argv[0]);
exit(1);
}

N = atoi(argv[1]);
iter = atoi(argv[2]);

for (i=0; i<N; i++)
{
if (rc = pthread_create(&tid[i], NULL, thread_func, &iter))
printf("Thread #%d creation failed [%d]\n", i, rc);
}


/* wait for all threads to complete */
for (i=0; i<N; i++)
pthread_join(tid[i], NULL);

exit(0);
}

2 comments:

Moinak said...

Very interesting stuff. Good to know. However truss is still a Unix's developer's must-know tool for a variety of common usages.
I remember once during a debug session on a troubled box DTrace won't work and was failing with some weird error's. Eventually I trussed dtrace to find out what the problem was :) It turned out that the /dev/dtrace/* symlink entries were missing!

milek said...

I agree that truss is a must know. However it is good to know its limitations and possible impact on applications.