Many years ago I compared the performance overhead of dtrace vs. truss on Solaris in a microbenchmark.
This time I ran a similar test, but comparing bpftrace vs. strace on Linux.
To do so, I wrote a simple C program which spawns X threads, each of which calls stat("/tmp") N times in a loop. The program then prints the total time it took for all threads to finish.
While it is not necessarily a very realistic test, it does show the potential overhead of tracing and the difference between the two technologies. Also, sometimes you do need to trace a very tight loop, which might then result in overheads like those shown below (or even higher).
Let's run it three times (to see if we get consistent results) with 4 threads, each one calling stat() 100k times.
It took just below 1s to execute.
Now, let's run it under bpftrace, which will count how many times stat() was called by all threads.
There is roughly a 20% overhead - not bad.
Adding an extra condition to the predicate, str(args->filename)=="/tmp", has little impact - total times stay below 1.24s.
Again, not bad, especially given that string comparison like this is rather expensive.
Now time for strace.
It took about 50 times longer to execute!
While there have been many improvements to strace to reduce its impact, it is still significant in some cases.
It doesn't mean that strace is a bad tool and you should avoid it - in fact, it is often more handy and quicker to use than bpftrace or systemtap.
However, be mindful of its potentially much higher overhead, especially in tight loops.
The source code for the test program.