I noticed using iostat that /var is being written to a lot. Probably /var/tmp or some logs - let's check it!
bash-3.00# dtrace -n io:::start'/args[2]->fi_mount == "/var"/{trace(args[2]->fi_pathname);}'
dtrace: description 'io:::start' matched 6 probes
CPU ID FUNCTION:NAME
0 94 bdev_strategy:start /var/tmp/#sql_e91_4.MYD
0 94 bdev_strategy:start /var/tmp/#sql_e91_4.MYD
0 94 bdev_strategy:start /var/tmp/#sql_e91_4.MYD
0 94 bdev_strategy:start /var/tmp/#sql_e91_2.MYD
[...]
^C
Ok, so it's /var/tmp after all. Probably it's good to make /var/tmp a tmpfs filesystem.
Now let's see which instance of MySQL is doing most of the IOs.
bash-3.00# dtrace -n io:::start'{@a[execname]=count();}'
dtrace: description 'io:::start' matched 6 probes
^C
sched 6
mysqld2 7
fsflush 225
mysqld1 244
mysqld3 2993
bash-3.00#
How many bytes are actually transfered during IOs by each instance?
bash-3.00# dtrace -n io:::start'{@a[execname]=sum(args[0]->b_bcount);}'
dtrace: description 'io:::start' matched 6 probes
^C
sched 286208
fsflush 2939904
mysqld2 5169152
mysqld1 21300224
mysqld3 51502592
bash-3.00#
Maybe some IO's of given instance are mostly cached by filesystem so let's see how many bytes are transferred only when physical IO is done.
bash-3.00# dtrace -n io:::start'/args[0]->b_flags & B_PHYS/{@a[execname]=sum(args[0]->b_bcount);}'
dtrace: description 'io:::start' matched 6 probes
^C
sched 165888
mysqld2 397312
fsflush 1048576
mysqld1 6002176
mysqld3 85908480
bash-3.00#
Let's say we want to know which file is being mostly accessed.
bash-3.00# dtrace -n io:::start'{@a[args[2]->fi_pathname]=sum(args[0]->b_bcount);}'
dtrace: description 'io:::start' matched 6 probes
^C
/opt/mysql1/master.info 3072
/opt/mysql2/kawiarenki_dir/MiniCzaty.MYD 3072
/opt/mysql3/ib_logfile0 12288
/opt/mysql2/opteron-slow.log 12288
/opt/mysql2/dzieci2/DAYRATE.MYD 12288
/opt/mysql2/kawiarenki_dir/Pokoje.TMD 12288
/opt/mysql2/hosting/hp_pages.TMD 12288
/opt/mysql1/ib_logfile0 12288
/opt/mysql2/opteron-bin.012 24576
/opt/mysql1/opteron-bin.006 24576
/opt/mysql1/opteron-relay-bin.008 86016
/opt/mysql1/ibdata6 147456
/opt/mysql3/opteron-bin.017 221184
/opt/mysql1/ibdata2 393216
/opt/mysql2/users/users.TMM 643072
/opt/mysql1/ibdata3 753664
/opt/mysql1/ib_logfile1 819200
/opt/mysql1/ibdata5 2457600
/var/tmp/#sql_49fc_0.MYD 3784704
/opt/mysql3/ibdata1 3833856
/opt/mysql1/ibdata4 4030464
/opt/mysql3/ibdata3 4317184
/var/tmp/#sql_49fc_1.MYD 5382144
/opt/mysql2/users/users.TMD 5455872
/opt/mysql3/ibdata5 5603328
/opt/mysql3/ib_logfile2 5664768
/opt/mysql1/ibdata1 8970240
/opt/mysql3/ibdata4 9183232
/opt/mysql3/ibdata2 20111360
bash-3.00#
What you can get freom all of this? Well, first it's probably worth either linking /var/tmp to /tmp or make /var/tmp tmpfs filesystem. You know exactly which instance of MySQL is making most use of disks - so if you have a problem with storage performance you know which instance move to other server or to give it separate storage (probably faster). Then if some subset of files is beeing accessed much more than the rest you can spread these files to separate storage.
And so on... all of this on a production box in a safe manner. All you need is just an imagination of what to ask and DTrace gives you an answer :) (but you've got to know how to ask :))))
Well, that's just the thing, isn't it?
ReplyDeleteYou have to know what to ask Dtrace, but most of us folks outside of Sun don't know what to ask, or how to ask it.
What would really help is if Bryan and Adam wrote an "O'Reilly" book about what to ask and how to ask it.
I won't say that it's always easy. But for sys admins to get such info as above is really simple and basicly little knowledge is needed than basic system understanding. If you know how to use and interpret iostat you should be able understand above without any problems. The only thing is to learn DTrace - which is rather quite simple itself. I mean you can start using it after 30s - just get some examples and start tweaking it yourself. And read documentation which is really good. My adive is to start using it - you'll se yourself in a short time that you can easly answer many questions which were hard to before.
ReplyDeleteI belive this year Sun should publish new Solaris Internals book which should focues on DTrace a lot and should explain how the system works and how to get that information using Solaris tools, DTrace included.
btw: and I'm not working for Sun - I'm outside of Sun as most of you.
AFAIR you can get huge performance premium for MySQL if you keep the database files on filesystem mounted with 'forcedirectio' option.
ReplyDeleteOnce there was a test including some BSD systems, Linux 2.4 and 2.6 and Solaris Express (or the very first version of S10).
Solaris performance in MySQL benchmark was very poor (~15% of Linux), but after using 'forcedirectio' it jumped just very little below linux 2.6 (which was the winner).
I know that - it is the case with InnoDB.
ReplyDeleteHallo and thanks for great blogpost. Please, can you tell me what does it means when after inspecting which file is being mostly accessed (dtrace command mentioned in blogpost) i have most accessed item called "none". The number of accesses for this item is about 12 000 000 per 20 sec. Thank you for any suggestions.
ReplyDeleteJorge
(Solaris 10 64bit, MySQL 5.0)