Friday, January 31, 2014

SMF: svcbundle

One of the missing things in SMF for me is an ability to mark a service to be disabled or enabled after reboot during postinstall phase of OS installation. One way of doing it is to generate a site profile, but this is rather cumbersome to do from a postinstaller. But there is a better way - svcbundle(1M). For example, to generate a profile which would enable ssh after reboot (or after manifest-import service is restarted:
svcbundle -s bundle-type=profile -s service-name=network/ssh \
          -s enabled=true >/etc/svc/profile/site/postinstall-ssh.xml
You can modify any parameter you want, for example to modify a sendmail service to use /etc/mail/cf/cf/nullclient.mc to generate its configuration from each time it starts:
svcbundle -s bundle-type=profile -s service-name=network/smtp -s instance-name=sendmail \
          -s instance-property-config:patht_to_sendmail_mc:astring:/etc/mail/cf/cf/nullclient.mc \
          >/etc/svc/profile/site/postinstall-sendmail-nullclient.xml

Thursday, January 23, 2014

mkdir() performance

Update: the fix is in Solaris 11.1 + SRU17, and should be in Solaris 11.2 once it is out. It now has a similar optimization to Linux. Network based file systems like AFS or NFS benefit most from it.

Recently I came across an issue where 'make install' on a Solaris server was taking *much* more time than on Linux. Files were being installed into AFS file system. After some debugging I found that GNU install calls mkdir() for all directories for a specified path and relies on EEXIST if a given directory already exists. For example: 

$ truss -D -t mkdir /usr/bin/ginstall -c -d \
 /ms/dev/openafs/core/1.6.5-c3/compile/x86_64.sunos64.5.11/sunx86_511/dest/bin
0.0026 mkdir("/ms", 0755)                              Err#17 EEXIST
0.0003 mkdir("dev", 0755)                              Err#30 EROFS
0.0002 mkdir("openafs", 0755)                          Err#30 EROFS
0.0002 mkdir("core", 0755)                             Err#30 EROFS
0.0083 mkdir("1.6.5-c3", 0755)                         Err#17 EEXIST
3.0085 mkdir("compile", 0755)                          Err#17 EEXIST
3.0089 mkdir("x86_64.sunos64.5.11", 0755)              Err#17 EEXIST
0.0005 mkdir("sunx86_511", 0755)                       Err#17 EEXIST
0.0002 mkdir("dest", 0755)                             Err#17 EEXIST
0.0065 mkdir("bin", 0755)                              Err#17 EEXIST
$

Notice that two mkdir()s took about 3s each! Now if there are lots of directories to be ginstall'ed it will take a very long time... I couldn't reproduce it on Linux though. Actually what happens is that Linux checks on VFS layer if there is a valid dentry with an inode allocated and if there is it will return with EEXIST without calling a file system specific VOP_MKDIR. The relevant code is:

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/fs/namei.c
…
int vfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
{
        int error = may_create(dir, dentry);
        unsigned max_links = dir->i_sb->s_max_links;

        if (error)
               return error;

        if (!dir->i_op->mkdir)
               return -EPERM;
…
static inline int may_create(struct inode *dir, struct dentry *child)
{
        audit_inode_child(dir, child, AUDIT_TYPE_CHILD_CREATE);
        if (child->d_inode)
               return -EEXIST;
…

Unfortunately, Solaris doesn't have this optimization (though it does optimize couple of other cases, for example for EROFS), so each mkdir() results in VOP_MKDIR being called and for AFS it means sending a request over a network to a file server and waiting for a reply. That alone will make it slower than on Linux, but it still doesn't explain the 3s.

It turned out that AFS file server has a throttling mechanism - if a client is generating requests which results in error then by default it will delay answering to the client after 10 errors. This can be disabled or the threshold can be adjusted though. See -abortthreshold option to file server.

This was also tested (by an Oracle Solaris engineer) over NFS and showed 100x difference in response time. There is a negligible difference for local file systems.

A bug was opened against Solaris to get it fixed - see Bug 18115102 - mkdir(2) calls VOP_MKDIR() even though the directory or file already exists

Hopefully it will get fixed soon.

Tuesday, January 21, 2014

Systemtap

I had to measure what's going on in kernel for a specific VFS call both on Linux and on Solaris. On Solaris it was quick and easy - got an answer in couple of minutes, on Linux... Systemtap didn't like version of kernel-debuginfo installed (the error message wasn't that obvious either), had to uninstall it and install the proper one, as one can't have multiple versions installed which would be useful with multiple kernels installed...

While I like Systemtap in principle, in practice every time I use it I find it very frustrating, especially if I have to measure something similar on Solaris with DTrace which usually is so much easier and quicker to do.

Then, we do not use Systemtap on prod servers... in minutes I unintentionally crashed a Linux box by tracing with Systemtap :(

Now, Oracle Linux has DTrace... :)

Friday, January 17, 2014

AFS Client - better observability

Wouldn't it be nice if fsstat(1M) and fsinfo::: DTrace provider were working with AFS client? It turned out this was trivial to implement - Solaris itself will provide necessary stats for fsstat and register fsinfo::: probes if one extra flag is specified.Thanks to Andrew Deason for implementing it in AFS, and big thanks to Oracle's engineers for providing information on how to do it. See the actual patch to AFS.

Let's see fsstat in action against an AFS mountpount:
$ fsstat /ms 1
new  name   name  attr  attr lookup rddir  read read  write write
file remov  chng   get   set    ops   ops   ops bytes   ops bytes
    0     0     0 10.5K     0   173K   939 20.7K 72.8M     0     0 /ms
    0     0     0     0     0      0     0     0     0     0     0 /ms
    0     0     0     0     0      0     0     0     0     0     0 /ms
    0     0     0    26     0    296     0     9 8.06K     0     0 /ms
    0     0     0     0     0      0     0     0     0     0     0 /ms
    0     0     0     0     0      0     0     0     0     0     0 /ms
    0     0     0     0     0      0     0     0     0     0     0 /ms
    0     0     0   170     0  2.36K    10   106  265K     0     0 /ms
    0     0     0   159     0  1.97K     2    90  261K     0     0 /ms
    0     0     0    25     0    331     0 1.05K 4.15M     0     0 /ms
    0     0     0   138     0  1.80K     0 1.48K 5.37M     0     0 /ms
    0     0     0   360     0  4.72K    12 1.42K 5.35M     0     0 /ms
    0     0     0   122     0  1.63K     0 1.30K 4.68M     0     0 /ms
Now DTrace fsinfo::: provider showing results for AFS related operations:
$ dtrace -q -n fsinfo:::'/args[0]->fi_fs == "afs"/ \
               {printf("%Y %s[%d] %s %s\n", walltimestamp, execname, pid, probename, args[0]->fi_pathname);}'

2014 Jan 16 16:49:07 ifstat[1485] getpage /ms/dist/perl5/PROJ/core/5.8.8-2/.exec/ia32.sunos.5.10/bin/perl
2014 Jan 16 16:49:07 ifstat[1485] lookup /ms/dist
2014 Jan 16 16:49:07 ifstat[1485] lookup /ms/dist/aurora
2014 Jan 16 16:49:07 ifstat[1485] inactive /ms/dist
2014 Jan 16 16:49:07 ifstat[1485] lookup /ms/dist/aurora/bin
2014 Jan 16 16:49:07 ifstat[1485] inactive /ms/dist/aurora
2014 Jan 16 16:49:07 ifstat[1485] inactive /ms/dist/aurora/bin
2014 Jan 16 16:49:07 ifstat[1485] lookup /ms/dist
2014 Jan 16 16:49:07 ifstat[1485] lookup /ms/dist/fsf
2014 Jan 16 16:49:07 ifstat[1485] inactive /ms/dist
2014 Jan 16 16:49:07 ifstat[1485] lookup /ms/dist/fsf/bin
2014 Jan 16 16:49:07 ifstat[1485] inactive /ms/dist/fsf
2014 Jan 16 16:49:07 ifstat[1485] inactive /ms/dist/fsf/bin
2014 Jan 16 16:49:07 ifstat[1485] delmap /ms/dist/perl5/PROJ/core/5.8.8-2/.exec/ia32.sunos.5.10/bin/perl
2014 Jan 16 16:49:07 ifstat[1485] delmap /ms/dist/perl5/PROJ/core/5.8.8-2/.exec/ia32.sunos.5.10/bin/perl
2014 Jan 16 16:49:07 ifstat[1485] delmap /ms/dist/perl5/PROJ/core/5.8.8-2/.exec/ia32.sunos.5.10/lib/perl5/auto/Time/HiRes/HiRes.so
2014 Jan 16 16:49:07 ifstat[1485] delmap /ms/dist/perl5/PROJ/core/5.8.8-2/.exec/ia32.sunos.5.10/lib/perl5/auto/Time/HiRes/HiRes.so
2014 Jan 16 16:49:07 ifstat[1485] close /ms/dist/perl5/PROJ/core/5.8.8-2/.exec/ia32.sunos.5.10/bin/perl
2014 Jan 16 16:49:07 ifstat[964] open /ms/dist/perl5/PROJ/core/5.8.8-2/.exec/ia32.sunos.5.10/bin/perl
2014 Jan 16 16:49:07 ifstat[964] addmap /ms/dist/perl5/PROJ/core/5.8.8-2/.exec/ia32.sunos.5.10/bin/perl
2014 Jan 16 16:49:07 tcpstat[965] open /ms/dist/perl5/PROJ/core/5.8.8-2/.exec/ia32.sunos.5.10/bin/perl
2014 Jan 16 16:49:07 tcpstat[965] addmap /ms/dist/perl5/PROJ/core/5.8.8-2/.exec/ia32.sunos.5.10/bin/perl
2014 Jan 16 16:49:07 tcpstat[965] addmap /ms/dist/perl5/PROJ/core/5.8.8-2/.exec/ia32.sunos.5.10/bin/perl
2014 Jan 16 16:49:07 tcpstat[965] addmap /ms/dist/perl5/PROJ/core/5.8.8-2/.exec/ia32.sunos.5.10/lib/perl5/auto/Time/HiRes/HiRes.so
2014 Jan 16 16:49:07 tcpstat[965] addmap /ms/dist/perl5/PROJ/core/5.8.8-2/.exec/ia32.sunos.5.10/lib/perl5/auto/Time/HiRes/HiRes.so
2014 Jan 16 16:49:07 tcpstat[1484] getpage /ms/dist/perl5/PROJ/core/5.8.8-2/.exec/ia32.sunos.5.10/bin/perl