Thursday, January 23, 2014

mkdir() performance

Update: the fix is in Solaris 11.1 + SRU17, and should be in Solaris 11.2 once it is out. Solaris now has an optimization similar to the one in Linux. Network-based file systems such as AFS or NFS benefit most from it.

Recently I came across an issue where 'make install' on a Solaris server was taking *much* more time than on Linux. Files were being installed into an AFS file system. After some debugging I found that GNU install calls mkdir() for every directory component of the specified path and relies on EEXIST being returned when a given directory already exists. For example:

$ truss -D -t mkdir /usr/bin/ginstall -c -d \
 /ms/dev/openafs/core/1.6.5-c3/compile/x86_64.sunos64.5.11/sunx86_511/dest/bin
0.0026 mkdir("/ms", 0755)                              Err#17 EEXIST
0.0003 mkdir("dev", 0755)                              Err#30 EROFS
0.0002 mkdir("openafs", 0755)                          Err#30 EROFS
0.0002 mkdir("core", 0755)                             Err#30 EROFS
0.0083 mkdir("1.6.5-c3", 0755)                         Err#17 EEXIST
3.0085 mkdir("compile", 0755)                          Err#17 EEXIST
3.0089 mkdir("x86_64.sunos64.5.11", 0755)              Err#17 EEXIST
0.0005 mkdir("sunx86_511", 0755)                       Err#17 EEXIST
0.0002 mkdir("dest", 0755)                             Err#17 EEXIST
0.0065 mkdir("bin", 0755)                              Err#17 EEXIST
$

Notice that two of the mkdir() calls took about 3s each! If there are lots of directories to be ginstall'ed, this will take a very long time... I couldn't reproduce it on Linux, though. What happens there is that Linux checks at the VFS layer whether there is a valid dentry with an inode allocated and, if there is, returns EEXIST without calling the file-system-specific mkdir operation (the Linux counterpart of VOP_MKDIR). The relevant code is:

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/fs/namei.c
…
int vfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
{
        int error = may_create(dir, dentry);
        unsigned max_links = dir->i_sb->s_max_links;

        if (error)
                return error;

        if (!dir->i_op->mkdir)
                return -EPERM;
…
static inline int may_create(struct inode *dir, struct dentry *child)
{
        audit_inode_child(dir, child, AUDIT_TYPE_CHILD_CREATE);
        if (child->d_inode)
                return -EEXIST;
…
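
To make the caller's side concrete, here is a minimal user-space sketch of the pattern described earlier: call mkdir() for every component of a path and tolerate EEXIST. This is not GNU install's actual code (the trace shows it issuing relative mkdir() calls, which this sketch does not do); make_path() and its call counter are hypothetical names used only for illustration. As in the trace, errors on ancestor directories are not treated as fatal here.

```c
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

enum { MAX_PATH_LEN = 4096 };   /* assumed upper bound for this sketch */

/*
 * Hypothetical helper: call mkdir() once per component of `path`.
 * Errors on ancestors are ignored; the last component must either be
 * created or already exist (EEXIST). Note that EEXIST does not prove
 * the existing object is a directory. If `calls` is not NULL it is
 * incremented once per mkdir() call issued.
 * Returns 0 on success, -1 with errno set on failure.
 */
int make_path(const char *path, mode_t mode, int *calls)
{
    char buf[MAX_PATH_LEN];
    size_t len = strlen(path);

    if (len == 0 || len >= sizeof(buf)) {
        errno = (len == 0) ? ENOENT : ENAMETOOLONG;
        return -1;
    }
    memcpy(buf, path, len + 1);

    /* Start at buf + 1 so that a leading '/' is not treated as a component. */
    for (char *p = buf + 1; ; p++) {
        if (*p != '/' && *p != '\0')
            continue;

        char saved = *p;
        *p = '\0';                  /* temporarily cut the path here */
        int rc = mkdir(buf, mode);
        int err = errno;
        if (calls != NULL)
            (*calls)++;
        *p = saved;                 /* restore the separator */

        if (saved == '\0') {        /* that was the final component */
            if (rc == 0 || err == EEXIST)
                return 0;
            errno = err;
            return -1;
        }
    }
}
```

With a kernel that performs the may_create() check shown above, every call for a component that already exists returns EEXIST before the file system is asked to do anything.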

Unfortunately, Solaris doesn't have this optimization (though it does optimize a couple of other cases, for example EROFS), so each mkdir() results in VOP_MKDIR being called. For AFS that means sending a request over the network to a file server and waiting for a reply. That alone makes it slower than on Linux, but it still doesn't explain the 3s.

It turned out that the AFS file server has a throttling mechanism: if a client keeps generating requests that result in errors, then by default the server starts delaying its replies to that client after 10 errors. This can be disabled, or the threshold can be adjusted. See the -abortthreshold option to the file server.

This was also tested (by an Oracle Solaris engineer) over NFS and showed a 100x difference in response time. For local file systems the difference is negligible.

A bug was opened against Solaris to get it fixed - see Bug 18115102 - mkdir(2) calls VOP_MKDIR() even though the directory or file already exists

Hopefully it will get fixed soon.
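
On a system without the fix, an application that controls its own code could approximate the optimization in user space by checking for the directory before calling mkdir(). The sketch below is only an assumption about what such a workaround might look like, not what GNU install or the Solaris fix does; mkdir_if_missing() is a hypothetical name. Whether it actually saves time depends on whether stat() can be answered cheaply (for example from cached attributes), which I have not measured.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: make sure `path` exists as a directory, but
 * only call mkdir() when stat() reports that nothing is there.
 * Returns 0 on success, -1 with errno set on failure.
 */
int mkdir_if_missing(const char *path, mode_t mode)
{
    struct stat st;

    if (stat(path, &st) == 0) {
        if (S_ISDIR(st.st_mode))
            return 0;           /* already a directory: skip mkdir() */
        errno = EEXIST;         /* exists, but is not a directory */
        return -1;
    }
    if (errno != ENOENT)
        return -1;              /* stat() failed for some other reason */

    if (mkdir(path, mode) == 0)
        return 0;

    /*
     * Something created the path between stat() and mkdir(). This
     * sketch accepts that as success; a stricter version would
     * re-check that the new object is a directory.
     */
    if (errno == EEXIST)
        return 0;
    return -1;
}
```

The check and the mkdir() are not atomic, so the EEXIST handling still has to stay in place; the extra stat() only removes the failing mkdir() request in the common case where the directory is already there.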
