Tuesday, November 24, 2009

Long ssh logins

On a couple of our servers running Solaris we noticed that it usually takes more than 10s to login. Once in everything is a snap. I quickly investigated it and this turned out to be interesting. I used truss(1M) to investigate what's going on from the moment I connect to the moment I have a working shell.
# truss -f -o /tmp/a -v all -adDE -p 408
Now I logged in to the system and analyzed /tmp/a file. First I confirmed that it took over 10s to login. From the moment the connection was accepted to the moment I got interactive session it took about 11s as shown below:
[...]
408: 2.6594 0.0007 0.0000 fcntl(4, F_SETFL, (no flags)) = 0
[...]
12186: 14.0814 0.0001 0.0000 write(4, " | : { b7F S LB7A2 BA13".., 64) = 64
12196: read(0, 0x080473DF, 1) (sleeping...)
[...]
So I checked when it started to go wrong.
[...]
408: 2.6594 0.0007 0.0000 fcntl(4, F_SETFL, (no flags)) = 0
[...]
12196: 3.7245 0.0003 0.0003 forkx(0) = 12200
The connection started just before the fcntl showed above and everything is executing quick up-to forkx() at 3.7245s. So far it took a little more than 1s. What happens next seems to be a loop of hundreds of entries like:
[...]
12200: 4.5521 0.0000 0.0000 ioctl(3, ZFS_IOC_USERSPACE_ONE, 0x08046790) Err#48 ENOTSUP
12200: 4.5522 0.0001 0.0000 ioctl(7, MNTIOC_GETMNTENT, 0x08047C1C) = 0
12200: 4.5917 0.0395 0.0002 ioctl(3, ZFS_IOC_OBJSET_STATS, 0x08046390) = 0
12200: 4.5918 0.0001 0.0000 getuid() = 35148 [35148]
12200: 4.5919 0.0001 0.0000 getuid() = 35148 [35148]
12200: 4.5919 0.0000 0.0000 door_info(6, 0x08046460) = 0
12200: target=189 proc=0x806FCD0 data=0xDEADBEED
12200: attributes=DOOR_UNREF|DOOR_NO_CANCEL
12200: uniquifier=289
12200: 4.5922 0.0003 0.0000 door_call(6, 0x080464D0) = 0
12200: data_ptr=FE430000 data_size=232
12200: desc_ptr=0x0 desc_num=0
12200: rbuf=0xFE430000 rsize=16384
12200: 4.5923 0.0001 0.0000 ioctl(3, ZFS_IOC_USERSPACE_ONE, 0x08046790) Err#48 ENOTSUP
12200: 4.5923 0.0000 0.0000 ioctl(7, MNTIOC_GETMNTENT, 0x08047C1C) = 0
12200: 4.6095 0.0172 0.0001 ioctl(3, ZFS_IOC_OBJSET_STATS, 0x08046390) = 0
12200: 4.6096 0.0001 0.0000 getuid() = 35148 [35148]
12200: 4.6096 0.0000 0.0000 getuid() = 35148 [35148]
12200: 4.6097 0.0001 0.0000 door_info(6, 0x08046460) = 0
12200: target=189 proc=0x806FCD0 data=0xDEADBEED
12200: attributes=DOOR_UNREF|DOOR_NO_CANCEL
12200: uniquifier=289
12200: 4.6098 0.0001 0.0000 door_call(6, 0x080464D0) = 0
12200: data_ptr=FE430000 data_size=232
12200: desc_ptr=0x0 desc_num=0
12200: rbuf=0xFE430000 rsize=16384
12200: 4.6098 0.0000 0.0000 ioctl(3, ZFS_IOC_USERSPACE_ONE, 0x08046790) Err#48 ENOTSUP
12200: 4.6099 0.0001 0.0000 ioctl(7, MNTIOC_GETMNTENT, 0x08047C1C) = 0
12200: 4.6201 0.0102 0.0001 ioctl(3, ZFS_IOC_OBJSET_STATS, 0x08046390) = 0
12200: 4.6202 0.0001 0.0000 getuid() = 35148 [35148]
12200: 4.6203 0.0001 0.0000 getuid() = 35148 [35148]
[...]
The process with PID 12200 was:
12200:   3.8229  0.0947  0.0013 execve("/usr/sbin/quota", 0x0811F9E8, 0x0811D008)  argc = 1
12200: *** SUID: ruid/euid/suid = 35148 / 0 / 0 ***
12200: argv: /usr/sbin/quota
By visually looking at couple of pages of these ioctls it looked like most of the total time would be spent in doing ioctl(3, ZFS_IOC_OBJSET_STATS, 0x08046390). Lets check it:
# grep "^12200:" /tmp/a |grep ioctl|grep ZFS_IOC_OBJSET_STATS|awk 'BEGIN{i=0}{i=i+$3}END{print i}'
9.7412
So out of 11s above ioctls along took 9. To have a clear picture lets check how much time the quota command took:
# grep "^12200:" /tmp/a |head -1
12200: 3.7245 3.7245 0.0000 forkx() (returning as child ...) = 12196
# grep "^12200:" /tmp/a |tail -1
12200: 13.9854 0.0003 0.0000 _exit(0)
So it took about 10s which means almost 100% of its time was spent doing above ioctls.
There are almost 300 zfs filesystems on this particular server so it all adds up. Sometimes quota completes very quickly sometimes it takes many seconds - I guess depending if requested data from all these zfs filesystems is cached or not. You need to run quota as a non-root user otherwise most checks are skipped and it is always quick.
Since we are not using quota on these systems anyway I commented out quota check in /etc/profile and now a full login takes about 1s on average which is 10-12x improvement.

Wednesday, November 11, 2009

VirtualBox 3.1.0 Beta 1 released

VirtualBox 3.1.0 Beta 1 released.

I'm especially interested in "support for OpenSolaris Boomer architecture" which hopefully means a working microphone in a guest on my laptop - that would mean a working Skype on VB/Windows :)

Version 3.1 will be a major update. The following major new features were added:
  • Teleportation (aka live migration); migrate a live VM session from one machine to another
  • VM states can now be restored from arbitrary snapshots instead of only the last one, and new snapshots can be taken from other snapshots as well (aka branched snapshots)
  • 2D video acceleration for Windows guests; use the host video hardware for overlay stretching and colour conversion
  • The network attachment type can be changed while a VM is running
  • Experimental USB support for OpenSolaris hosts making use of the latest USB enhancements in Solaris Nevada 124 and higher.
  • Significant performance improvements for PAE and AMD64 guests (VT-x and AMD-V only; normal (non-nested) paging)
  • Experimental support for EFI (Extended Firmware Interface)
  • VirtIO network device support

In addition, the following items were fixed and/or added:
  • VMM: reduced IO-APIC overhead for 32 bits Windows NT/2000/XP/2003 guests; requires 64 bits support (VT-x only; bug #4392)
  • VMM: fixed double timer interrupt delivery on old Linux kernels using IO-APIC (caused guest time to run at double speed; bug #3135)
  • VMM: reinit VT-x and AMD-V after host suspend or hibernate; some BIOSes forget this (Windows hosts only; bug #5421)
  • GUI: prevent starting a VM with a single mouse click (bug #2676)
  • 3D support: major performance improvement in VBO processing
  • 3D support: added GL_EXT_framebuffer_object, GL_EXT_compiled_vertex_array support
  • 3D support: fix crashes in FarCry, SecondLife, Call of Duty, Unreal Tournament, Eve Online (bugs #2801, #2791)
  • 3D support: fix graphics corruption in World of Warcraft (#2816)
  • iSCSI: support iSCSI targets with more than 2TiB capacity
  • VRDP: fixed occasional VRDP server crash (bug #5424)
  • Network: fixed the E1000 emulation for QNX (and probably other) guests (bug #3206)
  • Network: even if the virtual network cable was disconnected, some guests were able to send / receive packets (E1000; bug #5366)
  • Network: even if the virtual network cable was disconnected, the PCNet card received some spurious packets which might confuse the guest (bug #4496)
  • VMDK: fixed handling of split image variants
  • VHD: fixed incompatibility with Hyper-V
  • OVF: create manifest files on export and verify the content of an optional manifest file on import
  • X11 based hosts: allow the user to specify their own scan code layout (bug #2302)
  • Mac OS X hosts: don't auto show the menu and dock in fullscreen (#bug 4866)
  • Solaris hosts: combined the kernel interface package into the VirtualBox main package
  • Solaris hosts: support for OpenSolaris Boomer architecture (with OSS audio backend).
  • Shared folders: fixed changing case of file names (bug #2520)
  • Shared folders: VBOXSVR is visible in Network folder (bug #4842)
  • Windows and Linux Additions: added balloon tip notifier if VirtualBox host version was updated and Additions are out of date
  • Solaris Additions: fixed as_pagelock() failed errors affecting guest properties (bug #5337)
  • Windows Additions: added automatic logon support for Windows Vista and Windows 7
  • Windows Additions: fix crash in seamless mode (contributed by Huihong Luo)
  • Linux Additions: added support for uninstalling the Linux Guest Additions (bug #4039)
  • SDK: added object-oriented web service bindings for PHP5

Tuesday, November 10, 2009

ZFS send dedup

PSARC/2009/557 ZFS send dedup was integrated yesterday. It allows to dedup zfs send|recv stream regardless if dedup is enabled or not on a sending and/or receiving side. Looks like it both pool and zfs send stream deduplication will be available in snv_128. Of course you can pull the sources and build them yourself if you want to start playing with dedupe now otherwise you will have to wait to the beginning of December when snv_128 should hit /dev Open Solaris repository.

EU officially objects to Sunacle deal

This is ridiculous and sad... I wonder what would happen now.
The official statement from Oracle.

Thursday, November 05, 2009

No need for fsck in ZFS

Recently there was an article at OSNEWS " Should ZFS Have a fsck Tool?". Well, no it shouldn't. I wanted to write an explanation why it is the case but Joerg was first and there is no point me repeating him. So if you wonder why ZFS doesn't need a fsck tool read Joerg blog entry about it.

Monday, November 02, 2009

ZFS Deduplication Integrated!

It took more than expected but it has been finally integrated! Read Jeff Bonwick's post on ZFS dedup.
PSARC 2009/571 ZFS Deduplication Properties
6677093 zfs should have dedup capability
You can find code changes here.