Wednesday, April 09, 2014

Slow process core dump

Solaris limits rate at which a process core dump is generated. There are two tunables with default values: core_delay_usec=10000 core_chunk=32 (in pages). This essentially means that process core dump is limited to 100 writes of 128KB per second (32x 4k pages on x86) which is an eqivalen of 12.5MB/s - for large processes this will take quite a long time.

I set core_chunk=128 and core_delay_usec=0 via mdb which increased the core dump write rate from 12.5MB/s to about 300-600MB/s.

Long core dump generations can cause issues like delay in restarting a service in SMF. But then making it too fast might overwhelm OS and impact other running processes there. Still, I think the defaults are probably too conservative.

update: I just noticed that this has been fixed in Illumos, see https://illumos.org/issues/3673
 

Thursday, April 03, 2014

Home NAS - Data Corruption

I value my personal pictures so I store them on a home NAS which is running Solaris/ZFS. The server has two data disks mirrored. It's been running perfectly fine for the last two years but recently one of the disk drives returned corrupted data during ZFS scrub. If it wasn't for ZFS I probably wouldn't even know that some pixels are wrong... or perhaps that some pictures are missing... This is exactly why I've used ZFS on my home NAS.

# zpool status vault
  pool: vault
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
  scan: scrub repaired 256K in 3h54m with 0 errors on Tue Mar 11 17:34:22 2014
config:

        NAME        STATE     READ WRITE CKSUM
        vault       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     2
            c5t0d0  ONLINE       0     0     0

errors: No known data errors