Friday, March 11, 2022

TrueNAS Scale and ZFS Write Throttling

By default on OpenZFS/Linux the maximum amount of dirty data is capped at 4GB. This means that if your NAS server is connected over a network with higher throughput than its pool can sustain for writes, you will quickly be throttled when writing large amounts of data (>>4GB). This can easily happen if your home NAS server is connected over 10GbE while its internal pool configuration can sustain much less (e.g. two mirrored disks will sustain less than 200MB/s of writes) and you occasionally write large amounts of data.

In such a case you can potentially improve performance significantly by tuning ZFS write throttling.

An example workflow: you are editing largish video files on your workstation (also connected over 10GbE) on local NVMe drives. Once you are done you want to copy them to your NAS server - let's say the files are 50GB in total. Due to write throttling you will not be able to write at full network speed for the whole transfer.

If your NAS server has plenty of RAM, you could increase the dirty data threshold. If you can raise it above the maximum amount of data you expect to write in one session, the end result should be that you sustain a very high throughput over the network - from your perspective the whole transfer finishes significantly quicker, while the NAS server destages the data to disk in the background.
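As a quick back-of-envelope illustration (the throughput numbers are assumptions for this example: ~1100MB/s of usable 10GbE throughput and ~200MB/s of sustained pool writes), which also shows where the byte value used below comes from:

transfer_gb = 50      # size of the copy from the workflow above
net_mb_s    = 1100    # assumed usable 10GbE throughput
pool_mb_s   = 200     # assumed sustained write rate of a 2-disk mirror

print(f'at network speed: ~{transfer_gb * 1024 / net_mb_s:.0f}s')   # ~47s
print(f'at pool speed:    ~{transfer_gb * 1024 / pool_mb_s:.0f}s')  # ~256s
print(f'100GB threshold in bytes: {100 * 1024**3}')                 # 107374182400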

Let's say you want to increase it to 100GB, to do so:
    # echo 107374182400 >/sys/module/zfs/parameters/zfs_dirty_data_max
There are pros and cons obviously, depending on the specific situation. For example, if you have multiple clients writing to your NAS server then increasing the threshold might not be such a good idea. Generally, if you only have one client doing heavy writes from time to time, you should benefit from this tuning.

There is an excellent blog on tuning the ZFS write throttle if you want to understand it in more detail.

How do you make the setting persistent in TrueNAS Scale in a supported way?
Go to System Settings -> Advanced -> Init/Shutdown Scripts -> Add
and populate it with:
  Type: Command
  Command: echo 107374182400 >/sys/module/zfs/parameters/zfs_dirty_data_max
  When: Post Init
After reboot it will pick up the new value.
The advantage of setting it up this way is that it will be included in the exported TrueNAS config file.
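If you want to double-check the live value after a reboot (a plain cat of the sysfs file works just as well), a trivial sketch:

# read the current zfs_dirty_data_max and compare it to the expected 100GB
with open('/sys/module/zfs/parameters/zfs_dirty_data_max') as f:
    current = int(f.read())

expected = 100 * 1024**3
print(current, 'OK' if current == expected else 'unexpected')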

Wednesday, February 09, 2022

TrueNAS Scale & ZFS Wrapping Key

 I've been playing with TrueNAS Scale recently, and while the BUI allows you to export/download the ZFS wrapping key, I wanted to know how to get the key manually. After a quick look at the code I found that the key is stored in an SQLite DB kept on the root file system.

root@truenas[~]# sqlite3 /data/freenas-v1.db
SQLite version 3.34.1 2021-01-20 14:10:07
Enter ".help" for usage hints.
sqlite> select * from storage_encrypteddataset;
1|backup|ioI/B72PEllUJjumWpWHkdhDDCd2l2eopFEJgWYIpcAcTT1v0NyYicjzKiHfuoncL2Mklfa45pUJIyxzGFGobr17b1HtprjSth/X9yyfsnROCK/xQL+SVmO/5fT/KabfSSiz8+IfDH8=|
The key itself is encrypted, so you need to decrypt it first before it can be used with ZFS. A simple Python script to do it is attached below.
root@truenas[~]# ./decode_key.py
dataset: backup
  key: 16f7677b514ef39bc162312274c76da24221ecc5a2f01e6ba0bhfeec054d9162
(both the encrypted and decrypted keys above have been modified for this blog entry)
root@truenas[~]# cat decode_key.py
#!/usr/bin/python3

# based on /usr/lib/migrate113/freenasUI/system/migrations/0022_cloud_sync.py

import sys
import base64
from Cryptodome.Cipher import AES
import sqlite3


PWENC_BLOCK_SIZE = 32
PWENC_FILE_SECRET = '/data/pwenc_secret'
PWENC_PADDING = b'{'


def pwenc_get_secret():
    with open(PWENC_FILE_SECRET, 'rb') as f:
        secret = f.read()
    return secret


def pwenc_decrypt(encrypted=None):
    if not encrypted:
        return ""
    from Cryptodome.Util import Counter
    # the stored value is base64(8-byte CTR nonce || ciphertext)
    encrypted = base64.b64decode(encrypted)
    nonce = encrypted[:8]
    encrypted = encrypted[8:]
    cipher = AES.new(
        pwenc_get_secret(),
        AES.MODE_CTR,
        counter=Counter.new(64, prefix=nonce),
    )
    # strip the '{' padding added when the value was encrypted
    return cipher.decrypt(encrypted).rstrip(PWENC_PADDING).decode('utf8')


if len(sys.argv) == 2:
    print(pwenc_decrypt(sys.argv[1]))
    exit(0)

dbcon = sqlite3.connect('/data/freenas-v1.db')
dbcur = dbcon.cursor()
for row in dbcur.execute('select * from storage_encrypteddataset'):
    ds_id, ds_name, ds_enc_key, kmip_enc_key = row
    #print(ds_id, ds_name, ds_enc_key, pwenc_decrypt(ds_enc_key))
    print(f'dataset: {ds_name}\n  key: {pwenc_decrypt(ds_enc_key)}\n')  
 

Tuesday, May 04, 2021

Linux Tickless and CPU Isolation - microbenchmark

 How much can be gained, in terms of latency and jitter, by isolating CPUs on Linux?

Let's do a microbenchmark by using a tool called hiccups.


A single-socket server (overclocked).
Kernel booted with the following arguments:
    isolcpus=nohz,domain,managed_irq,15-17 irqaffinity=0-13 nohz_full=15-17 intel_idle.max_cstate=0 idle=poll
    nmi_watchdog=0 nowatchdog nosoftlockup intel_pstate=passive

First let's run the hiccups tool on CPU 14 (not isolated, etc.):
    # taskset -c 14 chrt --other 0 ./hiccups -r 60 | column -t -R 1,2,3,4,5,6
    cpu  threshold_ns  hiccups  pct99_ns  pct999_ns  max_ns
     14           120    60077      2578       3219    6932

Now let's run it on CPU 15:
    # taskset -c 15 chrt --other 0 ./hiccups -r 60 | column -t -R 1,2,3,4,5,6
    cpu  threshold_ns  hiccups  pct99_ns  pct999_ns  max_ns
     15           120    14654       248       1177    7651

That is roughly a 10x difference at the 99th percentile.
It can be optimized even further.
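hiccups itself is a small native tool; purely to illustrate the idea of the measurement (not the tool itself), a minimal Python sketch could look like the below. The threshold matches the runs above, the runtime is an arbitrary assumption, and interpreter overhead makes the absolute numbers far worse than the native tool's:

#!/usr/bin/python3
# Spin on one CPU (pin it externally with taskset), sample a high-resolution
# clock in a tight loop and record every gap between consecutive samples that
# exceeds a threshold - those gaps are the "hiccups".
import time

THRESHOLD_NS = 120            # same threshold as in the runs above
RUNTIME_S = 10                # assumption: shorter than the 60s runs above

gaps = []
end = time.perf_counter_ns() + RUNTIME_S * 1_000_000_000
prev = time.perf_counter_ns()
while prev < end:
    now = time.perf_counter_ns()
    if now - prev > THRESHOLD_NS:
        gaps.append(now - prev)
    prev = now

gaps.sort()
if gaps:
    pct99 = gaps[min(len(gaps) - 1, int(len(gaps) * 0.99))]
    print(f'hiccups: {len(gaps)}  pct99_ns: {pct99}  max_ns: {gaps[-1]}')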

Friday, April 30, 2021

Linux Tickless Kernel & CPU Usage Statistics

  I ran into an interesting problem yesterday. After changing some kernel boot parameters to enable the tickless kernel, an application which is supposed to essentially "spin" 100% on a CPU was being reported as mostly idle. This was rather puzzling.

After some debugging and reading docs I've managed to reproduce the issue with a simple test case. The test is reproducible on CentOS 8.3 x86_64.

$ uname -r
4.18.0-240.1.1.el8_3.x86_64

$ cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos2)/vmlinuz-4.18.0-240.1.1.el8_3.x86_64 root=/dev/mapper/sysvg-root ro crashkernel=auto
           resume=/dev/mapper/sysvg-swap rd.lvm.lv=sysvg/root rd.lvm.lv=sysvg/swap biosdevname=1 ipv6.disable=1
           net.ifnames=1 rhgb quiet intel_pstate=passive
           intel_idle.max_cstate=0 idle=poll
           isolcpus=domain,managed_irq,15-17 irqaffinity=0-13 nohz_full=15

Let's run a single-threaded tight loop bound to CPU 16 which is 100% user space (note that CPU 16 is otherwise isolated):
# time taskset -c 16 sh -c "i=1; while [ \$i -lt 10000000 ]; do true; ((i++)); done"

real    0m30.713s
user    0m30.688s
sys     0m0.001s
In another terminal I run mpstat showing CPU 16 usage statistics:
$ mpstat -P 16 1
...
00:55:14     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
00:55:15      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

00:55:15     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
00:55:16      16   68.32    0.00    0.00    0.00    0.99    0.00    0.00    0.00    0.00   30.69

00:55:16     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
00:55:17      16  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

00:55:17     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
00:55:18      16  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

00:55:18     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
00:55:19      16  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

00:55:19     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
00:55:20      16  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

00:55:20     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
00:55:21      16  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
...
00:55:45     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
00:55:46      16  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

00:55:46     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
00:55:47      16    2.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.00

00:55:47     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
00:55:48      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
^C
As expected CPU 16 was 100% in USR for about 30s.
So far so good.
 
I then modified kernel boot arguments:
  • added nohz option to isolcpus
  • updated nohz_full variable to include CPUs 15-17 (which includes the CPU 16 we are testing here)
After reboot I repeated the above test.
# cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos2)/vmlinuz-4.18.0-240.1.1.el8_3.x86_64 root=/dev/mapper/sysvg-root ro crashkernel=auto
           resume=/dev/mapper/sysvg-swap rd.lvm.lv=sysvg/root rd.lvm.lv=sysvg/swap biosdevname=1 ipv6.disable=1
           net.ifnames=1 rhgb quiet intel_pstate=passive
           intel_idle.max_cstate=0 idle=poll
           isolcpus=nohz,domain,managed_irq,15-17 irqaffinity=0-13 nohz_full=15-17
# date;time taskset -c 16 sh -c "i=1; while [ \$i -lt 10000000 ]; do true; ((i++)); done"
Fri 30 Apr 01:11:15 BST 2021

real    0m30.849s
user    0m30.849s
sys     0m0.000s
It took about the same time to finish (as expected). However, the mpstat output is now different:
$ mpstat -P 16 1
...
01:11:13     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:14      16    0.00    0.00    0.00    0.00    0.99    0.00    0.00    0.00    0.00   99.01

01:11:14     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:15      16   23.94    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   76.06

01:11:15     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:16      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:16     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:17      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:17     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:18      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:18     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:19      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:19     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:20      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:20     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:21      16  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

01:11:21     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:22      16  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

01:11:22     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:23      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:23     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:24      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:24     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:25      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:25     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:26      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:26     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:27      16  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

01:11:27     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:28      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:28     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:29      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:29     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:30      16  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

01:11:30     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:31      16  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

01:11:31     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:32      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:32     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:33      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:33     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:34      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:34     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:35      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:35     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:36      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:36     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:37      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:37     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:38      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:38     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:39      16  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

01:11:39     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:40      16  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

01:11:40     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:41      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:41     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:42      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:42     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:43      16  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

01:11:43     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:44      16  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

01:11:44     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:45      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:45     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:46      16   76.17    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   23.83

01:11:46     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:47      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:47     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:48      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:48     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:49      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

01:11:49     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:11:50      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
^C

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:      16   83.36    0.00    0.00    0.00    0.03    0.00    0.00    0.00    0.00   16.62

Notice that most of the time while the above loop was spinning on the CPU, mpstat reported the CPU as idle.

Given that the loop finished in about the same time as before, it was not taken off the CPU and had to spin constantly just as before. The real and user times reported by time are also similar. This means that mpstat reporting the CPU as idle most of the time can't be true - during the 30s run it had to be spinning 100% in user space.

There is an excellent overview of CPU isolation and the tickless kernel by SUSE Labs which sheds some light on what is happening here and why.
Specifically check Part 2 -> 3.1 Cputime accounting.

Essentially what seems to be happening here is that CPU times are only updated when the CPU enters or exits user space. This results in more accurate usage numbers and, more importantly, does not require interrupting a running application, which is the whole point of a tickless kernel.

But what if an application does not issue any system calls and ends up spinning 100% in user space?
The CPU stats would never get updated... except they actually do, every few seconds.
I'm not entirely sure why - probably there is some mechanism that gathers them at some interval, or there is an IRQ happening (there shouldn't be), or... it is not that important at the moment.

To further confirm that the CPU usage stats are not being updated because the application never switches to kernel mode and back, let's modify the loop so it now makes some syscalls as well (write()):
# date;time taskset -c 16 sh -c "i=1; while [ \$i -lt 10000000 ]; do echo \$i >/dev/null; ((i++)); done"
...
$ mpstat -P 16 1
...
01:47:34     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:47:35      16   81.00    0.00   19.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

01:47:35     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:47:36      16   80.20    0.00   19.80    0.00    0.00    0.00    0.00    0.00    0.00    0.00

01:47:36     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:47:37      16   80.81    0.00   19.19    0.00    0.00    0.00    0.00    0.00    0.00    0.00

01:47:37     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:47:38      16   80.20    0.00   19.80    0.00    0.00    0.00    0.00    0.00    0.00    0.00

01:47:38     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:47:39      16   80.81    0.00   19.19    0.00    0.00    0.00    0.00    0.00    0.00    0.00

01:47:39     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:47:40      16   80.20    0.00   19.80    0.00    0.00    0.00    0.00    0.00    0.00    0.00

01:47:40     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:47:41      16   80.81    0.00   19.19    0.00    0.00    0.00    0.00    0.00    0.00    0.00
...
As suspected, the CPU stats are now updated constantly and mpstat reports the expected values.
 
Note that the CPU usage reporting issue is not specific to mpstat - other tools behave in a similar manner, as they all get their stats from /proc/stat.
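To see the raw counters these tools work from, here is a minimal sketch reading the per-CPU line straight out of /proc/stat (field names per proc(5); CPU 16 is simply the CPU used in this test):

#!/usr/bin/python3
# Print the raw tick counters for CPU 16 from /proc/stat - the same numbers
# mpstat and similar tools turn into percentages.
FIELDS = ('user', 'nice', 'system', 'idle', 'iowait', 'irq', 'softirq',
          'steal', 'guest', 'guest_nice')

with open('/proc/stat') as f:
    for line in f:
        if line.startswith('cpu16 '):
            print(dict(zip(FIELDS, map(int, line.split()[1:]))))
            break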
 
Another interesting observation: if you start tracing such an application with tools like strace or bpftrace (e.g. bpftrace -e 'profile:hz:99/cpu==16/{@=count();}'), the CPU stats get updated constantly and the issue "miraculously" goes away...
 
Mystery solved.


btw: in case you wonder how likely it is that a real application wouldn't issue any syscalls (or would issue them only very rarely) - with network kernel bypass this definitely can be, and is, the case for some apps


Update 1: if a spinning process runs under SCHED_RR then the CPU accounting issue goes away, but it does happen under SCHED_OTHER and SCHED_FIFO - although less so with the latter.

Tuesday, December 03, 2019

Checking/modifying file permissions on an underlying directory


If you have a file system (NFS, etc.) mounted on top of a directory and you need to see the file permissions, ACLs, etc. of the underlying directory rather than of the mounted fs, then:

Solaris:

# mount -F lofs -o nosub /some/path/ /mnt/fix

Linux:

# mount -B /some/path/ /mnt/fix

If you access /mnt/fix now you won't see any mounted filesystems on top of any directories, just the underlying fs.


I used it in the past and forgot about it, needed it yesterday and found it here.

Monday, October 21, 2019

ONTAP 9.6+: REST API: netapp_ontap: performance of get_collection() vs. get()

NetApp's ONTAP 9.6 introduced a new REST API along with a new Python module, netapp_ontap.


#!/usr/bin/python3

import time
import getpass
import netapp_ontap

from netapp_ontap import config
from netapp_ontap.host_connection import HostConnection
from netapp_ontap.resources import QuotaRule

password = getpass.getpass()

# host and user are placeholders - point them at your filer and API account
host = 'filer.example.com'
user = 'admin'
config.CONNECTION = HostConnection(host, user, password)

start = time.time()
quota_rules = QuotaRule.get_collection(fields='type')
# quota_rules = QuotaRule.get_collection()
total = 0
for l in quota_rules:
    # l.get(fields='type')
    total = total + 1
end = time.time()
print(total)
print(end-start)

With 471 quota rules on my filer it takes about 4s with get_collection(fields='type') vs. ~40s when calling l.get(fields='type') for each rule being processed. So if you are after all the entries it is much quicker to pass the required fields to get_collection() rather than call get() on each returned resource.

I did a little bit of debugging and it seems that each get() results in a new TCP/HTTPS connection being established, which is likely the main reason for the much worse performance. Also, get_collection() fetches all 471 results in a single HTTP GET.

There seems to be a bug with regard to reusing connections though, as it shouldn't have to establish a new session for each get().

Thursday, June 13, 2019

DTrace: nfsv4 provider and utf8string

The nfsv4 provider exposes some structures with the component4 type, which is defined as:

typedef struct {
 uint_t utf8string_len;
 char *utf8string_val;
} utf8string;

typedef utf8string component4;

So for example, to print NFSv4 file renames you have to do:

nfsv4:::op-rename-start
{
    this->a = (char *)alloca(args[2]->oldname.utf8string_len + 1);
    bcopy(args[2]->oldname.utf8string_val, this->a, args[2]->oldname.utf8string_len);
    this->a[args[2]->oldname.utf8string_len] = '\0';

    this->b = (char *)alloca(args[2]->newname.utf8string_len + 1);
    bcopy(args[2]->newname.utf8string_val, this->b, args[2]->newname.utf8string_len);
    this->b[args[2]->newname.utf8string_len] = '\0';

    printf("NFSv4 rename: %s\n", strjoin(this->a, strjoin(" -> ", this->b)));
}

Ideally DTrace (strjoin(), etc.) should deal with utf8string type automatically.

Linux Load Averages

Linux measures load average differently than other OSes. In a nutshell, it includes both CPU and disk I/O, and more. Brendan Gregg has an excellent blog entry explaining in much more detail how it works.

Friday, May 03, 2019

Testing ZFS/L2ARC

Solaris 11.4 - setting zfs_arc_collect_check=0 via mdb (takes immediate effect) or via /etc/system makes ZFS start feeding the L2ARC immediately. Note that this can negatively impact ARC performance, so use it with care. It is useful for testing if you want to push some data into the L2ARC sooner, especially on large-memory systems. The variable is checked by the arc_can_collect() function (if it returns 1 then the L2ARC can be fed; if zero, it can't).

Monday, March 11, 2019

DTrace stop() action

The stop() action in DTrace stops an entire process... well, actually it doesn't. It stops a single thread in a multi-threaded process, which surprised me as I always thought it stopped the entire process. Now, this is actually very useful, though a stopall() action which would stop all threads could be useful as well :)

Update: this is getting more complicated now - the way the stop() action behaves depends on the probe type it is called from. For example, if called from a syscall provider probe it will stop just the thread which made the syscall, but if called from a pid provider probe it will stop the entire process with all its threads. This is getting confusing...

btw: pstop PID stops entire process while pstop PID/LWPID stops a single thread



Friday, March 01, 2019

DTrace %Y print format with nanoseconds

A small but useful extension to DTrace is now available in Solaris 11.4 SRU6. It allows you to easily print the current date with optional nanosecond resolution. It is disabled by default for backward compatibility.

To enable it you need to add the timedecimals option to dtrace:


# dtrace -q -x timedecimals -n syscall::open*:entry \
    '{printf("%Y %s called %s()\n", walltimestamp, execname, probefunc);}'
2019 Mar  1 11:50:48.774114445 firefox called openat64()

2019 Mar  1 11:50:49.149290513 dtrace called openat()
2019 Mar  1 11:50:49.149283375 dtrace called openat()
2019 Mar  1 11:50:50.030217373 firefox called openat64()
2019 Mar  1 11:50:49.974253263 firefox called openat64()

2019 Mar  1 11:50:50.114684381 VBoxService called openat()
^C


You can also specify the number of decimal places to be printed, for example:

# dtrace -q -x timedecimals=2 -n syscall::open*:entry \
    '{printf("%Y %s called %s()\n", walltimestamp, execname, probefunc);}'
2019 Mar  1 11:56:51.09 VBoxService called openat()
2019 Mar  1 11:56:51.09 VBoxService called openat()
2019 Mar  1 11:56:51.45 dtrace called openat()
^C



Friday, November 23, 2018

RAID-Z improvements and cloud device support

Solaris 11.4 introduced a few new ZFS pool versions with interesting new features and enhancements:

# zpool upgrade -v
...
 38  Xcopy with encryption
 39  Resilver restart enhancements
 40  New deduplication support
 41  Asynchronous dataset destroy
 42  Reguid: ability to change the pool guid
 43  RAID-Z improvements and cloud device support
 44  Device removal
...

The RAID-Z improvements mean that data is written more efficiently - in some cases a pool can now store more data than before. Even more importantly, the performance (both throughput and IOPS) of RAID-Z is now close to RAID-10!

Friday, November 09, 2018

Spectre and Kernel Modules

On Linux one needs to recompile kernel modules to get protection, while on Solaris this is not necessary. Once you are on Solaris 11.4 with the Spectre fixes enabled, all kernel modules, even ones compiled on older Solaris releases, just work and are protected. Nothing special to do there.

Friday, October 12, 2018

bpftrace

Right, finally Linux is getting something similar to, and as useful as, DTrace - see bpftrace. However, for it to be useful in the enterprise it has to be included in Red Hat - I wonder how long that will take... maybe around 2020 this will finally happen, and then Linux will truly have an equivalent of DTrace, even if 15 years later.