tag:blogger.com,1999:blog-99634852024-03-16T02:12:43.481+01:00Robert Milkowski's blogmilekhttp://www.blogger.com/profile/13649951503953000496noreply@blogger.comBlogger740125tag:blogger.com,1999:blog-9963485.post-4828604181297424512023-07-19T17:24:00.000+02:002023-07-19T17:24:48.610+02:00LLAMA 2 - hello world<p><span style="font-family: courier;"> $ torchrun --nproc_per_node 1 test2_text_completion.py \</span></p><p><span style="font-family: courier;"> --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4 --ckpt_dir llama-2-7b/ \</span></p><p><span style="font-family: courier;"> --prompt "Hello everyone, I'm LLAMA-2"</span></p><p><span style="font-family: courier;">...</span></p><p><span style="font-family: courier;">Loaded in 4.32 seconds</span></p><p><span style="font-family: courier;">Hello everyone, I'm LLAMA-2 000. I'm a new member here. I'm from China. I'm a girl, I'm 16. I like animals, especially lamas. I like to play games, especially the games on the internet. I like to make friends. I hope I can make</span></p>Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com0tag:blogger.com,1999:blog-9963485.post-22597447771914121792023-06-20T10:02:00.001+02:002023-06-20T10:02:38.885+02:00Azure/DALLE-2<p>The latest version (0.27.8) of the <a href="https://pypi.org/project/openai/" target="_blank">pypi/openai</a> library now supports <a href="https://learn.microsoft.com/en-us/azure/cognitive-services/openai/dall-e-quickstart?pivots=programming-language-studio" target="_blank">Azure/DALLE-2</a>. 
To ensure compatibility, set <span style="font-family: courier;">openai.api_version = '2023-06-01-preview'</span>.</p>Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com0tag:blogger.com,1999:blog-9963485.post-37489111735721506442023-03-28T21:57:00.002+02:002023-03-28T21:58:18.930+02:00BPF & Unix timestampRecently I've been using bpftrace to trace some events across servers, and it would be really useful if bpftrace allowed the use of actual Unix timestamps. The nsecs variable won't work as it represents time since boot, while time() and strftime() return strings. <div><br /></div><div>In this specific case, an application running on server A was injecting timestamps into packets being sent to an app on server B, where I was running the tracing. This allowed tracing the time it takes to send a packet over the network between the apps. Unfortunately, bpftrace did not make it easy to do so...</div><div><br /></div><div>This only makes sense if time across servers is synchronised with accuracy much better than the time deltas you need to measure. In our case, servers use PTP and are generally synchronised to the GM to well within 100ns, while the time differences measured here were many microseconds.</div><div><br /></div><div>I'm not the only one asking for actual timestamps in bpf. For example, see <a href="https://github.com/iovisor/bcc/issues/2067" target="_blank">here</a> and <a href="https://lore.kernel.org/netdev/CAEf4Bzb9KA=mzYo_x42ExRoZjm=dF6up1DxrUL_eqkDYs9+UUg@mail.gmail.com/T/" target="_blank">here</a>. The latter is a discussion about implementing a bpf function to provide an epoch timestamp. Unfortunately, there was some resistance to doing so (I don't agree with the arguments), but at least there was a workaround suggested. 
It is unnecessarily complicated and clunky, but it works.</div><div><br /></div><div><pre style="background: repeat rgb(0, 0, 0); caret-color: rgb(204, 204, 204); color: #cccccc; font-size: 13px; white-space: pre-wrap;"><span class="q" style="background-attachment: scroll; background-clip: border-box; background-image: none; background-origin: padding-box; background-position: 0%; background-repeat: repeat; background-size: auto; color: #0099ff;">> Not sure what problem you're trying to solve and thus what exactly you
> need... but you can probably get something very very close with 1 or 2
> of clock_gettime(CLOCK_{BOOTTIME,MONOTONIC,REALTIME}) possibly in a
> triple vdso call sandwich and iterated a few times and picking the one
> with smallest delta between 1st and 3rd calls. And then assuming the
> avg of 1st and 3rd matches the 2nd.
> ie.
>
> 5 times do:
> t1[i] = clock_gettime(REALTIME)
> t2[i] = clock_gettime(BOOTTIME)
> t3[i] = clock_gettime(REALTIME)
>
> pick i so t3[i] - t1[i] is smallest
>
> t2[i] is near equivalent to (t1[i] + t3[i]) / 2
>
> which basically gives you a REAL to BOOT offset.
</span></pre>Let's quickly implement it:</div><div><div><div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: repeat rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 70em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">#include <time.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

#define ITERATIONS 5
#define NSEC_PER_SEC UINT64_C(1000000000)

/* Convert a timespec to nanoseconds. */
static uint64_t ts_to_ns(const struct timespec *ts) {
    return (uint64_t)ts->tv_sec * NSEC_PER_SEC + (uint64_t)ts->tv_nsec;
}

int main(void) {
    struct timespec ts1[ITERATIONS], ts2[ITERATIONS], ts3[ITERATIONS];
    uint64_t t1, t2, t3, t4, t5, dt, smallest_dt = UINT64_MAX;
    int i, smallest_dt_i = 0;

    /* REALTIME / BOOTTIME / REALTIME "sandwich", repeated a few times. */
    for (i = 0; i < ITERATIONS; i++) {
        if (clock_gettime(CLOCK_REALTIME, &ts1[i]) ||
            clock_gettime(CLOCK_BOOTTIME, &ts2[i]) ||
            clock_gettime(CLOCK_REALTIME, &ts3[i])) {
            perror("clock_gettime");
            return 1;
        }
    }

    /* Pick the iteration with the smallest gap between the two REALTIME reads. */
    for (i = 0; i < ITERATIONS; i++) {
        dt = ts_to_ns(&ts3[i]) - ts_to_ns(&ts1[i]);
        printf("i: %d dt: %" PRIu64 "\n", i, dt);
        if (dt < smallest_dt) {
            smallest_dt = dt;
            smallest_dt_i = i;
        }
    }

    t1 = ts_to_ns(&ts1[smallest_dt_i]);
    t2 = ts_to_ns(&ts2[smallest_dt_i]);
    t3 = ts_to_ns(&ts3[smallest_dt_i]);
    t4 = (t1 + t3) / 2;   /* midpoint of the two REALTIME reads */
    t5 = t4 - t2;         /* REALTIME-to-BOOTTIME offset in nanoseconds */

    printf("\n");
    printf("t5=t4-t2: %" PRIu64 "\n", t5);
    return 0;
}</pre></div></div></div></div><div><br /></div>Now, on a system where you need an actual timestamp in bpftrace, you do: <span style="font-family: courier;">$t_now = (uint64)NNN + nsecs;</span><div>where NNN is the t5 value (the REALTIME-to-BOOTTIME offset in nanoseconds) reported for that system.</div><div><br /></div><div>I wish one could get it directly in bpftrace, but we are where we are...</div><div><br /></div>Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com0tag:blogger.com,1999:blog-9963485.post-68260746566687924312023-02-21T00:07:00.001+01:002023-02-21T00:13:06.951+01:00Siri + ChatGPTAlthough I appreciate Siri's ability to automate my home, there are times when it can be frustrating, such as when it replies with the annoying message "I found some web results, I sent them to your iPhone." Let's make Siri much more useful and fun by integrating Siri with OpenAI/ChatGPT via the Shortcuts app. It even works with HomePod and Apple Watch.<div><br /></div><div>To get started, we need to define a variable with our OpenAI token, and then use the Dictate Text action to convert speech to text and store the result in another variable.
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhGRwu44cWttvWEsUsSuX13-4OzeQHuv-nzOkIv7uR3xxSY7hz9jlMgQBz7YGbcHcukUeG7WjbN7XwqmgEBowGkNNUVBm9OJF6mAZ8fo65GDMR4mSU5cgb0W2g5-_wQxnrHpAN2fjCjH3u5w1hbjCo52QXQ5Y2RgK7RVr-ZD8G8yJ5mPQOHjKQ/s1506/DA76F81B-DED5-420C-B285-4769F8C2A00D.jpeg" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="1506" data-original-width="1290" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhGRwu44cWttvWEsUsSuX13-4OzeQHuv-nzOkIv7uR3xxSY7hz9jlMgQBz7YGbcHcukUeG7WjbN7XwqmgEBowGkNNUVBm9OJF6mAZ8fo65GDMR4mSU5cgb0W2g5-_wQxnrHpAN2fjCjH3u5w1hbjCo52QXQ5Y2RgK7RVr-ZD8G8yJ5mPQOHjKQ/s400/DA76F81B-DED5-420C-B285-4769F8C2A00D.jpeg" /></a></div>
Next, let's set another variable with parameters for OpenAI query:
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSP1Cvjft0dRi417hoHiJl_lrns29rU8oAMYG8J1KeaFBUAsMhCUtsw9BdzLq9FwJSrcYNQRPDYfQo6Z6Wy7EURexAambEJR1JxJwbr9U_QJC5eWeC3RZU5WtcplNZpmbM9Mg96wQuAXrxFl6JM7Lv_gglHz5S64EE3DMzkQeGqbU-Sr2pX7o/s1290/54905FF1-C953-4147-B6B8-B8D3C0088BC8.jpeg" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="1082" data-original-width="1290" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSP1Cvjft0dRi417hoHiJl_lrns29rU8oAMYG8J1KeaFBUAsMhCUtsw9BdzLq9FwJSrcYNQRPDYfQo6Z6Wy7EURexAambEJR1JxJwbr9U_QJC5eWeC3RZU5WtcplNZpmbM9Mg96wQuAXrxFl6JM7Lv_gglHz5S64EE3DMzkQeGqbU-Sr2pX7o/s400/54905FF1-C953-4147-B6B8-B8D3C0088BC8.jpeg" width="400" /></a></div>
Now, let's issue a REST call:
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg__6WSVXkCDJrnbeM0MXupGRppjQuAoNBFpQ21N-I29s9kRpDOv7HjoqLHZX1FxVsO8F_Wxy_GzLpihCGk49SQJCLBKaCJHmhLDCVvGSoNeb4as2r5w-6HZi4nvzGqsysaV-pJxx1qXsu7thxM1K0R9DRmTcQPxp4C2ZHSWXBty8-3nFX8xXg/s1290/C5723AEE-8DD9-4615-96EC-94FC58E24AB7.jpeg" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="1221" data-original-width="1290" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg__6WSVXkCDJrnbeM0MXupGRppjQuAoNBFpQ21N-I29s9kRpDOv7HjoqLHZX1FxVsO8F_Wxy_GzLpihCGk49SQJCLBKaCJHmhLDCVvGSoNeb4as2r5w-6HZi4nvzGqsysaV-pJxx1qXsu7thxM1K0R9DRmTcQPxp4C2ZHSWXBty8-3nFX8xXg/s400/C5723AEE-8DD9-4615-96EC-94FC58E24AB7.jpeg" width="400" /></a></div>
And finally let's extract the reply and pass it back to Siri to read it aloud.
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtu9o9m82ymKgAifhsRTas80mmmWRwiGMY1Q993Ecojxe4nKLl_pPQjLTwGUgzAOnyspKeH-pqKvKANe9rHNB-v6GUfnrHMCP1wZfTQACTnjRz1o92Om7qfTz6ZHB999Kccq1Ktb0XsEBC2kZcRXqDzuC7kFWHe8keHT5klv_iuW7o1K_zLpM/s1411/2654BBD0-3E63-4044-A457-6B92A64D233B.jpeg" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="1411" data-original-width="1290" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtu9o9m82ymKgAifhsRTas80mmmWRwiGMY1Q993Ecojxe4nKLl_pPQjLTwGUgzAOnyspKeH-pqKvKANe9rHNB-v6GUfnrHMCP1wZfTQACTnjRz1o92Om7qfTz6ZHB999Kccq1Ktb0XsEBC2kZcRXqDzuC7kFWHe8keHT5klv_iuW7o1K_zLpM/s400/2654BBD0-3E63-4044-A457-6B92A64D233B.jpeg" /></a></div>
Now, you can say "Hey Siri, 2001" (I named the shortcut "2001") and it will ask you for a prompt. Once you provide it, a moment later it will read the OpenAI/ChatGPT reply to you, making Siri really useful and fun to engage with. It even works in CarPlay mode while driving.
<div><br /></div><div>You can <a href="https://drive.google.com/file/d/1iqiIMHf4pFtOxrxxtnF0mQUblOL7xJnH/view?usp=drivesdk">download</a> the Shortcut as well, enjoy.</div><div><br /></div><div>btw: I asked ChatGPT to improve the blog post, here is the result:</div><div><br /></div><div><p style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-ring-color: rgba(59,130,246,0.5); --tw-ring-offset-color: #fff; --tw-ring-offset-shadow: 0 0 transparent; --tw-ring-offset-width: 0px; --tw-ring-shadow: 0 0 transparent; --tw-rotate: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-scroll-snap-strictness: proximity; --tw-shadow-colored: 0 0 transparent; --tw-shadow: 0 0 transparent; --tw-skew-x: 0; --tw-skew-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; -webkit-text-size-adjust: 100%; border: 0px solid rgb(217, 217, 227); box-sizing: border-box; caret-color: rgb(209, 213, 219); font-size: 16px; margin: 0px 0px 1.25em; white-space: pre-wrap;"><span style="color: #666666; font-family: courier;">The given blog post can be improved by making some modifications and clarifications. Below is the improved version of the post:</span></p><p style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-ring-color: rgba(59,130,246,0.5); --tw-ring-offset-color: #fff; --tw-ring-offset-shadow: 0 0 transparent; --tw-ring-offset-width: 0px; --tw-ring-shadow: 0 0 transparent; --tw-rotate: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-scroll-snap-strictness: proximity; --tw-shadow-colored: 0 0 transparent; --tw-shadow: 0 0 transparent; --tw-skew-x: 0; --tw-skew-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; -webkit-text-size-adjust: 100%; border: 0px solid rgb(217, 217, 227); box-sizing: border-box; caret-color: rgb(209, 213, 219); font-size: 16px; margin: 1.25em 0px; white-space: pre-wrap;"><span style="color: #666666; font-family: courier;">Siri is an amazing tool that can automate various tasks and make life much easier. 
However, sometimes it can be frustrating when it responds with the message "I found some web results, I sent them to your iPhone." To make Siri even more useful and fun, we can integrate Siri/Shortcuts with OpenAI/ChatGPT, which can work with HomePod too.</span></p><p style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-ring-color: rgba(59,130,246,0.5); --tw-ring-offset-color: #fff; --tw-ring-offset-shadow: 0 0 transparent; --tw-ring-offset-width: 0px; --tw-ring-shadow: 0 0 transparent; --tw-rotate: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-scroll-snap-strictness: proximity; --tw-shadow-colored: 0 0 transparent; --tw-shadow: 0 0 transparent; --tw-skew-x: 0; --tw-skew-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; -webkit-text-size-adjust: 100%; border: 0px solid rgb(217, 217, 227); box-sizing: border-box; caret-color: rgb(209, 213, 219); font-size: 16px; margin: 1.25em 0px; white-space: pre-wrap;"><span style="color: #666666; font-family: courier;">To get started, we need to define a variable with our OpenAI token. Then, we can use the Dictate Text action to convert speech to text and store the result in another variable. 
This allows Siri to understand our speech and perform the necessary actions.</span></p><p style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-ring-color: rgba(59,130,246,0.5); --tw-ring-offset-color: #fff; --tw-ring-offset-shadow: 0 0 transparent; --tw-ring-offset-width: 0px; --tw-ring-shadow: 0 0 transparent; --tw-rotate: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-scroll-snap-strictness: proximity; --tw-shadow-colored: 0 0 transparent; --tw-shadow: 0 0 transparent; --tw-skew-x: 0; --tw-skew-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; -webkit-text-size-adjust: 100%; border: 0px solid rgb(217, 217, 227); box-sizing: border-box; caret-color: rgb(209, 213, 219); font-size: 16px; margin: 1.25em 0px; white-space: pre-wrap;"><span style="color: #666666; font-family: courier;">Next, we need to set another variable with the parameters for the OpenAI query. These parameters include the text we want to generate, the model we want to use, and the temperature of the output. Temperature refers to the randomness of the generated text. 
Higher temperatures will result in more creative and unexpected output, but lower temperatures will produce more accurate and predictable text.</span></p><p style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-ring-color: rgba(59,130,246,0.5); --tw-ring-offset-color: #fff; --tw-ring-offset-shadow: 0 0 transparent; --tw-ring-offset-width: 0px; --tw-ring-shadow: 0 0 transparent; --tw-rotate: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-scroll-snap-strictness: proximity; --tw-shadow-colored: 0 0 transparent; --tw-shadow: 0 0 transparent; --tw-skew-x: 0; --tw-skew-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; -webkit-text-size-adjust: 100%; border: 0px solid rgb(217, 217, 227); box-sizing: border-box; caret-color: rgb(209, 213, 219); font-size: 16px; margin: 1.25em 0px; white-space: pre-wrap;"><span style="color: #666666; font-family: courier;">Once we have set the necessary variables, we can issue a REST call to the OpenAI API. This will send our query to the OpenAI server and retrieve the generated text. We can then use the Text action to speak the generated text aloud or display it on the screen.</span></p><p style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-ring-color: rgba(59,130,246,0.5); --tw-ring-offset-color: #fff; --tw-ring-offset-shadow: 0 0 transparent; --tw-ring-offset-width: 0px; --tw-ring-shadow: 0 0 transparent; --tw-rotate: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-scroll-snap-strictness: proximity; --tw-shadow-colored: 0 0 transparent; --tw-shadow: 0 0 transparent; --tw-skew-x: 0; --tw-skew-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; -webkit-text-size-adjust: 100%; border: 0px solid rgb(217, 217, 227); box-sizing: border-box; caret-color: rgb(209, 213, 219); font-size: 16px; margin: 1.25em 0px 0px; white-space: pre-wrap;"><span style="color: #666666; font-family: courier;">It's important to note that using OpenAI/ChatGPT with Siri requires some programming skills and knowledge of APIs. 
However, with some practice and experimentation, anyone can create their own Siri shortcuts and integrate them with OpenAI. With this powerful combination, we can create a wide range of useful and entertaining applications, from generating creative writing prompts to answering trivia questions.</span></p></div></div>Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com0tag:blogger.com,1999:blog-9963485.post-34071333730891491732023-01-12T17:44:00.016+01:002023-01-12T18:00:42.534+01:00bpftrace vs. straceMany years ago I compared performance overhead of dtrace vs. truss on Solaris, in a microbenchmark.
This time I ran a similar test, comparing bpftrace vs. strace on Linux.
To do so, I wrote a simple C program which spawns X threads, each of which calls stat("/tmp") N times in a loop. The program then prints the total time it took for all threads to execute.
<br>
While it is not necessarily a very realistic test, it does show the potential overhead of tracing and the difference between the technologies. Also, sometimes you do need to trace a very tight loop, which then might result in overheads like those shown below (or even higher).
<br><br>
Let's run it three times (to see if we get consistent results) with 4 threads, each one calling stat() 1M times.
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">
# for i in $(seq 1 3); do ./threads-loop-test-1 4 1000000; done
Total time: 0.982799
Total time: 0.977269
Total time: 0.980996
</pre>
</div>
</div>
It took just below 1s to execute.<br>
Now, let's run it under bpftrace which will count how many times the stat() was called by all threads.
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">
# for i in $(seq 1 3); do bpftrace -qe 't:syscalls:sys_enter_newstat
    /pid==cpid/{@[probe]=count();}' \
    -c "./threads-loop-test-1 4 1000000";
  done
Total time: 1.192433
@[tracepoint:syscalls:sys_enter_newstat]: 4000019
Total time: 1.204695
@[tracepoint:syscalls:sys_enter_newstat]: 4000019
Total time: 1.143513
@[tracepoint:syscalls:sys_enter_newstat]: 4000019
</pre>
</div>
</div>
There is roughly a 20% overhead - not bad.<br>
Adding an extra condition to the predicate str(args->filename)=="/tmp" has little impact - resulting in total times <1.24s.<br>
Again, not bad, especially given that string comparison like this is rather expensive.<br>
<br>
Now time for strace.
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">
# for i in $(seq 1 3); do strace -qfc -e trace=stat ./threads-loop-test-1 4 1000000; done
Total time: 49.478656
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 87.720184 21 4000019 18 stat
------ ----------- ----------- --------- --------- ----------------
100.00 87.720184 21 4000019 18 total
Total time: 49.336942
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 87.463778 21 4000019 18 stat
------ ----------- ----------- --------- --------- ----------------
100.00 87.463778 21 4000019 18 total
Total time: 49.250562
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 87.923629 21 4000019 18 stat
------ ----------- ----------- --------- --------- ----------------
100.00 87.923629 21 4000019 18 total
</pre>
</div>
</div>
It took about 50 times longer to execute!<br>
<br>
While there have been many improvements to strace to reduce its impact, it is still significant in some cases.<br>
It doesn't mean that strace is a bad tool and you should avoid it - in fact, it is often more handy and quicker to use than bpftrace or systemtap.<br>
However, be mindful of its potentially much higher overhead, especially in tight loops.<br>
<br><br>
The source code for the test program.
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 300em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>

/* Each thread calls stat("/tmp") N times; N is passed in via the void * argument. */
void *thread_handler(void *arg) {
    long N = (long)arg;
    struct stat file_stat;

    while (N--) {
        stat("/tmp", &file_stat);
    }
    pthread_exit(NULL);
}

int main(int argc, char *argv[]) {
    struct timespec start, end;
    int num_threads;
    long N;

    if (argc != 3) {
        printf("Usage: %s <number of threads> <number of iterations per thread>\n", argv[0]);
        exit(-1);
    }
    num_threads = atoi(argv[1]);
    N = labs(atol(argv[2]));   /* labs(), not abs(): the value is a long */
    if (num_threads < 1) {
        printf("ERROR; number of threads must be at least 1\n");
        exit(-1);
    }

    pthread_t threads[num_threads];
    int rc;
    long t;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (t = 0; t < num_threads; t++) {
        rc = pthread_create(&threads[t], NULL, thread_handler, (void *)N);
        if (rc) {
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
    }

    /* Wait for all threads to complete */
    for (t = 0; t < num_threads; t++) {
        pthread_join(threads[t], NULL);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double time_taken = (end.tv_sec - start.tv_sec) + (double)(end.tv_nsec - start.tv_nsec) / (double)1000000000;
    printf("Total time: %lf\n", time_taken);
    return 0;
}
</pre>
</div>
</div>Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com1tag:blogger.com,1999:blog-9963485.post-65539945773807901612022-12-28T16:23:00.000+01:002022-12-28T16:23:47.643+01:00NFS4ERR_EXPIRED - Linux NFS client bug (fixed)Another blog entry saved as draft for a long time and waiting to be published... better late than never, so let's do it now.
<br>
Over three years ago I found a bug in the Linux NFS v4.0 client which results in applications temporarily not being able to open a file and getting an error. While this is due to a bug in the NFS v4.0 client code, you might or might not hit the issue depending on the specific NFS server. For example, both Linux and Solaris NFS servers will trigger the bug, while NetApp/ONTAP won't.<br />
<br />
Once I had debugged and understood the problem, I was able to reproduce it. I then came up with a temporary workaround and later submitted two patches to the Linux kernel, which were quickly integrated into Linux 5.6 and backported to some older versions (see the patches at the very end of this blog entry).<br />
<br />
Let's look in more detail at the issue.<br />
<br />
btw: some output has been abbreviated and/or modified to remove irrelevant details or to anonymize details.<br />
<br />
<div style="-webkit-text-stroke-width: 0px; background-color: white; color: #172b4d; font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Oxygen, Ubuntu, "Fira Sans", "Droid Sans", "Helvetica Neue", sans-serif; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; margin: 10px 0px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
An application written in Java was sometimes failing to open a file with the exception below:</div>
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">Caused by: java.io.FileNotFoundException: /.../latest.link (Input/output error)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
...
</pre>
</div>
</div>
<div style="-webkit-text-stroke-width: 0px; background-color: white; color: #172b4d; font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Oxygen, Ubuntu, "Fira Sans", "Droid Sans", "Helvetica Neue", sans-serif; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; margin: 10px 0px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<br />
Developers added retry logic, which helped. Usually, after a few retries, the application was able to open the affected file fine. This was happening to different files, at different times, on multiple servers, even to files which had been created a long time ago (don't be fooled by the java.io.FileNotFoundException).</div>
<br />
To understand how a file is accessed by the application (although some other applications were affected too), I wrote the below code based on the affected application.
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">$ cat Test.java
import java.io.*;
...
public class Test {
public static void main(String[] args) {
FileUtils fu = new FileUtils();
File dir = new File("/mnt/.../indexweights");
File ft = new File("/tmp/foo");
try {
ft = fu.resolveLinkFile(dir);
} catch (IOException e) {
System.out.println("IOException: " + e.getMessage());
e.printStackTrace();
} finally {
System.out.println("Destination path: " + ft.getAbsolutePath());
}
}
}
</pre>
</div>
</div>
<br />
Let's run it under strace and see which system calls it makes:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">$ strace -tTfv -o /tmp/a -e trace=file,close,read java Test
Destination path: /mnt/...
$ grep -A2 latest.link /tmp/a
4422 15:33:38 open("/mnt/.../indexweights/latest.link", O_RDONLY) = 5 <0.069349>
4422 15:33:38 read(5, "...", 8192) = 11 <0.000010>
4422 15:33:38 close(5) = 0 <0.069149>
</pre>
</div>
</div>
It is just a simple open() with O_RDONLY followed by read() and close(). Nothing suspicious. The error is happening with open() though, based on the java stack in the app log.
<br />
I ran the test program multiple times on the affected servers and tried to access the affected files, but couldn't trigger the issue.<br />
<br />
We modified one of the affected applications and added retry logic for failed file opens. We also ran part of the application under strace to see the actual error reported by open(). Let's look at some example output:<br />
<br />
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">$ egrep ' open\(|open resume' .../XXX.out.20190625220935.26281
...
[pid 133595] 22:10:17 open("/mnt/.../latest.link", O_RDONLY <unfinished ...>
[pid 133595] 22:10:17 <... open resumed> ) = -1 EIO (Input/output error) <0.069362>
...
[pid 133595] 22:10:21 open("/mnt/.../latest.link", O_RDONLY <unfinished ...>
[pid 133595] 22:10:21 <... open resumed> ) = -1 EIO (Input/output error) <0.069204>
...
[pid 133595] 22:10:29 open("/mnt/.../latest.link", O_RDONLY <unfinished ...>
[pid 133595] 22:10:29 <... open resumed> ) = 192 <0.487662></pre>
</div>
</div>
The open() failed with EIO twice and then succeeded on the third attempt.<br />
Given that these are NFSv4 mount points using the "hard" mount option, we shouldn't be getting EIO at all.<br />
When the issue happens, no errors are reported in the system logs.
<br />
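The retry logic mentioned above lives in a Java application and is not shown here. As a rough illustration only, the idea can be sketched in Python as follows. The function name, the attempt count, the delay and the injectable opener and sleep parameters are all assumptions made for this sketch.

```python
import errno
import time


def open_with_retry(path, attempts=5, delay=1.0, opener=open, sleep=time.sleep):
    """Open `path` for reading, retrying only when the open fails with EIO.

    Hypothetical helper: `attempts`, `delay`, `opener` and `sleep` are
    illustrative parameters, not taken from the real application.
    """
    for attempt in range(1, attempts + 1):
        try:
            return opener(path, "rb")
        except OSError as exc:
            # Re-raise anything that is not EIO, and EIO on the final attempt.
            if exc.errno != errno.EIO or attempt == attempts:
                raise
            sleep(delay)
```

A retry like this only hides the symptom: on a "hard" mount the application should never see EIO in the first place, so the investigation continues.<br />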
I also collected network traffic for the above failures.</div>
</div>
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 300em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># tshark -r tcpdump.out.201906252205 -tad nfs
...
160040 2019-06-25 22:10:17 10.2.1.142 -> 10.2.6.11 NFS 242 V4 Call ACCESS FH: 0x18703b49, [Check: RD LU MD XT DL]
160044 2019-06-25 22:10:17 10.2.6.11 -> 10.2.1.142 NFS 222 V4 Reply (Call In 160040) ACCESS, [Access Denied: MD XT DL], [Allowed: RD LU]
160066 2019-06-25 22:10:17 10.2.1.142 -> 10.2.6.11 NFS 322 V4 Call OPEN DH: 0x18703b49/latest.link
160074 2019-06-25 22:10:17 10.2.6.11 -> 10.2.1.142 NFS 150 V4 Reply (Call In 160066) OPEN Status: NFS4ERR_EXPIRED
160793 2019-06-25 22:10:18 10.2.1.142 -> 10.2.6.11 NFS 322 V4 Call OPEN DH: 0x18703b49/latest.link
160799 2019-06-25 22:10:18 10.2.6.11 -> 10.2.1.142 NFS 150 V4 Reply (Call In 160793) OPEN Status: NFS4ERR_EXPIRED
161368 2019-06-25 22:10:19 10.2.1.142 -> 10.2.6.11 NFS 322 V4 Call OPEN DH: 0x18703b49/latest.link
161394 2019-06-25 22:10:19 10.2.6.11 -> 10.2.1.142 NFS 150 V4 Reply (Call In 161368) OPEN Status: NFS4ERR_EXPIRED
162036 2019-06-25 22:10:21 10.2.1.142 -> 10.2.6.11 NFS 234 V4 Call GETATTR FH: 0x7a1ffedc
162112 2019-06-25 22:10:21 10.2.6.11 -> 10.2.1.142 NFS 338 V4 Reply (Call In 162036) GETATTR
162114 2019-06-25 22:10:21 10.2.1.142 -> 10.2.6.11 NFS 322 V4 Call OPEN DH: 0x18703b49/latest.link
162194 2019-06-25 22:10:21 10.2.6.11 -> 10.2.1.142 NFS 150 V4 Reply (Call In 162114) OPEN Status: NFS4ERR_EXPIRED
162803 2019-06-25 22:10:22 10.2.1.142 -> 10.2.6.6 NFS 226 V4 Call RENEW CID: 0x8dbc
162811 2019-06-25 22:10:22 10.2.6.6 -> 10.2.1.142 NFS 182 V4 Reply (Call In 162803) RENEW Status: NFS4ERR_EXPIRED
162813 2019-06-25 22:10:22 10.2.1.142 -> 10.2.6.6 NFS 226 V4 Call RENEW CID: 0x8dbc
162825 2019-06-25 22:10:22 10.2.6.6 -> 10.2.1.142 NFS 182 V4 Reply (Call In 162813) RENEW Status: NFS4ERR_EXPIRED
162826 2019-06-25 22:10:22 10.2.1.142 -> 10.2.6.6 NFS 318 V4 Call SETCLIENTID
162850 2019-06-25 22:10:23 10.2.6.6 -> 10.2.1.142 NFS 198 V4 Reply (Call In 162826) SETCLIENTID
162851 2019-06-25 22:10:23 10.2.1.142 -> 10.2.6.6 NFS 234 V4 Call SETCLIENTID_CONFIRM
162858 2019-06-25 22:10:23 10.2.6.6 -> 10.2.1.142 NFS 182 V4 Reply (Call In 162851) SETCLIENTID_CONFIRM
163272 2019-06-25 22:10:24 10.2.1.142 -> 10.2.6.11 NFS 226 V4 Call RENEW CID: 0x0a7f
163302 2019-06-25 22:10:25 10.2.6.11 -> 10.2.1.142 NFS 182 V4 Reply (Call In 163272) RENEW Status: NFS4ERR_EXPIRED
163304 2019-06-25 22:10:25 10.2.1.142 -> 10.2.6.11 NFS 226 V4 Call RENEW CID: 0x0a7f
163332 2019-06-25 22:10:25 10.2.6.11 -> 10.2.1.142 NFS 182 V4 Reply (Call In 163304) RENEW Status: NFS4ERR_EXPIRED
163333 2019-06-25 22:10:25 10.2.1.142 -> 10.2.6.11 NFS 318 V4 Call SETCLIENTID
163343 2019-06-25 22:10:25 10.2.6.11 -> 10.2.1.142 NFS 198 V4 Reply (Call In 163333) SETCLIENTID
163344 2019-06-25 22:10:25 10.2.1.142 -> 10.2.6.11 NFS 234 V4 Call SETCLIENTID_CONFIRM
163382 2019-06-25 22:10:25 10.2.6.11 -> 10.2.1.142 NFS 182 V4 Reply (Call In 163344) SETCLIENTID_CONFIRM
164083 2019-06-25 22:10:27 10.2.1.142 -> 10.2.6.5 NFS 1606 V4 NULL Call
164109 2019-06-25 22:10:27 10.2.6.5 -> 10.2.1.142 NFS 302 V4 NULL Reply (Call In 164083)
164289 2019-06-25 22:10:28 10.2.1.142 -> 10.2.6.6 NFS 1606 V4 NULL Call
164301 2019-06-25 22:10:28 10.2.6.6 -> 10.2.1.142 NFS 302 V4 NULL Reply (Call In 164289)
164303 2019-06-25 22:10:28 10.2.1.142 -> 10.2.6.6 NFS 226 V4 Call RENEW CID: 0x9d10
164326 2019-06-25 22:10:28 10.2.6.6 -> 10.2.1.142 NFS 182 V4 Reply (Call In 164303) RENEW
164417 2019-06-25 22:10:28 10.2.1.142 -> 10.2.6.11 NFS 242 V4 Call ACCESS FH: 0x118824df, [Check: RD LU MD XT DL]
164427 2019-06-25 22:10:28 10.2.6.11 -> 10.2.1.142 NFS 222 V4 Reply (Call In 164417) ACCESS, [NOT Supported: MD XT DL], [Allowed: RD LU]
164429 2019-06-25 22:10:28 10.2.1.142 -> 10.2.6.11 NFS 234 V4 Call GETATTR FH: 0x7a1ffedc
164440 2019-06-25 22:10:28 10.2.6.11 -> 10.2.1.142 NFS 338 V4 Reply (Call In 164429) GETATTR
164441 2019-06-25 22:10:28 10.2.1.142 -> 10.2.6.11 NFS 322 V4 Call OPEN DH: 0x18703b49/latest.link
164458 2019-06-25 22:10:28 10.2.6.11 -> 10.2.1.142 NFS 466 V4 Reply (Call In 164441) OPEN StateID: 0xae70
164459 2019-06-25 22:10:28 10.2.1.142 -> 10.2.6.11 NFS 242 V4 Call OPEN_CONFIRM
164473 2019-06-25 22:10:29 10.2.6.11 -> 10.2.1.142 NFS 166 V4 Reply (Call In 164459) OPEN_CONFIRM
164474 2019-06-25 22:10:29 10.2.1.142 -> 10.2.6.11 NFS 250 V4 Call READ StateID: 0xa087 Offset: 0 Len: 11
164498 2019-06-25 22:10:29 10.2.6.11 -> 10.2.1.142 NFS 170 V4 Reply (Call In 164474) READ
164499 2019-06-25 22:10:29 10.2.1.142 -> 10.2.6.11 NFS 234 V4 Call GETATTR FH: 0xedefdae0
164519 2019-06-25 22:10:29 10.2.6.11 -> 10.2.1.142 NFS 338 V4 Reply (Call In 164499) GETATTR
164520 2019-06-25 22:10:29 10.2.1.142 -> 10.2.6.11 NFS 258 V4 Call CLOSE StateID: 0xa087
164545 2019-06-25 22:10:29 10.2.6.11 -> 10.2.1.142 NFS 230 V4 Reply (Call In 164520) CLOSE
...
</pre>
</div>
</div>
It looks like the NFSv4 client state (its lease) is expiring and is being re-established asynchronously, with opens failing in the meantime.<br>
<br>
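With a long capture it helps to summarise which server returned which error for which operation. Below is a minimal sketch that counts error replies in tshark text output of the format shown above. The regular expression is an assumption based only on those sample lines and may need adjusting for other tshark versions or column layouts.

```python
import re
from collections import Counter

# Matches reply lines that carry an explicit error status, e.g.
# "... V4 Reply (Call In 160066) OPEN Status: NFS4ERR_EXPIRED"
REPLY_ERROR = re.compile(
    r"^\s*\d+\s+\S+\s+(?P<time>\S+)\s+(?P<server>\S+)\s+->\s+\S+\s+NFS\s+\d+\s+"
    r"V4 Reply \(Call In \d+\)\s+(?P<op>\w+) Status: (?P<status>NFS4ERR_\w+)"
)


def count_errors(lines):
    """Count error replies per (server, operation, status)."""
    counts = Counter()
    for line in lines:
        m = REPLY_ERROR.match(line)
        if m:
            counts[(m["server"], m["op"], m["status"])] += 1
    return counts


# A few lines copied from the capture above.
sample = [
    "160074 2019-06-25 22:10:17 10.2.6.11 -> 10.2.1.142 NFS 150 V4 Reply (Call In 160066) OPEN Status: NFS4ERR_EXPIRED",
    "160799 2019-06-25 22:10:18 10.2.6.11 -> 10.2.1.142 NFS 150 V4 Reply (Call In 160793) OPEN Status: NFS4ERR_EXPIRED",
    "162811 2019-06-25 22:10:22 10.2.6.6 -> 10.2.1.142 NFS 182 V4 Reply (Call In 162803) RENEW Status: NFS4ERR_EXPIRED",
    "164326 2019-06-25 22:10:28 10.2.6.6 -> 10.2.1.142 NFS 182 V4 Reply (Call In 164303) RENEW",
    "164458 2019-06-25 22:10:28 10.2.6.11 -> 10.2.1.142 NFS 466 V4 Reply (Call In 164441) OPEN StateID: 0xae70",
]

for key, n in sorted(count_errors(sample).items()):
    print(n, *key)
```

Successful replies carry no "Status:" field in this output, so they are simply skipped.<br />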
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">$ sudo /var/tmp/syscall-open-stat-filename-regex.stp -g --suppress-time-limits -o /var/tmp/failed_stats_and_opens.log 'XXX.*zip' &
[5] 21855
</pre>
</div>
</div>
We caught some errors:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># grep '= -' /var/tmp/failed_stats_and_opens.log
Tue Nov 12 17:11:02 2019 java[16043] SyS_open("XXX") = -5 (EIO) : -5
Tue Nov 12 17:11:02 2019 java[16078] SyS_open("XXX") = -5 (EIO) : -5
...
</pre>
</div>
</div>
Let's look at one java thread:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># grep 16043 /var/tmp/failed_stats_and_opens.log
Tue Nov 12 17:11:02 2019 java[16043] SyS_newlstat("/mnt/.../XXX.zip") = 0 : 0
Tue Nov 12 17:11:02 2019 java[16043] SyS_newstat("/mnt/.../XXX.zip") = 0 : 0
Tue Nov 12 17:11:02 2019 java[16043] SyS_newstat("/mnt/.../XXX.zip") = 0 : 0
Tue Nov 12 17:11:02 2019 java[16043] SyS_newstat("/mnt/.../XXX.zip") = 0 : 0
Tue Nov 12 17:11:02 2019 java[16043] SyS_open("/mnt/.../XXX.zip") = -5 (EIO) : -5
</pre>
</div>
</div>
So within the same second it stat'ed the file multiple times just fine (1x lstat() + 3x stat()) only to get EIO on open().<br />
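The same check can be automated over the whole log. This is a minimal sketch that reports threads for which a stat()/lstat() on a path succeeded before an open() of the same path failed. The line format is assumed from the SystemTap output shown above, and the function name is hypothetical.

```python
import re

# Example line:
# Tue Nov 12 17:11:02 2019 java[16043] SyS_open("/mnt/.../XXX.zip") = -5 (EIO) : -5
LOG_LINE = re.compile(
    r'(?P<proc>\w+)\[(?P<tid>\d+)\] SyS_(?P<call>\w+)\("(?P<path>[^"]*)"\) = (?P<ret>-?\d+)'
)


def stat_ok_then_open_failed(lines):
    """Return (tid, path, errno) for opens that failed after a successful stat."""
    stat_ok = set()
    hits = []
    for line in lines:
        m = LOG_LINE.search(line)
        if not m:
            continue
        key = (int(m["tid"]), m["path"])
        ret = int(m["ret"])
        if m["call"] in ("newstat", "newlstat") and ret == 0:
            stat_ok.add(key)
        elif m["call"] == "open" and ret < 0 and key in stat_ok:
            hits.append(key + (ret,))
    return hits


# Lines copied from the log excerpt above.
sample = [
    'Tue Nov 12 17:11:02 2019 java[16043] SyS_newlstat("/mnt/.../XXX.zip") = 0 : 0',
    'Tue Nov 12 17:11:02 2019 java[16043] SyS_newstat("/mnt/.../XXX.zip") = 0 : 0',
    'Tue Nov 12 17:11:02 2019 java[16043] SyS_open("/mnt/.../XXX.zip") = -5 (EIO) : -5',
]
print(stat_ok_then_open_failed(sample))
```

<br />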
Let's look at the tcpdump capture:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># tcpdump -n -r /var/tmp/tcpdump.XXX.out
...
17:11:02.766890 IP 10.210.16.12.cisco-tdp > 10.210.6.6.nfs: Flags [P.], seq 75637865:75638065, ack 2022930725, win 12288, options [nop,nop,TS val 12657161 ecr 997162631], length 200: NFS request xid 2489184212 196 getattr fh 0,0/22
17:11:02.767043 IP 10.210.6.6.nfs > 10.210.16.12.cisco-tdp: Flags [P.], seq 2022930725:2022931065, ack 75638065, win 32806, options [nop,nop,TS val 997162631 ecr 12657161], length 340: NFS reply xid 2489184212 reply ok 336 getattr NON 4 ids 0/15 sz 0
17:11:02.769335 IP 10.210.16.12.cisco-tdp > 10.210.6.6.nfs: Flags [P.], seq 75638065:75638329, ack 2022931065, win 12288, options [nop,nop,TS val 12657164 ecr 997162631], length 264: NFS request xid 2505961428 260 getattr fh 0,0/22
17:11:02.769477 IP 10.210.6.6.nfs > 10.210.16.12.cisco-tdp: Flags [P.], seq 2022931065:2022931149, ack 75638329, win 32806, options [nop,nop,TS val 997162631 ecr 12657164], length 84: NFS reply xid 2505961428 reply ok 80 getattr ERROR: unk 10011
17:11:02.769544 IP 10.210.16.12.cisco-tdp > 10.210.6.6.nfs: Flags [P.], seq 75638329:75638593, ack 2022931149, win 12288, options [nop,nop,TS val 12657164 ecr 997162631], length 264: NFS request xid 2522738644 260 getattr fh 0,0/22
17:11:02.769638 IP 10.210.6.6.nfs > 10.210.16.12.cisco-tdp: Flags [P.], seq 2022931149:2022931233, ack 75638593, win 32806, options [nop,nop,TS val 997162631 ecr 12657164], length 84: NFS reply xid 2522738644 reply ok 80 getattr ERROR: unk 10011
17:11:02.809052 IP 10.210.16.12.cisco-tdp > 10.210.6.6.nfs: Flags [.], ack 2022931233, win 12288, options [nop,nop,TS val 12657204 ecr 997162631], length 0
17:11:02.821772 IP 10.210.16.12.cisco-tdp > 10.210.6.6.nfs: Flags [P.], seq 75638593:75638789, ack 2022931233, win 12288, options [nop,nop,TS val 12657216 ecr 997162631], length 196: NFS request xid 2539515860 192 getattr fh 0,0/22
17:11:02.822228 IP 10.210.6.6.nfs > 10.210.16.12.cisco-tdp: Flags [P.], seq 2022931233:2022931573, ack 75638789, win 32806, options [nop,nop,TS val 997162636 ecr 12657216], length 340: NFS reply xid 2539515860 reply ok 336 getattr NON 4 ids 0/15 sz 0
17:11:02.822261 IP 10.210.16.12.cisco-tdp > 10.210.6.6.nfs: Flags [.], ack 2022931573, win 12288, options [nop,nop,TS val 12657217 ecr 997162636], length 0
17:11:02.823327 IP 10.210.16.12.cisco-tdp > 10.210.6.6.nfs: Flags [P.], seq 75638789:75638957, ack 2022931573, win 12288, options [nop,nop,TS val 12657218 ecr 997162636], length 168: NFS request xid 2556293076 164 getattr fh 0,0/22
17:11:02.823436 IP 10.210.6.6.nfs > 10.210.16.12.cisco-tdp: Flags [P.], seq 2022931573:2022931857, ack 75638957, win 32806, options [nop,nop,TS val 997162636 ecr 12657218], length 284: NFS reply xid 2556293076 reply ok 280 getattr NON 2 ids 0/9 sz 0
...
</pre>
</div>
</div>
This correlates with the "getattr ERROR: unk 10011" replies from the server.<br>
Let's also look at the tcpdump capture for YYY:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 300em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># tcpdump -n -r /var/tmp/tcpdump.YYY.out
...
17:10:08.447238 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [.], ack 174377189, win 11471, options [nop,nop,TS val 12602842 ecr 997157199], length 0
17:11:08.645057 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [.], ack 174377189, win 11471, options [nop,nop,TS val 12663040 ecr 997157199], length 0
17:11:08.645147 IP 10.210.6.5.nfs > 10.210.16.12.841: Flags [.], ack 314027332, win 32806, options [nop,nop,TS val 997163219 ecr 12602842], length 0
17:11:08.645173 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [P.], seq 314027332:314027492, ack 174377189, win 11471, options [nop,nop,TS val 12663040 ecr 997163219], length 160: NFS request xid 398604749 156 getattr fh 0,28/0
17:11:08.645249 IP 10.210.6.5.nfs > 10.210.16.12.841: Flags [.], ack 314027492, win 32806, options [nop,nop,TS val 997163219 ecr 12663040], length 0
17:11:08.692687 IP 10.210.6.5.nfs > 10.210.16.12.841: Flags [P.], seq 174377189:174377305, ack 314027492, win 32806, options [nop,nop,TS val 997163223 ecr 12663040], length 116: NFS reply xid 398604749 reply ok 112 getattr ERROR: unk 24
17:11:08.692796 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [P.], seq 314027492:314027652, ack 174377305, win 11471, options [nop,nop,TS val 12663087 ecr 997163223], length 160: NFS request xid 415381965 156 getattr fh 0,28/0
17:11:08.693123 IP 10.210.6.5.nfs > 10.210.16.12.841: Flags [P.], seq 174377305:174377421, ack 314027652, win 32806, options [nop,nop,TS val 997163223 ecr 12663087], length 116: NFS reply xid 415381965 reply ok 112 getattr ERROR: unk 24
17:11:08.693164 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [P.], seq 314027652:314027916, ack 174377421, win 11471, options [nop,nop,TS val 12663088 ecr 997163223], length 264: NFS request xid 432159181 260 getattr fh 0,132/0
17:11:08.693592 IP 10.210.6.5.nfs > 10.210.16.12.841: Flags [P.], seq 174377421:174377553, ack 314027916, win 32806, options [nop,nop,TS val 997163223 ecr 12663088], length 132: NFS reply xid 432159181 reply ok 128 getattr ERROR: unk 40
17:11:08.693680 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [P.], seq 314027916:314028084, ack 174377553, win 11471, options [nop,nop,TS val 12663088 ecr 997163223], length 168: NFS request xid 448936397 164 getattr fh 0,36/0
17:11:08.701988 IP 10.210.6.5.nfs > 10.210.16.12.841: Flags [P.], seq 174377553:174377669, ack 314028084, win 32806, options [nop,nop,TS val 997163224 ecr 12663088], length 116: NFS reply xid 448936397 reply ok 112 getattr ERROR: unk 24
17:11:08.702090 IP 10.210.6.5.63021 > 10.210.16.12.33663: Flags [S], seq 1851114974, win 32804, options [mss 1460,sackOK,TS val 997163224 ecr 0,nop,wscale 5], length 0
17:11:08.702122 IP 10.210.16.12.33663 > 10.210.6.5.63021: Flags [S.], seq 2641182587, ack 1851114975, win 28960, options [mss 1460,sackOK,TS val 12663097 ecr 997163224,nop,wscale 8], length 0
17:11:08.702206 IP 10.210.6.5.63021 > 10.210.16.12.33663: Flags [.], ack 1, win 32806, options [nop,nop,TS val 997163224 ecr 12663097], length 0
17:11:08.704470 IP 10.210.6.5.63021 > 10.210.16.12.33663: Flags [P.], seq 1:77, ack 1, win 32806, options [nop,nop,TS val 997163225 ecr 12663097], length 76
17:11:08.704487 IP 10.210.16.12.33663 > 10.210.6.5.63021: Flags [.], ack 77, win 114, options [nop,nop,TS val 12663099 ecr 997163225], length 0
17:11:08.704534 IP 10.210.16.12.33663 > 10.210.6.5.63021: Flags [P.], seq 1:29, ack 77, win 114, options [nop,nop,TS val 12663099 ecr 997163225], length 28
17:11:08.704620 IP 10.210.6.5.63021 > 10.210.16.12.33663: Flags [.], ack 29, win 32806, options [nop,nop,TS val 997163225 ecr 12663099], length 0
17:11:08.741063 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [.], ack 174377669, win 11471, options [nop,nop,TS val 12663136 ecr 997163224], length 0
17:11:13.717133 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [P.], seq 314028084:314028244, ack 174377669, win 11471, options [nop,nop,TS val 12668112 ecr 997163224], length 160: NFS request xid 465713613 156 getattr fh 0,28/0
17:11:13.717704 IP 10.210.6.5.nfs > 10.210.16.12.841: Flags [P.], seq 174377669:174377785, ack 314028244, win 32806, options [nop,nop,TS val 997163726 ecr 12668112], length 116: NFS reply xid 465713613 reply ok 112 getattr ERROR: unk 24
17:11:13.717715 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [.], ack 174377785, win 11471, options [nop,nop,TS val 12668112 ecr 997163726], length 0
17:12:02.981295 IP 10.210.6.5.57869 > 10.210.16.12.33663: Flags [S], seq 2961022330, win 32804, options [mss 1460,sackOK,TS val 997168652 ecr 0,nop,wscale 5], length 0
...
</pre>
</div>
</div>
A few seconds later we see "getattr ERROR: unk 24", and a corresponding error in the application error log.<br>
There was nothing interesting in the system logs on the NFS server. However, on the NFS server I was also running:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># dtrace -q -n 'nfsv4:::op-getattr-done,nfsv4:::op-readdir-done,nfsv4:::op-access-done,
  nfsv4:::op-link-done,nfsv4:::op-open-done,nfsv4:::op-create-done,
  nfsv4:::op-delegreturn-done,nfsv4:::op-verify-done,nfsv4:::op-getfh-done,
  nfsv4:::op-readlink-done,nfsv4:::op-renew-done,nfsv4:::op-lookup-done,
  nfsv4:::op-remove-done,nfsv4:::op-rename-done
  /args[2]->status != 0/
  {
    printf("%Y %s %s %d:%d %s %d\n", walltimestamp, probename, args[0]->ci_remote,
        args[1]->noi_cred->cr_uid, args[1]->noi_cred->cr_gid,
        args[1]->noi_curpath, args[2]->status);
  }' -o /var/tmp/nfs-ops-with-errno.log
</pre>
</div>
</div>
Let's look at the log around Nov 12 17:11:02:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">...
2019 Nov 12 17:10:42 op-lookup-done 10.250.3.29 10006:5000 /.../XXX 2
2019 Nov 12 17:10:41 op-renew-done 10.210.13.49 60001:60001 <unknown> 10022
2019 Nov 12 17:10:46 op-renew-done 10.210.13.49 60001:60001 <unknown> 10022
2019 Nov 12 17:10:46 op-lookup-done 10.250.3.29 10006:5000 <unknown> 2
2019 Nov 12 17:10:49 op-lookup-done 10.210.10.152 10006:5000 <unknown> 2
2019 Nov 12 17:10:50 op-lookup-done 10.250.3.29 10006:5000 /.../XXX 2
...
2019 Nov 12 17:10:55 op-lookup-done 10.250.3.29 10006:5000 <unknown> 2
2019 Nov 12 17:10:56 op-renew-done 10.210.13.49 60001:60001 <unknown> 10022
2019 Nov 12 17:10:58 op-renew-done 10.210.136.15 60001:60001 <unknown> 10011
2019 Nov 12 17:10:59 op-renew-done 10.210.136.15 60001:60001 <unknown> 10011
2019 Nov 12 17:10:59 op-renew-done 10.210.136.15 60001:60001 <unknown> 10011
2019 Nov 12 17:10:58 op-renew-done 10.210.136.15 60001:60001 <unknown> 10011
...
</pre>
</div>
</div>
Notice the op-renew-done errors.<br />
Let's see what these are:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># ggrep -E '10011|10022' /usr/include/nfs/nfs4_kprot.h
NFS4ERR_EXPIRED = 10011,
NFS4ERR_STALE_CLIENTID = 10022,
</pre>
</div>
</div>
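To avoid looking the numbers up by hand, the status column of the DTrace log can be annotated with symbolic names. This is a small sketch: the two 100xx values come from the header grep above, while the mapping of 2 to NFS4ERR_NOENT is taken from the NFSv4 protocol definition rather than from this post. The function name is hypothetical.

```python
# Status codes seen in the log above. 10011 and 10022 come from nfs4_kprot.h
# as shown in the grep output; 2 (NFS4ERR_NOENT) is an addition from the
# NFSv4 protocol definition and is not part of that grep.
NFS4_STATUS = {
    2: "NFS4ERR_NOENT",
    10011: "NFS4ERR_EXPIRED",
    10022: "NFS4ERR_STALE_CLIENTID",
}


def annotate(line):
    """Append the symbolic status name to one line of the DTrace log."""
    status = int(line.split()[-1])  # the status is the last column
    return f"{line} ({NFS4_STATUS.get(status, 'unknown')})"


print(annotate("2019 Nov 12 17:10:58 op-renew-done 10.210.136.15 60001:60001 <unknown> 10011"))
```

<br />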
The op-renew-done failures with status 10011 (NFS4ERR_EXPIRED) match the time when the application failed.<br />
The other interesting thing is that the Linux NFS client has gssd running with -t 1800, so it needs to renew its credentials every 30 minutes.
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># ggrep 'Nov 12' /var/tmp/nfs-ops-with-errno.log | ggrep 10.210.16.12 | ggrep op-renew
2019 Nov 12 15:10:45 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 15:10:45 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 15:40:57 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 15:40:56 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 16:04:58 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 16:04:58 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 16:10:58 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 16:10:58 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 16:40:29 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 16:40:29 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 17:11:08 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 17:11:08 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
...
</pre>
</div>
</div>
The interesting entries here are:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">2019 Nov 12 16:40:29 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 16:40:29 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 17:11:08 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
2019 Nov 12 17:11:08 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
</pre>
</div>
</div>
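The gap between these two failures is 30 minutes and 39 seconds, i.e. 39s longer than the 30-minute gssd timeout. Note that the DTrace script above only logs operations that returned an error, so successful op-renew-done calls are not captured here.<br>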
<br>
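The interval between the failures can be checked mechanically. This is a minimal sketch that parses the timestamp columns of the DTrace log lines and returns the gaps between distinct timestamps. The function name is hypothetical, and "%b" parsing assumes an English/C locale.

```python
from datetime import datetime


def failure_gaps(lines):
    """Return gaps between distinct, sorted timestamps as (minutes, seconds)."""
    stamps = sorted(
        {datetime.strptime(" ".join(line.split()[:4]), "%Y %b %d %H:%M:%S")
         for line in lines}
    )
    gaps = []
    for prev, cur in zip(stamps, stamps[1:]):
        total = int((cur - prev).total_seconds())
        gaps.append(divmod(total, 60))
    return gaps


# The four "interesting" entries from the log above.
sample = [
    "2019 Nov 12 16:40:29 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011",
    "2019 Nov 12 16:40:29 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011",
    "2019 Nov 12 17:11:08 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011",
    "2019 Nov 12 17:11:08 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011",
]
print(failure_gaps(sample))  # [(30, 39)]
```

<br />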
After some further debugging, let's look at the original network dump again.<br>
The client issues readdir, access, etc. to get to the final directory and then tries to open the file. The server responds with NFS4ERR_EXPIRED, the client tries again, gets the same error, and then seems to give up and move on to other files, where the story repeats for each of them. All directory lookups, readdirs, etc. work fine though. Here are just the open, its retry, and the errors for the first file that failed:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">102126 17:11:02.769335 10.210.16.12 10.210.6.6 NFS 330 V4 Call (Reply In 102127) PUTFH | OPEN DH: 0xcbbed605/XXX.zip | GETFH | ACCESS FH: 0x00000000, [Check: RD MD XT XE] | GETATTR
102127 17:11:02.769477 10.210.6.6 10.210.16.12 NFS 150 V4 Reply (Call In 102126) PUTFH | OPEN Status: NFS4ERR_EXPIRED
102128 17:11:02.769544 10.210.16.12 10.210.6.6 NFS 330 V4 Call (Reply In 102129) PUTFH | OPEN DH: 0xcbbed605/XXX.zip | GETFH | ACCESS FH: 0x00000000, [Check: RD MD XT XE] | GETATTR
102129 17:11:02.769638 10.210.6.6 10.210.16.12 NFS 150 V4 Reply (Call In 102128) PUTFH | OPEN Status: NFS4ERR_EXPIRED
</pre>
</div>
</div>
After multiple open() calls have failed, the NFS client eventually does:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">102831 17:12:02.917119 10.210.16.12 10.210.6.6 NFS 226 V4 Call (Reply In 102832) RENEW CID: 0xab20
102832 17:12:02.964745 10.210.6.6 10.210.16.12 NFS 182 V4 Reply (Call In 102831) RENEW Status: NFS4ERR_EXPIRED
102834 17:12:02.967293 10.210.16.12 10.210.6.6 NFS 226 V4 Call (Reply In 102835) RENEW CID: 0xab20
102835 17:12:02.967972 10.210.6.6 10.210.16.12 NFS 182 V4 Reply (Call In 102834) RENEW Status: NFS4ERR_EXPIRED
102836 17:12:02.968021 10.210.16.12 10.210.6.6 NFS 330 V4 Call (Reply In 102837) SETCLIENTID
102837 17:12:02.968332 10.210.6.6 10.210.16.12 NFS 198 V4 Reply (Call In 102836) SETCLIENTID
102838 17:12:02.968365 10.210.16.12 10.210.6.6 NFS 234 V4 Call (Reply In 102839) SETCLIENTID_CONFIRM
102839 17:12:02.981174 10.210.6.6 10.210.16.12 NFS 182 V4 Reply (Call In 102838) SETCLIENTID_CONFIRM
</pre>
</div>
</div>
which fixes the issue, and a moment later it can open files fine:</div>
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">102874 17:12:04.838647 10.210.16.12 10.210.6.6 NFS 330 V4 Call (Reply In 102875) PUTFH | OPEN DH: 0x6030bc46/XXX.zip | GETFH | ACCESS FH: 0x00000000, [Check: RD MD XT XE] | GETATTR
102875 17:12:04.838844 10.210.6.6 10.210.16.12 NFS 478 V4 Reply (Call In 102874) PUTFH | OPEN StateID: 0x9aaf | GETFH | ACCESS, [Access Denied: XE], [Allowed: RD MD XT] | GETATTR
</pre>
</div>
</div>
The last RENEW operations the client issued before the first failure were:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">101689 17:06:16.805156 10.210.16.12 10.210.6.6 NFS 226 V4 Call (Reply In 101691) RENEW CID: 0xab20
101691 17:06:16.838027 10.210.6.6 10.210.16.12 NFS 182 V4 Reply (Call In 101689) RENEW
101694 17:07:16.965131 10.210.16.12 10.210.6.6 NFS 226 V4 Call (Reply In 101697) RENEW CID: 0xab20
101697 17:07:16.997082 10.210.6.6 10.210.16.12 NFS 182 V4 Reply (Call In 101694) RENEW
101701 17:08:17.125137 10.210.16.12 10.210.6.6 NFS 226 V4 Call (Reply In 101703) RENEW CID: 0xab20
101703 17:08:17.157783 10.210.6.6 10.210.16.12 NFS 182 V4 Reply (Call In 101701) RENEW
101706 17:09:17.285124 10.210.16.12 10.210.6.6 NFS 226 V4 Call (Reply In 101709) RENEW CID: 0xab20
101709 17:09:17.320893 10.210.6.6 10.210.16.12 NFS 182 V4 Reply (Call In 101706) RENEW
... no more RENEW operations until:
102831 17:12:02.917119 10.210.16.12 10.210.6.6 NFS 226 V4 Call (Reply In 102832) RENEW CID: 0xab20
</pre>
</div>
</div>
Notice that it had been issuing RENEW every 60s (the file server sets the lease time to 90s), and then it didn't do so for almost 166s, which is more than the 90s lease timeout requested by the server. As a sanity check, let's see when the first NFS4ERR_EXPIRED error happened. The last RENEW before the issue was at 17:09:17, so unless the lease was renewed (it wasn't), the server should have started responding with errors from about 17:10:47. The first NFS4ERR_EXPIRED is recorded at 17:11:02, which is the first OPEN call in that time frame.<br>
Looks like a bug in the nfs client.<br>
<br>
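The arithmetic above can be double-checked with a few lines of Python. This is only a sketch: the two timestamps are copied from the packet capture (frames 101706 and 102831), the 90s lease is the value quoted in the text, and the server's own clock and bookkeeping may differ slightly from what the capture shows.

```python
from datetime import datetime, timedelta

LEASE = timedelta(seconds=90)  # lease time quoted in the text


def parse(ts: str) -> datetime:
    """Parse a capture timestamp such as 17:09:17.285124 (the date does not matter here)."""
    return datetime.strptime(ts, "%H:%M:%S.%f")


last_renew = parse("17:09:17.285124")  # frame 101706, last RENEW call before the gap
next_renew = parse("17:12:02.917119")  # frame 102831, first RENEW call after the gap

gap = next_renew - last_renew
expires_at = last_renew + LEASE

print(f"gap between RENEW calls: {gap.total_seconds():.1f}s")
print(f"lease runs out at about: {expires_at:%H:%M:%S}")
print(f"RENEW overdue by:        {(gap - LEASE).total_seconds():.1f}s")
```

This gives a gap of about 165.6s and an expiry time of about 17:10:47, which matches the figures in the text.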
Let's run some additional debugging.<br />
Network packet capture:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># tcpdump -i aggr1 -w /var/tmp/tcpdump.20191119.out host 10.210.16.12 &
</pre>
</div>
</div>
Let's trace all checks for nfsv4 lease expiry and record all cases where the lease did expire. This is traced on the nfs server:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># dtrace -q -n fbt::rfs4_lease_expired:entry' {self->cp=args[0];}' \
-n fbt::rfs4_lease_expired:return'
/arg1 && self->cp/
{this->s=(struct sockaddr_in *)&self->cp->rc_addr;
printf("%Y %s client_ip: %s client_id: %x forced_expire: %d last_access: %u dt: %d rfs4_lease_time: %d ret: %d\n",
walltimestamp, probefunc, inet_ntop(this->s->sin_family, (void *)&this->s->sin_addr),
self->cp->rc_clientid, self->cp->rc_forced_expire, self->cp->rc_last_access,
(walltimestamp/1000000000)-self->cp->rc_last_access, `rfs4_lease_time, arg1);
self->cp=0;stack()}' \
-n fbt::rfs4_lease_expired:return'/self->cp/{self->cp=0;}' -o /var/tmp/rfs4_lease_expired_failed.out &
</pre>
</div>
</div>
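Each record printed by the script above carries the inputs of the expiry decision: forced_expire, last_access, dt (seconds since last_access) and rfs4_lease_time. As a reading aid, here is a minimal Python sketch that models the decision, assuming the server treats a lease as expired when forced_expire is set or when dt exceeds rfs4_lease_time. That rule is an assumption inferred from the fields being traced, not something taken from the server source, and lease_expired is a hypothetical helper name.

```python
def lease_expired(forced_expire: int, last_access: int, now: int, lease_time: int = 90) -> bool:
    """Model of the expiry decision (assumed rule, not the server's actual code)."""
    if forced_expire:
        return True
    dt = now - last_access  # the same quantity the DTrace script prints as "dt"
    return dt > lease_time


# A client that renewed 60s ago is fine; one that stayed silent for 114s is not.
print(lease_expired(0, last_access=1000, now=1060))  # False
print(lease_expired(0, last_access=1000, now=1114))  # True
```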
Let's record all nfs RENEW calls:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># dtrace -q -n nfsv4:::op-renew-start'
{self->t=timestamp;
printf("%Y %s client_id: 0x%x noi_xid: %u %s %d:%d %s\n", walltimestamp, probename, args[2]->clientid,
args[1]->noi_xid, args[0]->ci_remote, args[1]->noi_cred->cr_uid, args[1]->noi_cred->cr_gid, args[1]->noi_curpath);}' \
-n nfsv4:::op-renew-done'
/self->t/
{printf("%Y %s noi_xid: %u %s %s %d:%d %s ret: %d took: %d seconds\n", walltimestamp, probename,
args[1]->noi_xid, args[0]->ci_remote, args[0]->ci_local, args[1]->noi_cred->cr_uid,
args[1]->noi_cred->cr_gid, args[1]->noi_curpath, args[2]->status,
(timestamp-self->t)/1000000000);
self->t=0;}' \
-o /var/tmp/nfs4-renew.log &
</pre>
</div>
</div>
Let's catch all nfsv4 operations and their statuses on the nfs server:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 300em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># cat /var/tmp/ss
#!/bin/bash
dtrace -q -n nfsv4:::op-open-downgrade-start,nfsv4:::op-secinfo-start,nfsv4:::op-readdir-start,nfsv4:::op-setclientid-start,\
nfsv4:::op-remove-start,nfsv4:::cb-recall-start,nfsv4:::op-rename-start,nfsv4:::op-commit-start,nfsv4:::op-savefh-start,\
nfsv4:::op-delegpurge-start,nfsv4:::op-openattr-start,nfsv4:::op-delegreturn-start,nfsv4:::op-putpubfh-start,\
nfsv4:::op-nverify-start,nfsv4:::op-verify-start,nfsv4:::op-setattr-start,nfsv4:::op-getattr-start,\
nfsv4:::op-open-confirm-start,nfsv4:::null-start,nfsv4:::op-readlink-start,nfsv4:::op-lockt-start,nfsv4:::op-locku-start,\
nfsv4:::op-putrootfh-start,nfsv4:::op-close-start,nfsv4:::op-access-start,nfsv4:::op-getfh-start,nfsv4:::op-read-start,\
nfsv4:::op-lock-start,nfsv4:::op-open-start,nfsv4:::op-link-start,nfsv4:::op-renew-start,nfsv4:::op-write-start,\
nfsv4:::op-restorefh-start,nfsv4:::op-lookupp-start,nfsv4:::op-release-lockowner-start,nfsv4:::compound-start,\
nfsv4:::op-putfh-start,nfsv4:::op-setclientid-confirm-start,nfsv4:::op-lookup-start,nfsv4:::op-create-start'
/args[0]->ci_remote=="10.210.16.12"/
{self->t=timestamp;
printf("%Y %s %s %s\n", walltimestamp, probename, args[0]->ci_remote, args[0]->ci_local);}' \
-n nfsv4:::op-open-downgrade-done,nfsv4:::op-readdir-done,nfsv4:::op-access-done,nfsv4:::op-lock-done,nfsv4:::op-link-done,\
nfsv4:::op-setclientid-done,nfsv4:::op-open-confirm-done,nfsv4:::op-open-done,nfsv4:::op-create-done,\
nfsv4:::op-secinfo-done,nfsv4:::op-locku-done,nfsv4:::op-close-done,nfsv4:::op-write-done,\
nfsv4:::op-delegreturn-done,nfsv4:::op-getattr-done,nfsv4:::op-verify-done,nfsv4:::op-getfh-done,\
nfsv4:::op-putfh-done,nfsv4:::op-nverify-done,nfsv4:::op-readlink-done,nfsv4:::op-renew-done,\
nfsv4:::op-savefh-done,nfsv4:::op-read-done,nfsv4:::compound-done,nfsv4:::cb-recall-done,\
nfsv4:::op-lookup-done,nfsv4:::op-restorefh-done,nfsv4:::op-release-lockowner-done,nfsv4:::op-lookupp-done,\
nfsv4:::op-openattr-done,nfsv4:::op-putpubfh-done,nfsv4:::op-putrootfh-done,\
nfsv4:::op-setclientid-confirm-done,nfsv4:::op-remove-done,nfsv4:::op-rename-done,nfsv4:::op-commit-done,\
nfsv4:::op-delegpurge-done,nfsv4:::op-setattr-done,nfsv4:::op-lockt-done'
/self->t/
{printf("%Y %s %s %s took: %d ret: %d\n", walltimestamp, probename, args[0]->ci_remote, args[0]->ci_local,
(timestamp-self->t)/1000000000, args[2]->status);
self->t=0;}' \
-n nfsv4:::null-done'/self->t/
{printf("%Y %s %s %s took: %d\n", walltimestamp, probename, args[0]->ci_remote, args[0]->ci_local,
(timestamp-self->t)/1000000000);
self->t=0;}' \
-o /var/tmp/nfs4_all_ops.log
# /var/tmp/ss &
</pre>
</div>
</div>
The list of running jobs on the nfs server is:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># jobs
[2] Running tcpdump -i aggr1 -w /var/tmp/tcpdump.20191119.out host 10.210.16.12 &
[3] Running /var/tmp/ss &
[4]- Running dtrace -q -n nfsv4:::op-renew-start'{self->t=timestamp; printf("%Y %s client_id: 0x%x noi_xid: %u %s %d:%d %s\n", walltimestamp, probename, args[2]->clientid, args[1]->noi_xid, args[0]->ci_remote, args[1]->noi_cred->cr_uid, args[1]->noi_cred->cr_gid, args[1]->noi_curpath);}' -n nfsv4:::op-renew-done'/self->t/{printf("%Y %s noi_xid: %u %s %s %d:%d %s ret: %d took: %d seconds\n", walltimestamp, probename, args[1]->noi_xid, args[0]->ci_remote, args[0]->ci_local, args[1]->noi_cred->cr_uid, args[1]->noi_cred->cr_gid, args[1]->noi_curpath, args[2]->status, (timestamp-self->t)/1000000000);self->t=0;}' -o /var/tmp/nfs4-renew.log & (wd: /var/log/smblog)
[5]+ Running dtrace -q -n fbt::rfs4_lease_expired:entry'{self->cp=args[0];}' -n fbt::rfs4_lease_expired:return'/arg1 && self->cp/{this->s=(struct sockaddr_in *)&self->cp->rc_addr; printf("%Y %s client_ip: %s client_id: %x forced_expire: %d last_access: %u dt: %d rfs4_lease_time: %d ret: %d\n", walltimestamp, probefunc, inet_ntop(this->s->sin_family, (void *)&this->s->sin_addr), self->cp->rc_clientid, self->cp->rc_forced_expire, self->cp->rc_last_access, (walltimestamp/1000000000)-self->cp->rc_last_access, `rfs4_lease_time, arg1);self->cp=0;stack()}' -n fbt::rfs4_lease_expired:return'/self->cp/{self->cp=0;}' -o /var/tmp/rfs4_lease_expired_failed.out &
</pre>
</div>
</div>
These have been running without production traffic, so there is little going on there.<br />
Still, the client needs to keep renewing its nfsv4 lease for as long as the filesystem is mounted.<br />
First, let's see how often the client issues RENEW operations and when it fails:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># grep "op-renew-done.*10.210.16.12" /var/tmp/nfs4-renew.log | less
...
2019 Nov 19 17:30:01 op-renew-done noi_xid: 807596244 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:31:01 op-renew-done noi_xid: 824373460 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:31:01 op-renew-done noi_xid: 2357017293 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:32:02 op-renew-done noi_xid: 841150676 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:32:02 op-renew-done noi_xid: 2373794509 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:33:02 op-renew-done noi_xid: 857927892 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:33:02 op-renew-done noi_xid: 2390571725 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:34:02 op-renew-done noi_xid: 2407348941 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:34:02 op-renew-done noi_xid: 874705108 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:35:02 op-renew-done noi_xid: 2424126157 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:35:02 op-renew-done noi_xid: 891482324 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:36:02 op-renew-done noi_xid: 2440903373 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:36:02 op-renew-done noi_xid: 908259540 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:37:02 op-renew-done noi_xid: 925036756 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:37:02 op-renew-done noi_xid: 2457680589 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:38:02 op-renew-done noi_xid: 2474457805 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:38:02 op-renew-done noi_xid: 941813972 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:39:03 op-renew-done noi_xid: 958591188 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:39:03 op-renew-done noi_xid: 2491235021 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:40:57 op-renew-done noi_xid: 796801741 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 17:40:57 op-renew-done noi_xid: 813578957 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 17:40:57 op-renew-done noi_xid: 3491950804 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 17:40:57 op-renew-done noi_xid: 3508728020 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 17:41:02 op-renew-done noi_xid: 863910605 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 19 17:41:02 op-renew-done noi_xid: 3559059668 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
...
</pre>
</div>
</div>
It had been issuing RENEW ops every 60s, but then there is a gap of 114s (17:39:03 - 17:40:57), which is more than the 90s requested by the server, so the server replies with 10011 (NFS4ERR_EXPIRED). If an open() had happened in that window, it would have failed, as this client doesn't seem to handle NFS4ERR_EXPIRED returned by OPEN. Notice that it hits the issue for both server IP addresses at the same time (both are served by the same nfs server), which is why RENEW calls are logged for both addresses. It then starts working again and sends new RENEW calls regularly. It must have issued setclientid to fix the state though, so let's check the other log with all nfs operations:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 300em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># less /var/tmp/nfs4_all_ops.log
...
2019 Nov 19 17:38:02 op-renew-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:38:02 op-renew-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 17:39:03 compound-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:39:03 op-renew-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:39:03 op-renew-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 17:39:03 compound-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:39:03 op-renew-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:39:03 op-renew-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
... many getattr, access, etc. operations
2019 Nov 19 17:40:57 op-renew-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:40:57 op-renew-done 10.210.16.12 10.210.6.5 took: 0 ret: 10011
2019 Nov 19 17:40:57 compound-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:40:57 op-renew-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:40:57 op-renew-done 10.210.16.12 10.210.6.5 took: 0 ret: 10011
2019 Nov 19 17:40:57 compound-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:40:57 op-setclientid-confirm-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:40:57 compound-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:40:57 op-setclientid-confirm-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:40:57 op-setclientid-confirm-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 17:40:57 compound-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:40:57 op-renew-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:40:57 op-renew-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 19 17:40:57 compound-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:40:57 op-renew-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:40:57 op-renew-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 19 17:40:57 compound-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:40:57 op-setclientid-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:40:57 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 17:40:57 op-setclientid-confirm-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 17:40:57 compound-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:40:57 op-setclientid-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:40:57 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 17:41:02 compound-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:41:02 op-renew-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:41:02 op-renew-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 17:41:02 compound-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:41:02 op-renew-start 10.210.16.12 10.210.6.6
2019 Nov 19 17:41:02 op-renew-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 17:42:03 compound-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:42:03 op-renew-start 10.210.16.12 10.210.6.5
2019 Nov 19 17:42:03 op-renew-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
...
</pre>
</div>
</div>
Yes, it did issue setclientid (for both 10.210.6.5 and .6).<br />
Let's see how often it issues the setclientid:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 300em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># ggrep op-setclientid-done /var/tmp/nfs4_all_ops.log
2019 Nov 19 13:11:00 op-setclientid-done 10.210.16.12 took: 0 ret: 0
2019 Nov 19 13:40:55 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 14:40:56 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 15:11:05 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 15:40:54 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 15:40:54 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 16:40:48 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 16:40:48 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 17:40:57 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 17:40:57 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 18:40:55 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 18:40:55 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 19:40:57 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 19:40:57 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 20:11:03 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 20:11:03 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 20:40:51 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 20:40:51 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 21:10:49 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 21:10:49 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 22:40:59 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 22:40:59 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 23:11:08 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 23:11:08 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 23:40:58 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 23:40:58 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 00:41:06 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 00:41:06 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 01:11:02 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 01:11:02 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 02:10:54 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 02:10:54 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 03:10:59 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 03:10:59 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 03:41:06 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 03:41:06 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 04:11:15 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 04:11:15 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 04:41:00 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 04:41:00 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 05:11:03 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 05:11:03 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 05:40:59 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 05:40:59 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 06:11:09 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 06:11:09 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 06:41:05 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 06:41:05 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 07:11:05 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 07:11:05 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 07:40:58 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 07:40:58 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 08:10:49 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 08:10:49 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 10:41:18 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 10:41:18 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 11:11:23 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 20 11:11:23 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
</pre>
</div>
</div>
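The spacing between the recoveries in the listing above can be computed rather than read off by eye. The following is a small Python sketch that works on a handful of lines copied from that listing; recovery_times and intervals_minutes are hypothetical helper names, and the timestamp layout is assumed to be the one produced by the %Y format in the DTrace script.

```python
from datetime import datetime

# A few lines copied from the listing above.
SAMPLE = """\
2019 Nov 19 19:40:57 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 19:40:57 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 20:11:03 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 20:11:03 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 19 20:40:51 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 21:10:49 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0
2019 Nov 19 22:40:59 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
"""


def recovery_times(text):
    """Return the distinct timestamps at which a SETCLIENTID completed, in order."""
    stamps = set()
    for line in text.splitlines():
        if "op-setclientid-done" in line:
            stamp = " ".join(line.split()[:4])  # e.g. "2019 Nov 19 19:40:57"
            stamps.add(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return sorted(stamps)


def intervals_minutes(times):
    """Minutes between consecutive recoveries, rounded to one decimal place."""
    return [round((b - a).total_seconds() / 60, 1) for a, b in zip(times, times[1:])]


print(intervals_minutes(recovery_times(SAMPLE)))
```

Both server addresses recover at the same moment, so the timestamps are de-duplicated before the intervals are taken.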
It issues the setclientid calls at intervals of roughly 30 minutes or one hour.<br />
Let's see if these correspond to the RENEW calls which got 10011 (NFS4ERR_EXPIRED):
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 300em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># grep "op-renew-done.*10.210.16.12.*10011" /var/tmp/nfs4-renew.log
...
2019 Nov 19 13:11:00 op-renew-done noi_xid: 3380034253 10.210.16.12 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 13:11:00 op-renew-done noi_xid: 3363257037 10.210.16.12 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 13:40:55 op-renew-done noi_xid: 2205694669 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 13:40:55 op-renew-done noi_xid: 2222471885 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 14:40:56 op-renew-done noi_xid: 3866704589 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 14:40:56 op-renew-done noi_xid: 3883481805 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 15:11:05 op-renew-done noi_xid: 2709142221 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 15:11:05 op-renew-done noi_xid: 2725919437 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 15:40:54 op-renew-done noi_xid: 1232812749 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 15:40:54 op-renew-done noi_xid: 1249589965 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 15:40:54 op-renew-done noi_xid: 4196397268 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 15:40:54 op-renew-done noi_xid: 4213174484 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 16:40:48 op-renew-done noi_xid: 1713500372 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 16:40:48 op-renew-done noi_xid: 1696723156 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 16:40:48 op-renew-done noi_xid: 3162258125 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 16:40:48 op-renew-done noi_xid: 3179035341 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 17:40:57 op-renew-done noi_xid: 796801741 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 17:40:57 op-renew-done noi_xid: 813578957 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 17:40:57 op-renew-done noi_xid: 3491950804 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 17:40:57 op-renew-done noi_xid: 3508728020 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 18:40:55 op-renew-done noi_xid: 2407480013 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 18:40:55 op-renew-done noi_xid: 2424257229 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 18:40:55 op-renew-done noi_xid: 690286804 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 18:40:55 op-renew-done noi_xid: 673509588 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 19:40:57 op-renew-done noi_xid: 4018158285 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 19:40:57 op-renew-done noi_xid: 2149970132 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 19:40:57 op-renew-done noi_xid: 2166747348 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 19 19:40:57 op-renew-done noi_xid: 4034935501 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
...
</pre>
</div>
</div>
It does. So every ~30 minutes or ~1hr the client doesn't issue RENEW on time and has to call SETCLIENTID. It seems that if an OPEN is issued during that window, it gets NFS4ERR_EXPIRED and fails, as Linux doesn't seem to issue RENEW and then retry the OPEN. I think that, to hit the issue, the file must not be already open and its attributes must not be cached.<br>
Now the question is why Linux is sometimes late issuing RENEW.<br>
<br>
I re-enabled application and it failed, while stap on the nfs client recorded:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 300em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># less /var/tmp/failed_stats_and_opens.log
...
Wed Nov 20 16:41:16 2019 java[10562] uid: 10203 SyS_newlstat("/mnt/.../XXX.zip") = 0 : 0
Wed Nov 20 16:41:16 2019 java[10562] uid: 10203 SyS_newstat("/mnt/.../XXX.zip") = 0 : 0
Wed Nov 20 16:41:16 2019 java[10562] uid: 10203 SyS_newstat("/mnt/.../XXX.zip") = 0 : 0
Wed Nov 20 16:41:16 2019 java[10562] uid: 10203 SyS_newstat("/mnt/.../XXX.zip") = 0 : 0
Wed Nov 20 16:41:16 2019 java[10562] uid: 10203 SyS_open("/mnt/.../.zip") = -5 (EIO) : -5
...
</pre>
</div>
</div>
From the debug log on the nfs client (for the 16:41:51 case):
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">
Nov 20 16:41:51 xxx kernel: [702405.312031] NFS: open file(.../XXX.zip)
Nov 20 16:41:51 xxx kernel: [702405.312033] --> nfs_put_client({23})
Nov 20 16:41:51 xxx kernel: [702405.312051] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=1024
Nov 20 16:41:51 xxx kernel: [702405.312053] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Nov 20 16:41:51 xxx kernel: [702405.312212] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Nov 20 16:41:51 xxx kernel: [702405.312227] nfs4_map_errors could not handle NFSv4 error 10011
Nov 20 16:41:51 xxx kernel: [702405.312233] NFS: atomic_open(0:109/65275), XXX.zip
Nov 20 16:41:51 xxx kernel: [702405.312268] --> nfs_put_client({23})
Nov 20 16:41:51 xxx kernel: [702405.312297] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=1024
Nov 20 16:41:51 xxx kernel: [702405.312299] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Nov 20 16:41:51 xxx kernel: [702405.312468] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Nov 20 16:41:51 xxx kernel: [702405.312478] nfs4_map_errors could not handle NFSv4 error 10011
</pre>
</div>
</div>
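The EIO returned to the application matches the "could not handle NFSv4 error 10011" lines above. The sketch below is a simplified user-space model of that step, not the kernel function itself: map_nfs4_error_model() and NFS4ERR_EXPIRED_MODEL are made-up names, and it assumes that any NFSv4 status without a dedicated translation collapses to EIO (the few codes that do have their own translation are left out). 10011 is the protocol value of NFS4ERR_EXPIRED.

```c
#include <assert.h>
#include <errno.h>

/* NFSv4 status code for an expired lease (protocol value). */
#define NFS4ERR_EXPIRED_MODEL 10011

/*
 * Hypothetical, simplified model of the client's error mapping.
 * Assumption: values down to -1000 are ordinary errno values and pass
 * through unchanged; anything below that is an NFSv4 status, and one
 * without a dedicated translation is reported to the caller as -EIO.
 */
int map_nfs4_error_model(int err)
{
    assert(err <= 0);  /* callers pass 0 or a negative error code */
    if (err >= -1000)
        return err;
    return -EIO;
}
```

With this model an OPEN that the server rejects with 10011 surfaces as EIO, which is what the open() syscall trace shows.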
On the nfs server:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># grep '20 16:[34].*-done.*10.210.6.6' /var/tmp/nfs4_all_ops.log | less
...
2019 Nov 20 16:41:51 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
...
2019 Nov 20 16:41:51 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
...
</pre>
</div>
</div>
Let's see renew operations:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 300em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># ggrep "op-renew-done.*10.210.16.12 10.210.6.6" /var/tmp/nfs4-renew.log | ggrep '20 16:[234]'
2019 Nov 20 16:20:46 op-renew-done noi_xid: 2640048340 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:21:46 op-renew-done noi_xid: 2656825556 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:22:46 op-renew-done noi_xid: 2673602772 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:23:46 op-renew-done noi_xid: 2690379988 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:26:18 op-renew-done noi_xid: 3479630036 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:27:19 op-renew-done noi_xid: 3496407252 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:28:19 op-renew-done noi_xid: 3513184468 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:29:19 op-renew-done noi_xid: 3529961684 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:30:19 op-renew-done noi_xid: 3546738900 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:31:19 op-renew-done noi_xid: 3563516116 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:32:19 op-renew-done noi_xid: 3580293332 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:33:19 op-renew-done noi_xid: 3597070548 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:34:20 op-renew-done noi_xid: 3613847764 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:35:20 op-renew-done noi_xid: 3630624980 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:36:20 op-renew-done noi_xid: 3647402196 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:37:20 op-renew-done noi_xid: 3664179412 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:38:20 op-renew-done noi_xid: 3680956628 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:42:16 op-renew-done noi_xid: 3932811476 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 20 16:42:16 op-renew-done noi_xid: 3949588692 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 20 16:44:46 op-renew-done noi_xid: 1438445780 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:45:46 op-renew-done noi_xid: 1455222996 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:46:46 op-renew-done noi_xid: 1472000212 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:47:46 op-renew-done noi_xid: 1488777428 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:48:46 op-renew-done noi_xid: 1505554644 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
2019 Nov 20 16:49:46 op-renew-done noi_xid: 1522331860 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds
# grep '20 1[56]:.*setclie.*10.210.6.6' /var/tmp/nfs4_all_ops.log
2019 Nov 20 15:11:09 op-setclientid-start 10.210.16.12 10.210.6.6
2019 Nov 20 15:11:09 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 15:11:09 op-setclientid-confirm-start 10.210.16.12 10.210.6.6
2019 Nov 20 15:11:09 op-setclientid-confirm-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 15:40:49 op-setclientid-start 10.210.16.12 10.210.6.6
2019 Nov 20 15:40:49 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 15:40:49 op-setclientid-confirm-start 10.210.16.12 10.210.6.6
2019 Nov 20 15:40:49 op-setclientid-confirm-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 16:42:16 op-setclientid-start 10.210.16.12 10.210.6.6
2019 Nov 20 16:42:16 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
2019 Nov 20 16:42:16 op-setclientid-confirm-start 10.210.16.12 10.210.6.6
2019 Nov 20 16:42:16 op-setclientid-confirm-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
</pre>
</div>
</div>
And the client recovery from its debug log:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">
Nov 20 16:42:16 xxx kernel: [702430.126654] nfs4_renew_state: start
Nov 20 16:42:16 xxx kernel: [702430.126677] nfs4_renew_state: done
Nov 20 16:42:16 xxx kernel: [702430.245186] nfs4_schedule_lease_recovery: scheduling lease recovery for server xxx
Nov 20 16:42:16 xxx kernel: [702430.247174] nfs4_schedule_state_renewal: requeueing work. Lease period = 30
Nov 20 16:42:16 xxx kernel: [702430.247177] --> nfs_put_client({24})
Nov 20 16:42:16 xxx kernel: [702430.247817] nfs4_recovery_handle_error: handled error -10011 for server xxx
Nov 20 16:42:16 xxx kernel: [702430.247821] NFS call setclientid auth=RPCSEC_GSS, 'Linux NFSv4.0 xxx/10.210.6.6'
Nov 20 16:42:16 xxx kernel: [702430.248233] NFS reply setclientid: 0
Nov 20 16:42:16 xxx kernel: [702430.248235] NFS call setclientid_confirm auth=RPCSEC_GSS, (client ID ad23bf2032c90f)
Nov 20 16:42:16 xxx kernel: [702430.257473] NFS reply setclientid_confirm: 0
</pre>
</div>
</div>
This is a gap of 236s (well over the 90s) between 16:38:20 and 16:42:16 and all open errors happened only during this exact time period:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># grep '20 1[56]:.*open-done.*10.210.6.6' /var/tmp/nfs4_all_ops.log | grep 10011
2019 Nov 20 16:40:36 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:40:36 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:15 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:15 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:16 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:16 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:17 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:16 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:17 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:16 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:16 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:16 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:17 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:17 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:22 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:22 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:22 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:22 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:30 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:30 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:32 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:31 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
2019 Nov 20 16:41:32 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011</pre>
</div>
</div>
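The 236s figure can be checked mechanically. This is a minimal sketch, assuming that the 90s is the lease time and that both timestamps fall on the same day; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Parse "HH:MM:SS" into seconds since midnight; returns -1 on malformed input. */
long hms_to_seconds(const char *hms)
{
    int h, m, s;

    if (sscanf(hms, "%d:%d:%d", &h, &m, &s) != 3)
        return -1;
    if (h < 0 || h > 23 || m < 0 || m > 59 || s < 0 || s > 60)
        return -1;
    return h * 3600L + m * 60L + s;
}

/* Seconds between two same-day timestamps; both must parse and be in order. */
long renew_gap_seconds(const char *prev, const char *next)
{
    long a = hms_to_seconds(prev);
    long b = hms_to_seconds(next);

    assert(a >= 0 && b >= a);
    return b - a;
}

/* Nonzero if the server saw no RENEW for longer than the lease time. */
int gap_exceeds_lease(const char *prev, const char *next, long lease_seconds)
{
    return renew_gap_seconds(prev, next) > lease_seconds;
}
```

Feeding it the last successful RENEW (16:38:20) and the failed one (16:42:16) gives 236, while the usual one-minute spacing stays well under 90.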
In nfs4renewd.c, nfs4_renew_state() prints “failed to call renewd. Reason: lease not expired” and later nfs4_schedule_state_renewal() runs dprintk("%s: requeueing work. Lease period = %ld\n", __func__, (timeout + HZ - 1) / HZ);<br>
It prints 57s, which suggests that fsinfo() was run against the server 3s earlier. If Solaris doesn't renew the lease on fsinfo(), then nfs4_renew_state() won't send a RENEW, and the client ends up without a valid lease and has to re-establish it with SETCLIENTID.<br>
I used stap to confirm it.
<div style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; color: #172b4d; font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Oxygen, Ubuntu, "Fira Sans", "Droid Sans", "Helvetica Neue", sans-serif; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; margin: 10px 0px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
nfs4proc.c /nfs4_do_fsinfo()
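nfs4_do_open() -> _nfs4_do_open() -> nfs4_recover_expired_lease() -> nfs4_client_recover_expired_lease() -> test_bit(NFS4CLNT_LEASE_EXPIRED) up to 10 times in a loop; if the bit is not set it returns straight away without starting recovery, and if it is still set after the last iteration it returns EIO<br />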
</div>
The only thing which will set the bit is nfs4renewd.c/nfs4_renew_state(), but if something else is modifying clp->cl_last_renewal and causing nfs4_renew_state() not to set the bit, then open() won’t recover. We do see the dprintk("%s: requeueing work. Lease period = %ld\n"… printed by nfs4_schedule_state_renewal() with a value of 57 (and others), which means something updated cl_last_renewal 3s earlier.<br>
Now nfs4_do_fsinfo() calls nfs4_set_lease_period().
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 300em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">static struct nfs4_state *nfs4_do_open(struct inode *dir,
        struct nfs_open_context *ctx,
        int flags,
        struct iattr *sattr,
        struct nfs4_label *label,
        int *opened)
{
    struct nfs_server *server = NFS_SERVER(dir);
    struct nfs4_exception exception = { };
    struct nfs4_state *res;
    int status;
    do {
        status = _nfs4_do_open(dir, ctx, flags, sattr, label, opened);
        res = ctx->state;
        trace_nfs4_open_file(ctx, flags, status);
        if (status == 0)
            break;
…
    return res;
}
</pre>
</div>
</div>
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">static int _nfs4_do_open(struct inode *dir,
        struct nfs_open_context *ctx,
        int flags,
        struct iattr *sattr,
        struct nfs4_label *label,
        int *opened)
{
…
    status = nfs4_recover_expired_lease(server);
    if (status != 0)
        goto err_put_state_owner;
…
err_put_state_owner:
    nfs4_put_state_owner(sp);
out_err:
    return status;
}
</pre>
</div>
</div>
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">static int nfs4_recover_expired_lease(struct nfs_server *server)
{
    return nfs4_client_recover_expired_lease(server->nfs_client);
}

nfs4state.c:
int nfs4_client_recover_expired_lease(struct nfs_client *clp)
{
    unsigned int loop;
    int ret;

    for (loop = NFS4_MAX_LOOP_ON_RECOVER; loop != 0; loop--) {
        ret = nfs4_wait_clnt_recover(clp);
        if (ret != 0)
            break;
        if (!test_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state) &&
            !test_bit(NFS4CLNT_CHECK_LEASE,&clp->cl_state))
            break;
        nfs4_schedule_state_manager(clp);
        ret = -EIO;
    }
    return ret;
}
</pre>
</div>
</div>
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">$ grep -r NFS4_MAX_LOOP_ON_RECOVER . | grep define
./nfs4_fs.h:#define NFS4_MAX_LOOP_ON_RECOVER (10)
</pre>
</div>
</div>
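To make the control flow of that loop easier to follow, here is a user-space model of it. It is a sketch under stated assumptions, not kernel code: struct client_model, run_recovery_pass() and the bit names are hypothetical stand-ins, the wait for a running recovery is assumed to always succeed, and recovery runs synchronously here whereas the kernel only schedules the state manager.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define LEASE_EXPIRED_BIT (1u << 0)  /* stands in for NFS4CLNT_LEASE_EXPIRED */
#define CHECK_LEASE_BIT   (1u << 1)  /* stands in for NFS4CLNT_CHECK_LEASE */
#define MAX_LOOP_ON_RECOVER 10       /* same value as NFS4_MAX_LOOP_ON_RECOVER */

/* Hypothetical stand-in for struct nfs_client, reduced to what the loop reads. */
struct client_model {
    unsigned int state;   /* lease-related flag bits */
    int passes_needed;    /* recovery passes before the flags clear */
    int passes_run;       /* recovery passes actually started */
};

/* Synchronous stand-in for scheduling the state manager and waiting for it. */
static void run_recovery_pass(struct client_model *clp)
{
    clp->passes_run++;
    if (clp->passes_run >= clp->passes_needed)
        clp->state &= ~(LEASE_EXPIRED_BIT | CHECK_LEASE_BIT);
}

int recover_expired_lease_model(struct client_model *clp)
{
    unsigned int loop;
    int ret = 0;

    assert(clp != NULL);
    for (loop = MAX_LOOP_ON_RECOVER; loop != 0; loop--) {
        ret = 0;  /* the wait for a running recovery is assumed to succeed */
        if (!(clp->state & (LEASE_EXPIRED_BIT | CHECK_LEASE_BIT)))
            break;  /* nothing flagged: return without starting recovery */
        run_recovery_pass(clp);
        ret = -EIO;  /* stays -EIO only if the flags never clear */
    }
    return ret;
}
```

The first case is the interesting one for this investigation: when neither flag is set, the function returns 0 without doing anything, so the OPEN goes to the server with whatever lease state the server still holds.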
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 300em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">nfs4renewd.c:
void
nfs4_renew_state(struct work_struct *work)
{
    const struct nfs4_state_maintenance_ops *ops;
    struct nfs_client *clp =
        container_of(work, struct nfs_client, cl_renewd.work);
    struct rpc_cred *cred;
    long lease;
    unsigned long last, now;
    unsigned renew_flags = 0;

    ops = clp->cl_mvops->state_renewal_ops;
    dprintk("%s: start\n", __func__);

    if (test_bit(NFS_CS_STOP_RENEW, &clp->cl_res_state))
        goto out;

    spin_lock(&clp->cl_lock);
    lease = clp->cl_lease_time;
    last = clp->cl_last_renewal;
    now = jiffies;
    /* Are we close to a lease timeout? */
    if (time_after(now, last + lease/3))
        renew_flags |= NFS4_RENEW_TIMEOUT;
    if (nfs_delegations_present(clp))
        renew_flags |= NFS4_RENEW_DELEGATION_CB;

    if (renew_flags != 0) {
        cred = ops->get_state_renewal_cred_locked(clp);
        spin_unlock(&clp->cl_lock);
        if (cred == NULL) {
            if (!(renew_flags & NFS4_RENEW_DELEGATION_CB)) {
                set_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
                goto out;
            }
            nfs_expire_all_delegations(clp);
        } else {
            int ret;

            /* Queue an asynchronous RENEW. */
            ret = ops->sched_state_renewal(clp, cred, renew_flags);
            put_rpccred(cred);
            switch (ret) {
            default:
                goto out_exp;
            case -EAGAIN:
            case -ENOMEM:
                break;
            }
        }
    } else {
        dprintk("%s: failed to call renewd. Reason: lease not expired \n",
                __func__);
        spin_unlock(&clp->cl_lock);
    }
    nfs4_schedule_state_renewal(clp);
out_exp:
    nfs_expire_unreferenced_delegations(clp);
out:
    dprintk("%s: done\n", __func__);
}

void
nfs4_schedule_state_renewal(struct nfs_client *clp)
{
    long timeout;

    spin_lock(&clp->cl_lock);
    timeout = (2 * clp->cl_lease_time) / 3 + (long)clp->cl_last_renewal
        - (long)jiffies;
    if (timeout < 5 * HZ)
        timeout = 5 * HZ;
    dprintk("%s: requeueing work. Lease period = %ld\n",
            __func__, (timeout + HZ - 1) / HZ);
    mod_delayed_work(system_wq, &clp->cl_renewd, timeout);
    set_bit(NFS_CS_RENEWD, &clp->cl_res_state);
    spin_unlock(&clp->cl_lock);
}

/**
 * nfs4_set_lease_period - Sets the lease period on a nfs_client
 *
 * @clp: pointer to nfs_client
 * @lease: new value for lease period
 * @lastrenewed: time at which lease was last renewed
 */
void nfs4_set_lease_period(struct nfs_client *clp,
        unsigned long lease,
        unsigned long lastrenewed)
{
    spin_lock(&clp->cl_lock);
    clp->cl_lease_time = lease;
    clp->cl_last_renewal = lastrenewed;
    spin_unlock(&clp->cl_lock);

    /* Cap maximum reconnect timeout at 1/2 lease period */
    rpc_cap_max_reconnect_timeout(clp->cl_rpcclient, lease >> 1);
}
</pre>
</div>
</div>
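The arithmetic behind the "57" can be reproduced outside the kernel. The sketch below copies the two calculations from nfs4_renew_state() and nfs4_schedule_state_renewal() into plain functions; the function names are made up, and the values used in the note after it (a 90s lease and 1000 ticks per second) are assumptions for illustration only.

```c
#include <assert.h>

/* Wraparound-safe "a is later than b" for tick counters, as time_after() does. */
static int tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Mirrors the check in nfs4_renew_state(): a RENEW is due once a third of the lease has passed. */
int renew_is_due(unsigned long now, unsigned long last, long lease)
{
    return tick_after(now, last + lease / 3);
}

/* Mirrors nfs4_schedule_state_renewal(): ticks until renewd runs again. */
long requeue_delay(unsigned long now, unsigned long last, long lease, long hz)
{
    long timeout;

    assert(hz > 0);
    timeout = (2 * lease) / 3 + (long)last - (long)now;
    if (timeout < 5 * hz)
        timeout = 5 * hz;
    return timeout;
}

/* The value the dprintk() reports as "Lease period", rounded up to whole seconds. */
long requeue_delay_seconds(unsigned long now, unsigned long last, long lease, long hz)
{
    return (requeue_delay(now, last, lease, hz) + hz - 1) / hz;
}
```

With a 90s lease, a cl_last_renewal that is only 3s old makes renew_is_due() return 0 (the "lease not expired" branch) and requeue_delay_seconds() return 57. If the server did not count that same event as a renewal, the next RENEW arrives later than the server expects.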
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">static int nfs4_do_fsinfo(struct nfs_server *server, struct nfs_fh *fhandle, struct nfs_fsinfo *fsinfo)
{
    struct nfs4_exception exception = { };
    unsigned long now = jiffies;
    int err;

    do {
        err = _nfs4_do_fsinfo(server, fhandle, fsinfo);
        trace_nfs4_fsinfo(server, fhandle, fsinfo->fattr, err);
        if (err == 0) {
            nfs4_set_lease_period(server->nfs_client,
                    fsinfo->lease_time * HZ,
                    now);
            break;
        }
        err = nfs4_handle_exception(server, err, &exception);
    } while (exception.retry);
    return err;
}
</pre>
</div>
</div>
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">static int _nfs4_do_fsinfo(struct nfs_server *server, struct nfs_fh *fhandle,
        struct nfs_fsinfo *fsinfo)
{
    struct nfs4_fsinfo_arg args = {
        .fh = fhandle,
        .bitmask = server->attr_bitmask,
    };
    struct nfs4_fsinfo_res res = {
        .fsinfo = fsinfo,
    };
    struct rpc_message msg = {
        .rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_FSINFO],
        .rpc_argp = &args,
        .rpc_resp = &res,
    };

    return nfs4_call_sync(server->client, server, &msg, &args.seq_args, &res.seq_res, 0);
}
</pre>
</div>
</div>
Many NFSv4 ops reset the lease timer on the client as well, on the assumption that after a successful op the server has done the same.<br />
For example:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">static void do_renew_lease(struct nfs_client *clp, unsigned long timestamp)
{
    spin_lock(&clp->cl_lock);
    if (time_before(clp->cl_last_renewal,timestamp))
        clp->cl_last_renewal = timestamp;
    spin_unlock(&clp->cl_lock);
}

static void renew_lease(const struct nfs_server *server, unsigned long timestamp)
{
    struct nfs_client *clp = server->nfs_client;

    if (!nfs4_has_session(clp))
        do_renew_lease(clp, timestamp);
}

static void nfs4_open_confirm_done(struct rpc_task *task, void *calldata)
{
    struct nfs4_opendata *data = calldata;

    nfs40_sequence_done(task, &data->c_res.seq_res);

    data->rpc_status = task->tk_status;
    if (data->rpc_status == 0) {
        nfs4_stateid_copy(&data->o_res.stateid, &data->c_res.stateid);
        nfs_confirm_seqid(&data->owner->so_seqid, 0);
        renew_lease(data->o_res.server, data->timestamp);
        data->rpc_done = 1;
    }
}

static void nfs4_open_done(struct rpc_task *task, void *calldata)
{
    struct nfs4_opendata *data = calldata;

    data->rpc_status = task->tk_status;

    if (!nfs4_sequence_process(task, &data->o_res.seq_res))
        return;

    if (task->tk_status == 0) {
        if (data->o_res.f_attr->valid & NFS_ATTR_FATTR_TYPE) {
            switch (data->o_res.f_attr->mode & S_IFMT) {
            case S_IFREG:
                break;
            case S_IFLNK:
                data->rpc_status = -ELOOP;
                break;
            case S_IFDIR:
                data->rpc_status = -EISDIR;
                break;
            default:
                data->rpc_status = -ENOTDIR;
            }
        }
        renew_lease(data->o_res.server, data->timestamp);
        if (!(data->o_res.rflags & NFS4_OPEN_RESULT_CONFIRM))
            nfs_confirm_seqid(&data->owner->so_seqid, 0);
    }
    data->rpc_done = 1;
}
</pre>
</div>
</div>
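The difference between the two ways of touching cl_last_renewal is easier to see side by side. This is a user-space sketch with hypothetical names (struct lease_model, renew_lease_model(), set_last_renewal_model()) and without the spinlock: the first function follows do_renew_lease() and only moves the timestamp forward, the second follows the assignment in nfs4_set_lease_period() and overwrites it unconditionally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the one nfs_client field of interest. */
struct lease_model {
    unsigned long last_renewal;  /* tick count of the last known renewal */
};

/* Wraparound-safe "a is earlier than b", as time_before() does. */
static int tick_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/* Like do_renew_lease(): only ever moves the timestamp forward. */
void renew_lease_model(struct lease_model *clp, unsigned long timestamp)
{
    assert(clp != NULL);
    if (tick_before(clp->last_renewal, timestamp))
        clp->last_renewal = timestamp;
}

/* Like the assignment in nfs4_set_lease_period(): overwrites unconditionally. */
void set_last_renewal_model(struct lease_model *clp, unsigned long lastrenewed)
{
    assert(clp != NULL);
    clp->last_renewal = lastrenewed;
}
```

Either way the client's idea of "last renewed" changes without a RENEW going out, so it only stays correct if the server treats the same operation as a lease renewal.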
Let’s see what is calling nfs4_do_fsinfo():
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># cat ./nfs4_do_fsinfo.stp
#!/usr/bin/env stap
/*
module("nfsv4").function("nfs4_do_fsinfo@fs/nfs/nfs4proc.c:4504") $server:struct nfs_server* $fhandle:struct nfs_fh* $fsinfo:struct nfs_fsinfo* $exception:struct nfs4_exception
*/
probe module("nfsv4").function("nfs4_do_fsinfo") {
    printf("%s %s proc: %s[%d] uid: %d\n", ctime(gettimeofday_s()), probefunc(), execname(), pid(), uid())
    print_backtrace()
    exit()
}
</pre>
</div>
</div>
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 300em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># ./nfs4_do_fsinfo.stp
WARNING: Missing unwind data for a module, rerun with 'stap -d nfs'
Thu Nov 21 16:39:53 2019 nfs4_do_fsinfo proc: ls[18701] uid: 10006
0xffffffffc05de260 : nfs4_do_fsinfo+0x0/0x120 [nfsv4]
0xffffffffc05de3b1 : nfs4_proc_fsinfo+0x31/0x60 [nfsv4]
0xffffffffc05924e0 [nfs]
0xffffffffc0592b43 [nfs]
0xffffffffc059f098 [nfs]
0xffffffff9104d1fe
0xffffffff9106b377
0xffffffff9106b454
0xffffffffc05ab014 [nfs]
0xffffffffc05fb740 : nfs4_submount+0x90/0x830 [nfsv4]
0xffffffffc05ab21a [nfs]
0xffffffff91053a69
0xffffffff910549ad
0xffffffff91058483
0xffffffff9105878b
0xffffffff91058cea
0xffffffff9105954b
0xffffffff9105b497
0xffffffff9105b501
0xffffffff9104e1e3
# ./nfs4_do_fsinfo.stp
WARNING: Missing unwind data for a module, rerun with 'stap -d nfs'
Thu Nov 21 17:10:05 2019 nfs4_do_fsinfo proc: ls[29463] uid: 10006
0xffffffffc05de260 : nfs4_do_fsinfo+0x0/0x120 [nfsv4]
0xffffffffc05de3b1 : nfs4_proc_fsinfo+0x31/0x60 [nfsv4]
0xffffffffc05924e0 [nfs]
0xffffffffc0592b43 [nfs]
0xffffffffc059f098 [nfs]
0xffffffff9104d1fe
0xffffffff9106b377
0xffffffff9106b454
0xffffffffc05ab014 [nfs]
0xffffffffc05fb740 : nfs4_submount+0x90/0x830 [nfsv4]
0xffffffffc05ab21a [nfs]
0xffffffff91053a69
0xffffffff910549ad
0xffffffff91058483
0xffffffff9105878b
0xffffffff91058cea
0xffffffff9105954b
0xffffffff9105b497
0xffffffff9105b501
0xffffffff9104e1e3
</pre>
</div>
</div>
Let’s get some more info:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># cat ./nfs4_do_fsinfo.stp
#!/usr/bin/env stap
/*
module("nfsv4").function("nfs4_do_fsinfo@fs/nfs/nfs4proc.c:4504") $server:struct nfs_server* $fhandle:struct nfs_fh* $fsinfo:struct nfs_fsinfo* $exception:struct nfs4_exception
*/
probe module("nfsv4").function("nfs4_do_fsinfo") {
printf("%s %s proc: %s[%d] ppid: %d args: %s cwd: %s uid: %d\n", ctime(gettimeofday_s()), probefunc(), execname(), pid(),
ppid(), cmdline_str(), fullpath_struct_path(task_cwd_path(task_current())), uid())
print_backtrace()
exit()
}
</pre>
</div>
</div>
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># ./nfs4_do_fsinfo.stp
WARNING: Missing unwind data for a module, rerun with 'stap -d nfs'
Thu Nov 21 17:40:05 2019 nfs4_do_fsinfo proc: ls[9275] ppid: 9273 args: ls /mnt/.../ cwd: /... uid: 10006
0xffffffffc05de260 : nfs4_do_fsinfo+0x0/0x120 [nfsv4]
0xffffffffc05de3b1 : nfs4_proc_fsinfo+0x31/0x60 [nfsv4]
0xffffffffc05924e0 [nfs]
0xffffffffc0592b43 [nfs]
0xffffffffc059f098 [nfs]
0xffffffff9104d1fe
0xffffffff9106b377
0xffffffff9106b454
0xffffffffc05ab014 [nfs]
0xffffffffc05fb740 : nfs4_submount+0x90/0x830 [nfsv4]
0xffffffffc05ab21a [nfs]
0xffffffff91053a69
0xffffffff910549ad
0xffffffff91058483
0xffffffff9105878b
0xffffffff91058cea
0xffffffff9105954b
0xffffffff9105b497
0xffffffff9105b501
0xffffffff9104e1e3
</pre>
</div>
</div>
Notice that fsinfo is actually sent as a GETATTR operation in NFSv4:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">/*
* FSINFO request
*/
static void nfs4_xdr_enc_fsinfo(struct rpc_rqst *req, struct xdr_stream *xdr,
struct nfs4_fsinfo_arg *args)
{
struct compound_hdr hdr = {
.minorversion = nfs4_xdr_minorversion(&args->seq_args),
};
encode_compound_hdr(xdr, req, &hdr);
encode_sequence(xdr, &args->seq_args, &hdr);
encode_putfh(xdr, args->fh, &hdr);
encode_fsinfo(xdr, args->bitmask, &hdr);
encode_nops(&hdr);
}
</pre>
</div>
</div>
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">static void encode_fsinfo(struct xdr_stream *xdr, const u32* bitmask, struct compound_hdr *hdr)
{
encode_getattr_three(xdr,
bitmask[0] & nfs4_fsinfo_bitmap[0],
bitmask[1] & nfs4_fsinfo_bitmap[1],
bitmask[2] & nfs4_fsinfo_bitmap[2],
hdr);
}</pre>
</div>
</div>
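To make the masking in encode_fsinfo() concrete, here is a minimal Python sketch of the same word-by-word AND. The bitmap values are made up for illustration and are not the real contents of nfs4_fsinfo_bitmap; the function name is hypothetical too.

```python
# Sketch of the masking done by encode_fsinfo(): each of the three 32-bit
# bitmap words supplied by the caller is ANDed with the fsinfo bitmap.
# All values below are hypothetical, chosen only to show the operation.

def mask_fsinfo_bitmap(bitmask, fsinfo_bitmap):
    """Return the three attribute words that would be handed to GETATTR."""
    if len(bitmask) != 3 or len(fsinfo_bitmap) != 3:
        raise ValueError("expected three 32-bit words in each bitmap")
    return tuple((b & f) & 0xFFFFFFFF for b, f in zip(bitmask, fsinfo_bitmap))

server_bitmask = (0x0000FFFF, 0x00B0A23A, 0x00000000)  # hypothetical
fsinfo_bitmap = (0x18000400, 0x00000000, 0x00000002)   # hypothetical

requested = mask_fsinfo_bitmap(server_bitmask, fsinfo_bitmap)
print([hex(word) for word in requested])
```

Only attributes present in both bitmaps survive, which is why the request on the wire looks like an ordinary GETATTR.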
Let's double-check that nfs4_do_fsinfo() is triggered by a stat()/fstat()-style call made by ls:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># umount /mnt/...
$ ./sh/checkXXXMounts.sh
Checking /mnt/...
...
</pre>
</div>
</div>
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># stap -g -d/bin/ls --ldd --all-modules ./nfs4_do_fsinfo.stp
WARNING: missing unwind/symbol data for module 'stap_6d2b184a53eae72ccbc95adec92f2d3c_25253'
Fri Nov 22 10:32:32 2019 nfs4_do_fsinfo proc: ls[27494] ppid: 27492 args: ls /mnt/... cwd: /... uid: 10006
0xffffffffc05de260 : nfs4_do_fsinfo+0x0/0x120 [nfsv4]
0xffffffffc05de3b1 : nfs4_proc_fsinfo+0x31/0x60 [nfsv4]
0xffffffffc05924e0 : nfs_probe_fsinfo+0xc0/0x660 [nfs]
0xffffffffc0592b43 : nfs_clone_server+0xc3/0x1f0 [nfs]
0xffffffffc059f098 : nfs_xdev_mount+0x98/0x130 [nfs]
0xffffffff9104d1fe : mount_fs+0x3e/0x1b0 [kernel]
0xffffffff9106b377 : vfs_kern_mount+0x67/0x110 [kernel]
0xffffffff9106b454 : vfs_submount+0x34/0x40 [kernel]
0xffffffffc05ab014 : nfs_do_submount+0xb4/0x160 [nfs]
0xffffffffc05fb740 : nfs4_submount+0x90/0x830 [nfsv4]
0xffffffffc05ab21a : nfs_d_automount+0xca/0x1c0 [nfs]
0xffffffff91053a69 : follow_managed+0x149/0x310 [kernel]
0xffffffff910549ad : lookup_fast+0x12d/0x230 [kernel]
0xffffffff91058483 : link_path_walk+0x1b3/0x8b0 [kernel]
0xffffffff9105878b : link_path_walk+0x4bb/0x8b0 [kernel]
0xffffffff91058cea : path_lookupat+0x7a/0x8b0 [kernel]
0xffffffff9105954b : filename_lookup+0x2b/0xc0 [kernel]
0xffffffff9105b497 : user_path_at_empty+0x67/0xc0 [kernel]
0xffffffff9105b501 : user_path_at+0x11/0x20 [kernel]
0xffffffff9104e1e3 : vfs_fstatat+0x63/0xc0 [kernel]
-------
0x7fd54c693275 : __xstat64+0x15/0x50 [/usr/lib64/libc-2.17.so]
0x408841 : gobble_file.constprop.50+0x8b1/0xeb0 [/usr/bin/ls]
0x402e41 : main+0x531/0x1a14 [/usr/bin/ls]
0x7fd54c5c6505 : __libc_start_main+0xf5/0x1c0 [/usr/lib64/libc-2.17.so]
0x40434d : _start+0x29/0x2c [/usr/bin/ls]
-------
</pre>
</div>
</div>
Yes, it is.<br />
Btw, just running ls is enough to trigger it once the data is no longer cached, or better, unmount it first:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># umount /mnt/...
# ls -l /mnt/... >/dev/null
# stap -g -d/bin/ls --ldd --all-modules ./nfs4_do_fsinfo.stp
Fri Nov 22 10:37:17 2019 nfs4_do_fsinfo proc: ls[29995] ppid: 13019 args: ls --color=auto -l /mnt/... cwd: /... uid: 0
0xffffffffc05de260 : nfs4_do_fsinfo+0x0/0x120 [nfsv4]
0xffffffffc05de3b1 : nfs4_proc_fsinfo+0x31/0x60 [nfsv4]
0xffffffffc05924e0 : nfs_probe_fsinfo+0xc0/0x660 [nfs]
0xffffffffc0592b43 : nfs_clone_server+0xc3/0x1f0 [nfs]
0xffffffffc059f098 : nfs_xdev_mount+0x98/0x130 [nfs]
0xffffffff9104d1fe : mount_fs+0x3e/0x1b0 [kernel]
0xffffffff9106b377 : vfs_kern_mount+0x67/0x110 [kernel]
0xffffffff9106b454 : vfs_submount+0x34/0x40 [kernel]
0xffffffffc05ab014 : nfs_do_submount+0xb4/0x160 [nfs]
0xffffffffc05fb740 : nfs4_submount+0x90/0x830 [nfsv4]
0xffffffffc05ab21a : nfs_d_automount+0xca/0x1c0 [nfs]
0xffffffff91053a69 : follow_managed+0x149/0x310 [kernel]
0xffffffff910549ad : lookup_fast+0x12d/0x230 [kernel]
0xffffffff91058ddd : path_lookupat+0x16d/0x8b0 [kernel]
0xffffffff9105954b : filename_lookup+0x2b/0xc0 [kernel]
0xffffffff9105b497 : user_path_at_empty+0x67/0xc0 [kernel]
0xffffffff9105b501 : user_path_at+0x11/0x20 [kernel]
0xffffffff9104e1e3 : vfs_fstatat+0x63/0xc0 [kernel]
0xffffffff9104e601 : SYSC_newlstat+0x31/0x60 [kernel]</pre>
</div>
</div>
<br>
So what happens here is that Linux will unmount idle cross-mounts after /proc/sys/fs/nfs/nfs_mountpoint_timeout seconds, which on this server has the default value of 500. The next time anything triggers mounting a cross-mount, Linux will send an fsinfo() call (nfs4_do_fsinfo()).<br>
For now, assuming nfs4_do_fsinfo() only gets called when mounting a cross-mount (not necessarily true), to test a possible workaround let's change the setting so that Linux won't unmount sub-mounts for 7 days. On the Linux client:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># bc
60*60*24*7
604800
# echo 604800 > /proc/sys/fs/nfs/nfs_mountpoint_timeout
# date
Fri 22 Nov 12:23:27 GMT 2019
</pre>
</div>
</div>
Also, in the background I'm running:</div>
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># stap -g --ldd --all-modules --suppress-handler-errors -o /var/tmp/nfs4_do_fsinfo.log ./nfs4_do_fsinfo.stp &
[6] 10295</pre>
</div>
</div>
<br>
I've managed to reproduce the issue.
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># cat ./nfs4_renew_state.stp
#!/bin/stap
/*
* module("nfsv4").function("nfs4_renew_state@fs/nfs/nfs4renewd.c:55") $work:struct work_struct* $clp:struct nfs_client* $renew_flags:unsigned int $__func__:char[] const
*/
probe module("nfsv4").function("nfs4_renew_state") {
printf("%s %s least_time: %d secs_to_renew: %d server: %s\n", ctime(gettimeofday_s()), probefunc(),
$clp->cl_lease_time/HZ(), (jiffies()-$clp->cl_last_renewal)/HZ(), kernel_string($clp->cl_hostname))
}
</pre>
</div>
</div>
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># ./nfs4_renew_state.stp | grep xxx
Fri Nov 22 15:40:41 2019 nfs4_renew_state least_time: 90 secs_to_renew: 60 server: xxx
Fri Nov 22 15:41:42 2019 nfs4_renew_state least_time: 90 secs_to_renew: 5 server: xxx
Fri Nov 22 15:42:36 2019 nfs4_renew_state least_time: 90 secs_to_renew: 60 server: xxx
</pre>
</div>
</div>
A few seconds before 15:40:41+60s I ran:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># umount /mnt/...; ls -l /mnt/.../
ls: cannot open directory /mnt/.../: Permission denied
</pre>
</div>
</div>
So now Linux is not going to send RENEW, as it assumes its lease got extended.<br />
At 15:40:41+90s the Solaris server will assume the client's lease has expired, so if we call open() between 15:42:11 and 15:42:36 it should fail:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">$ date;strace -e trace=open cat /mnt/.../xxx >/dev/null
Fri 22 Nov 15:42:13 GMT 2019
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("/mnt/.../xxx", O_RDONLY) = -1 EIO (Input/output error)
...
$ date;strace -e trace=open cat /mnt/.../xxx >/dev/null
Fri 22 Nov 15:42:38 GMT 2019
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("/mnt/.../xxx", O_RDONLY) = 3
+++ exited with 0 +++
</pre>
</div>
</div>
And it did fail, while when re-run after Linux had sent RENEW, it worked.<br>
Let's confirm Solaris did respond with 10011 (NFS4ERR_EXPIRED):
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># ggrep "Nov 22 .*-done.*10.210.16.12" /var/tmp/nfs4-renew.log | grep "ret: 10011" | tail
...
2019 Nov 22 15:42:36 op-renew-done noi_xid: 2258385869 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
2019 Nov 22 15:42:36 op-renew-done noi_xid: 2275163085 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds
# grep 15:42:13 /var/tmp/nfs4_all_ops.log | grep 10011
2019 Nov 22 15:42:13 op-open-done 10.210.16.12 10.210.6.5 took: 0 ret: 10011
</pre>
</div>
</div>
It did.<br>
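The failure window in the Solaris test can be recomputed from the timestamps in the nfs4_renew_state output. This is only a sketch of the arithmetic: it assumes the server last saw a renewal at 15:40:41 and that the 15:42:36 RENEW is what lets the client recover. The variable and function names are my own.

```python
from datetime import datetime, timedelta

LEASE = timedelta(seconds=90)  # lease time reported by the stap script

# Last renewal the server saw, and the next RENEW the client actually sent.
last_renew_on_server = datetime(2019, 11, 22, 15, 40, 41)
next_renew_from_client = datetime(2019, 11, 22, 15, 42, 36)

lease_expires = last_renew_on_server + LEASE

def open_expected_to_fail(when):
    """True if 'when' falls after lease expiry but before the next RENEW."""
    return lease_expires <= when < next_renew_from_client

for attempt in (datetime(2019, 11, 22, 15, 42, 13),
                datetime(2019, 11, 22, 15, 42, 38)):
    verdict = "expected to fail" if open_expected_to_fail(attempt) else "expected to work"
    print(attempt.time(), verdict)
```

The two attempts correspond to the two strace runs: the first lands inside the window, the second after it.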
<br>
I've also managed to reproduce the same issue against a Linux NFS server.
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># cat ./nfs4_renew_state.stp
#!/bin/stap
/*
* module("nfsv4").function("nfs4_renew_state@fs/nfs/nfs4renewd.c:55") $work:struct work_struct* $clp:struct nfs_client* $renew_flags:unsigned int $__func__:char[] const
*/
probe module("nfsv4").function("nfs4_renew_state") {
printf("%s %s least_time: %d secs_to_renew: %d server: %s\n", ctime(gettimeofday_s()), probefunc(),
$clp->cl_lease_time/HZ(), (jiffies()-$clp->cl_last_renewal)/HZ(), kernel_string($clp->cl_hostname))
}
</pre>
</div>
</div>
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># ./nfs4_renew_state.stp | grep xxx
Mon Nov 25 13:49:36 2019 nfs4_renew_state least_time: 90 secs_to_renew: 40 server: xxx
Mon Nov 25 13:50:36 2019 nfs4_renew_state least_time: 90 secs_to_renew: 60 server: xxx
Mon Nov 25 13:51:36 2019 nfs4_renew_state least_time: 90 secs_to_renew: 3 server: xxx
Mon Nov 25 13:52:33 2019 nfs4_renew_state least_time: 90 secs_to_renew: 60 server: xxx
Mon Nov 25 13:52:38 2019 nfs4_renew_state least_time: 90 secs_to_renew: 65 server: xxx
Mon Nov 25 13:53:38 2019 nfs4_renew_state least_time: 90 secs_to_renew: 60 server: xxx
^C
</pre>
</div>
</div>
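When the probe runs for a long time, the interesting lines are the ones where the renewal timer fired shortly after something else had already refreshed cl_last_renewal. A small sketch for picking those out of the log follows. Per the stap script, the "secs_to_renew" field is really the number of seconds since the last recorded renewal; the 30-second threshold is an arbitrary value chosen for illustration, and the function name is hypothetical.

```python
import re

# Matches the lines printed by nfs4_renew_state.stp (field names as in the script).
LINE_RE = re.compile(
    r"^(?P<when>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4}) nfs4_renew_state "
    r"least_time: (?P<lease>\d+) secs_to_renew: (?P<elapsed>\d+) "
    r"server: (?P<server>\S+)$"
)

def recently_refreshed(lines, threshold_s=30):
    """Return (timestamp, elapsed) for lines where the last renewal is recent."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and int(match.group("elapsed")) < threshold_s:
            hits.append((match.group("when"), int(match.group("elapsed"))))
    return hits

sample = [
    "Mon Nov 25 13:49:36 2019 nfs4_renew_state least_time: 90 secs_to_renew: 40 server: xxx",
    "Mon Nov 25 13:50:36 2019 nfs4_renew_state least_time: 90 secs_to_renew: 60 server: xxx",
    "Mon Nov 25 13:51:36 2019 nfs4_renew_state least_time: 90 secs_to_renew: 3 server: xxx",
    "Mon Nov 25 13:52:33 2019 nfs4_renew_state least_time: 90 secs_to_renew: 60 server: xxx",
]

hits = recently_refreshed(sample)
print(hits)
```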
At 13:51:32 (4 seconds before the client was due to send a RENEW operation) I ran:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># date; umount /mnt/...; ls -l /mnt/.../
Mon 25 Nov 13:51:32 GMT 2019
total 7
-rw-r--r-- 1 root root 5 Nov 25 13:48 test_f1
</pre>
</div>
</div>
Then at 13:52:08, 2 seconds after the client's lease expired from the server's point of view (13:50:36 + 90s = 13:52:06), I ran:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">$ date;strace -e trace=open cat /mnt/.../test_f1 >/dev/null
Mon 25 Nov 13:52:08 GMT 2019
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("/mnt/.../test_f1", O_RDONLY) = -1 EIO (Input/output error)
...
</pre>
</div>
</div>
Then, at 13:52:33 (13:51:36 + 57s), the client sent a RENEW operation and had to recover via SETCLIENTID, etc., and now open() works fine:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">$ date;strace -e trace=open cat /mnt/.../test_f1 >/dev/null
Mon 25 Nov 13:53:39 GMT 2019
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("/mnt/.../test_f1", O_RDONLY) = 3
+++ exited with 0 +++
</pre>
</div>
</div>
The Linux NFS server behaves the same as the Solaris NFS server.<br />
Linux here is CentOS 7.4 running kernel 3.10.0-693.11.6.el7.x86_64.<br>
Also reproduced against a Solaris 11.4.15.5.0 NFS server.<br>
<br>
A workaround on Linux NFS client hosts is to increase the fs.nfs.nfs_mountpoint_timeout sysctl from 500 (the default) to the maximum allowed value: 2147483.<br>
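A quick sanity check of the numbers involved. The maximum of 2147483 happens to equal INT_MAX divided by 1000, which would be consistent with the timeout being kept in jiffies in a signed 32-bit value on a kernel with HZ=1000. That explanation is my assumption and is not stated above; the helper names are hypothetical.

```python
INT_MAX = 2**31 - 1
ASSUMED_HZ = 1000  # assumption: kernel tick rate on this client

def max_timeout_seconds(hz=ASSUMED_HZ):
    """Largest value in seconds that still fits in an int once scaled by hz."""
    return INT_MAX // hz

def sysctl_line(seconds):
    """Format a sysctl.conf style entry for the NFS mountpoint timeout."""
    return f"fs.nfs.nfs_mountpoint_timeout = {seconds}"

seven_days = 60 * 60 * 24 * 7

print(max_timeout_seconds())                 # compare with the observed maximum
print(seven_days <= max_timeout_seconds())   # the 7-day test value fits
print(sysctl_line(max_timeout_seconds()))
```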
<br>
I couldn't reproduce the issue against NetApp filers, though. Some clarification from NetApp support:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 300em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">Hi Robert,
I have gone through this case and your request for explanation on why ONTAP is not throwing “NFS4ERR_EXPIRED” when the lease is expired while other servers does that.
Why the Client did not send the RENEW call is client’s behavior and ONTAP cannot control that.
Now on why ONTAP is not throwing “NFS4ERR_EXPIRED” even after lease is expired (30 secs), I believe the RFC has the answer for that.
This section explains the behavior of other servers you have tested.
https://tools.ietf.org/html/rfc7530#section-9.6.3
9.6.3. Network Partitions and Recovery
If this occurs, the server may cancel
the lease and free all locks held for the client.
The following section explains the ONTAP’s behavior.
https://tools.ietf.org/html/rfc7530#section-9.6.3.1
9.6.3.1. Courtesy Locks
As a courtesy to the client or as an optimization, the server may
continue to hold locks, including delegations, on behalf of a client
for which recent communication has extended beyond the lease period,
delaying the cancellation of the lease.
This section clarifies that both the behaviors can be implemented.
https://tools.ietf.org/html/rfc7530#section-9.6.3.2
9.6.3.2. Lease Cancellation
As a result of lease expiration, leases may be canceled, either
immediately upon expiration or subsequently, depending on the
occurrence of a conflicting lock or extension of the period of
partition beyond what the server will tolerate.
I believe ONTAP is behaving according to the RFC and in more optimized way.
In that case, please let me know if you need any further explanation from our side on the same.
...
</pre>
</div>
</div>
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 300em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">Hey Robert,
We tested in the lab thoroughly to check for how long we keep the courtesy locks.
The courtesy locks will be kept for 2*lease time, if there is no conflicting request from any other client for the same file.
Please note that it might take some more time (40-50 secs) to clear those states from memory as the thread runs periodically to clean that.
Yes, they are tunable. As the time depends on lease-time, you can tweak that for a particular NFS server to control the courtesy lock time.
...
</pre>
</div>
</div>
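Taking the support reply at face value, the time ONTAP keeps a silent client's state can be estimated as 2 * lease time, plus up to roughly 50 seconds for the periodic cleanup thread. The sketch below just does that arithmetic. The 30-second lease comes from the first email, while the function name and the 0 to 50 second cleanup range are my own reading of the reply, not a documented ONTAP formula.

```python
def courtesy_window(lease_s, cleanup_delay_s=(0, 50)):
    """Return (earliest, latest) seconds after the last renewal at which
    the server might drop the client's state, with no conflicting request."""
    hold = 2 * lease_s  # courtesy locks are kept for 2 * lease time
    return hold + cleanup_delay_s[0], hold + cleanup_delay_s[1]

earliest, latest = courtesy_window(30)
print(f"state may be dropped between {earliest}s and {latest}s after the last renewal")
```

So a single skipped RENEW is not enough to trigger the problem against ONTAP; the client has to stay silent for considerably longer than one lease period.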
Now that I know more about how ONTAP behaves, I re-created the issue there as well, although the behavior is different.<br>
It is easier to trigger nfs4_do_fsinfo() every 10s, as that will permanently stop the Linux client from sending RENEW operations.
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># while [ 1 ] ; do date; umount /mnt/...; ls -l /mnt/.../; sleep 10; done
</pre>
</div>
</div>
A few minutes later I ran:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">$ date; strace -e trace=open cat /mnt/.../f1
Fri 6 Dec 10:52:53 GMT 2019
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("/mnt/.../f1", O_RDONLY) = 3
test
+++ exited with 0 +++
</pre>
</div>
</div>
Notice that the open did not fail. Network dump:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 300em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"> 34 3.656314120 10.210.16.12 -> 10.210.196.134 NFS 322 V4 Call OPEN DH: 0x6d8f1235/f1
35 3.861180170 10.210.196.134 -> 10.210.16.12 NFS 150 V4 Reply (Call In 34) OPEN Status: NFS4ERR_STALE_CLIENTID
36 3.861352838 10.210.16.12 -> 10.210.196.134 NFS 230 V4 Call RENEW CID: 0x5b6f
37 4.066210651 10.210.196.134 -> 10.210.16.12 NFS 182 V4 Reply (Call In 36) RENEW Status: NFS4ERR_STALE_CLIENTID
38 4.066357237 10.210.16.12 -> 10.210.196.134 NFS 338 V4 Call SETCLIENTID
39 4.271134903 10.210.196.134 -> 10.210.16.12 NFS 198 V4 Reply (Call In 38) SETCLIENTID
40 4.271256568 10.210.16.12 -> 10.210.196.134 NFS 238 V4 Call SETCLIENTID_CONFIRM
41 4.476085500 10.210.196.134 -> 10.210.16.12 NFS 182 V4 Reply (Call In 40) SETCLIENTID_CONFIRM
44 4.476296996 10.210.16.12 -> 10.210.196.134 NFS 322 V4 Call OPEN DH: 0x6d8f1235/f1
49 4.681334155 10.210.196.134 -> 10.210.16.12 NFS 470 V4 Reply (Call In 44) OPEN StateID: 0x900a
50 4.683485544 10.210.16.12 -> 10.210.196.134 NFS 258 V4 Call READ StateID: 0x900a Offset: 0 Len: 5
53 4.888349784 10.210.196.134 -> 10.210.16.12 NFS 166 V4 Reply (Call In 50) READ
54 4.888516437 10.210.16.12 -> 10.210.196.134 NFS 266 V4 Call CLOSE StateID: 0x900a
56 5.093391513 10.210.196.134 -> 10.210.16.12 NFS 230 V4 Reply (Call In 54) CLOSE
</pre>
</div>
</div>
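To pick the failing replies out of a longer capture, the dump can be filtered for replies that carry an error status. A minimal Python sketch, assuming the one-line-per-frame text layout shown above; the function name error_replies and the regular expression are illustrative and not part of any NFS tooling:

```python
import re

# Matches reply lines that carry an explicit "Status:" field, for example:
#   35 3.861180170 10.210.196.134 -> 10.210.16.12 NFS 150 V4 Reply (Call In 34) OPEN Status: NFS4ERR_STALE_CLIENTID
# The layout is assumed from the dump above; other capture formats need another pattern.
REPLY_WITH_STATUS = re.compile(
    r"^\s*(\d+)\s+\S+\s+\S+\s+->\s+\S+\s+NFS\s+\d+\s+"
    r"V4 Reply \(Call In (\d+)\)\s+(\w+)\s+Status:\s+(\w+)"
)


def error_replies(lines):
    """Return (frame, call_frame, operation, status) for each reply with a Status field."""
    found = []
    for line in lines:
        m = REPLY_WITH_STATUS.match(line)
        if m:
            frame, call_frame, op, status = m.groups()
            found.append((int(frame), int(call_frame), op, status))
    return found


# A few frames copied from the dump above.
dump = [
    " 34 3.656314120 10.210.16.12 -> 10.210.196.134 NFS 322 V4 Call OPEN DH: 0x6d8f1235/f1",
    " 35 3.861180170 10.210.196.134 -> 10.210.16.12 NFS 150 V4 Reply (Call In 34) OPEN Status: NFS4ERR_STALE_CLIENTID",
    " 36 3.861352838 10.210.16.12 -> 10.210.196.134 NFS 230 V4 Call RENEW CID: 0x5b6f",
    " 37 4.066210651 10.210.196.134 -> 10.210.16.12 NFS 182 V4 Reply (Call In 36) RENEW Status: NFS4ERR_STALE_CLIENTID",
    " 38 4.066357237 10.210.16.12 -> 10.210.196.134 NFS 338 V4 Call SETCLIENTID",
    " 39 4.271134903 10.210.196.134 -> 10.210.16.12 NFS 198 V4 Reply (Call In 38) SETCLIENTID",
]

for row in error_replies(dump):
    print(row)
```

Only the OPEN and RENEW replies are reported here; calls and replies without a Status field are skipped.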
So ONTAP responds with NFS4ERR_STALE_CLIENTID instead of NFS4ERR_EXPIRED.<br>
I think this is handled by fs/nfs/nfs4proc.c: nfs4_do_handle_exception()<br />
Let's confirm it:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># cat ./nfs4_do_handle_exception.stp
#!/bin/stap
// /bin/stap -g -d /bin/ls --ldd --all-modules
/*
* module("nfsv4").function("nfs4_do_handle_exception@fs/nfs/nfs4proc.c:383") $server:struct nfs_server* $errorcode:int $exception:struct nfs4_exception* $__func__:char[] const
*/
probe module("nfsv4").function("nfs4_do_handle_exception") {
printf("%s %s proc: %s[%d] ppid: %d args: %s cwd: %s uid: %d\n", ctime(gettimeofday_s()), probefunc(), execname(),
pid(), ppid(), cmdline_str(), fullpath_struct_path(task_cwd_path(task_current())), uid())
print_backtrace()
printf("-------\n")
//print_ubacktrace()
//printf("-------\n")
}
# ./nfs4_do_handle_exception.stp --all-modules -o /var/tmp/nfs4_do_handle_exception.out &
</pre>
</div>
</div>
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">$ date; strace -e trace=open cat /mnt/.../f1
Fri 6 Dec 11:28:32 GMT 2019
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("/mnt/.../f1", O_RDONLY) = 3
test
+++ exited with 0 +++
</pre>
</div>
</div>
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 300em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"> 24 11:28:32.419449800 10.210.16.12 -> 10.210.196.134 NFS 250 V4 Call ACCESS FH: 0x49b41796, [Check: RD LU MD XT DL]
25 11:28:32.624732095 10.210.196.134 -> 10.210.16.12 NFS 222 V4 Reply (Call In 24) ACCESS, [Access Denied: MD XT DL], [Allowed: RD LU]
27 11:28:32.625066507 10.210.16.12 -> 10.210.196.134 NFS 250 V4 Call ACCESS FH: 0x951a2495, [Check: RD LU MD XT DL]
28 11:28:32.830312285 10.210.196.134 -> 10.210.16.12 NFS 222 V4 Reply (Call In 27) ACCESS, [Allowed: RD LU MD XT DL]
29 11:28:32.830765027 10.210.16.12 -> 10.210.196.134 NFS 250 V4 Call ACCESS FH: 0x6d8f1235, [Check: RD LU MD XT DL]
30 11:28:33.046220915 10.210.196.134 -> 10.210.16.12 NFS 222 V4 Reply (Call In 29) ACCESS, [Access Denied: MD XT DL], [Allowed: RD LU]
31 11:28:33.046554768 10.210.16.12 -> 10.210.196.134 NFS 322 V4 Call OPEN DH: 0x6d8f1235/f1
32 11:28:33.251846761 10.210.196.134 -> 10.210.16.12 NFS 150 V4 Reply (Call In 31) OPEN Status: NFS4ERR_STALE_CLIENTID
33 11:28:33.252467673 10.210.16.12 -> 10.210.196.134 NFS 230 V4 Call RENEW CID: 0x3e59
34 11:28:33.457711654 10.210.196.134 -> 10.210.16.12 NFS 182 V4 Reply (Call In 33) RENEW Status: NFS4ERR_STALE_CLIENTID
35 11:28:33.457873262 10.210.16.12 -> 10.210.196.134 NFS 338 V4 Call SETCLIENTID
36 11:28:33.663124308 10.210.196.134 -> 10.210.16.12 NFS 198 V4 Reply (Call In 35) SETCLIENTID
37 11:28:33.663262733 10.210.16.12 -> 10.210.196.134 NFS 238 V4 Call SETCLIENTID_CONFIRM
38 11:28:33.868584648 10.210.196.134 -> 10.210.16.12 NFS 182 V4 Reply (Call In 37) SETCLIENTID_CONFIRM
41 11:28:33.868821676 10.210.16.12 -> 10.210.196.134 NFS 322 V4 Call OPEN DH: 0x6d8f1235/f1
46 11:28:34.074306487 10.210.196.134 -> 10.210.16.12 NFS 470 V4 Reply (Call In 41) OPEN StateID: 0x985c
47 11:28:34.074508377 10.210.16.12 -> 10.210.196.134 NFS 258 V4 Call READ StateID: 0x985c Offset: 0 Len: 5
50 11:28:34.279740081 10.210.196.134 -> 10.210.16.12 NFS 166 V4 Reply (Call In 47) READ
51 11:28:34.279909666 10.210.16.12 -> 10.210.196.134 NFS 266 V4 Call CLOSE StateID: 0x985c
53 11:28:34.485207915 10.210.196.134 -> 10.210.16.12 NFS 230 V4 Reply (Call In 51) CLOSE
</pre>
</div>
</div>
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 300em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># less /var/tmp/nfs4_do_handle_exception.out
...
-------
Fri Dec 6 11:28:32 2019 nfs4_do_handle_exception proc: cat[5231] ppid: 5229 args: cat /mnt/.../f1 cwd: /...w uid: 14100
0xffffffffc05d5a80 : nfs4_do_handle_exception+0x0/0x340 [nfsv4]
0xffffffffc05dcf0c : nfs4_handle_exception+0x1c/0x160 [nfsv4]
0xffffffffc05df973 : nfs4_proc_access+0x83/0xf0 [nfsv4]
0xffffffffc0597adb : nfs_do_access+0x1bb/0x3b0 [nfs]
0xffffffffc0597ebe : nfs_permission+0x1be/0x220 [nfs]
0xffffffff91054471 : __inode_permission+0x71/0xd0 [kernel]
0xffffffff910544e8 : inode_permission+0x18/0x50 [kernel]
0xffffffff9105854e : link_path_walk+0x27e/0x8b0 [kernel]
0xffffffff91059c3e : path_openat+0xae/0x5a0 [kernel]
0xffffffff9105b5cd : do_filp_open+0x4d/0xb0 [kernel]
0xffffffff91047524 : do_sys_open+0x124/0x220 [kernel]
0xffffffff9104763e : sys_open+0x1e/0x20 [kernel]
0xffffffff9158d172 : tracesys+0xa6/0xcc [kernel]
0x7f6b0d228710
-------
Fri Dec 6 11:28:32 2019 nfs4_do_handle_exception proc: cat[5231] ppid: 5229 args: cat /mnt/.../f1 cwd: /... uid: 14100
0xffffffffc05d5a80 : nfs4_do_handle_exception+0x0/0x340 [nfsv4]
0xffffffffc05dcf0c : nfs4_handle_exception+0x1c/0x160 [nfsv4]
0xffffffffc05df973 : nfs4_proc_access+0x83/0xf0 [nfsv4]
0xffffffffc0597adb : nfs_do_access+0x1bb/0x3b0 [nfs]
0xffffffffc0597ebe : nfs_permission+0x1be/0x220 [nfs]
0xffffffff91054471 : __inode_permission+0x71/0xd0 [kernel]
0xffffffff910544e8 : inode_permission+0x18/0x50 [kernel]
0xffffffff9105854e : link_path_walk+0x27e/0x8b0 [kernel]
0xffffffff91059c3e : path_openat+0xae/0x5a0 [kernel]
0xffffffff9105b5cd : do_filp_open+0x4d/0xb0 [kernel]
0xffffffff91047524 : do_sys_open+0x124/0x220 [kernel]
0xffffffff9104763e : sys_open+0x1e/0x20 [kernel]
0xffffffff9158d172 : tracesys+0xa6/0xcc [kernel]
0x7f6b0d228710
-------
Fri Dec 6 11:28:33 2019 nfs4_do_handle_exception proc: cat[5231] ppid: 5229 args: cat /mnt/.../f1 cwd: /... uid: 14100
0xffffffffc05d5a80 : nfs4_do_handle_exception+0x0/0x340 [nfsv4]
0xffffffffc05dcf0c : nfs4_handle_exception+0x1c/0x160 [nfsv4]
0xffffffffc05df973 : nfs4_proc_access+0x83/0xf0 [nfsv4]
0xffffffffc0597adb : nfs_do_access+0x1bb/0x3b0 [nfs]
0xffffffffc0597ebe : nfs_permission+0x1be/0x220 [nfs]
0xffffffff91054471 : __inode_permission+0x71/0xd0 [kernel]
0xffffffff910544e8 : inode_permission+0x18/0x50 [kernel]
0xffffffff9105854e : link_path_walk+0x27e/0x8b0 [kernel]
0xffffffff91059c3e : path_openat+0xae/0x5a0 [kernel]
0xffffffff9105b5cd : do_filp_open+0x4d/0xb0 [kernel]
0xffffffff91047524 : do_sys_open+0x124/0x220 [kernel]
0xffffffff9104763e : sys_open+0x1e/0x20 [kernel]
0xffffffff9158d172 : tracesys+0xa6/0xcc [kernel]
0x7f6b0d228710
-------
Fri Dec 6 11:28:33 2019 nfs4_do_handle_exception proc: cat[5231] ppid: 5229 args: cat /mnt/.../f1 cwd: /... uid: 14100
0xffffffffc05d5a80 : nfs4_do_handle_exception+0x0/0x340 [nfsv4]
0xffffffffc05dcf0c : nfs4_handle_exception+0x1c/0x160 [nfsv4]
0xffffffffc05e0296 : nfs4_do_open+0x2c6/0x630 [nfsv4]
0xffffffffc05e06f7 : nfs4_atomic_open+0xf7/0x110 [nfsv4]
0xffffffffc0597317 : nfs_atomic_open+0x197/0x3a0 [nfs]
0xffffffff91057623 : do_last+0xa53/0x1290 [kernel]
0xffffffff91059c5d : path_openat+0xcd/0x5a0 [kernel]
0xffffffff9105b5cd : do_filp_open+0x4d/0xb0 [kernel]
0xffffffff91047524 : do_sys_open+0x124/0x220 [kernel]
0xffffffff9104763e : sys_open+0x1e/0x20 [kernel]
0xffffffff9158d172 : tracesys+0xa6/0xcc [kernel]
0x7f6b0d228710
-------
</pre>
</div>
</div>
The nfs4_do_open() function does end up calling nfs4_do_handle_exception(). The relevant code is:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">static int nfs4_do_handle_exception(struct nfs_server *server,
int errorcode, struct nfs4_exception *exception)
{
...
case -NFS4ERR_EXPIRED:
case -NFS4ERR_BAD_STATEID:
if (inode != NULL && stateid != NULL) {
nfs_inode_find_state_and_recover(inode,
stateid);
goto wait_on_recovery;
}
...
case -NFS4ERR_STALE_STATEID:
case -NFS4ERR_STALE_CLIENTID:
nfs4_schedule_lease_recovery(clp);
goto wait_on_recovery;
...
</pre>
</div>
</div>
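The difference between the two branches in the excerpt can be restated as a small decision table. This is only a Python model of the lines quoted above, not kernel code: the function name and the returned strings are made up for illustration, and anything the excerpt elides with "..." is reported as not shown rather than guessed.

```python
# Error groups exactly as they appear in the quoted switch statement.
SINGLE_STATEID_ERRORS = {"NFS4ERR_EXPIRED", "NFS4ERR_BAD_STATEID"}
LEASE_RECOVERY_ERRORS = {"NFS4ERR_STALE_STATEID", "NFS4ERR_STALE_CLIENTID"}

NOT_SHOWN = "not shown in the excerpt"


def recovery_action(error, has_inode, has_stateid):
    """Model which visible branch of the excerpt an error would take."""
    if error in SINGLE_STATEID_ERRORS:
        if has_inode and has_stateid:
            # Corresponds to nfs_inode_find_state_and_recover() + wait_on_recovery.
            return "recover single stateid, then wait on recovery"
        # The excerpt elides what happens next, so the model does not guess.
        return NOT_SHOWN
    if error in LEASE_RECOVERY_ERRORS:
        # Corresponds to nfs4_schedule_lease_recovery() + wait_on_recovery.
        return "schedule lease recovery, then wait on recovery"
    return NOT_SHOWN


print(recovery_action("NFS4ERR_STALE_CLIENTID", has_inode=False, has_stateid=False))
print(recovery_action("NFS4ERR_EXPIRED", has_inode=False, has_stateid=False))
```

In this model NFS4ERR_STALE_CLIENTID always leads to lease recovery, while NFS4ERR_EXPIRED only takes the visible recovery path when both an inode and a stateid are available.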
Another function, though, handles both errors the same way:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">@@ -5960,6 +5962,8 @@ static void nfs4_release_lockowner_done(struct rpc_task *task, void *calldata)
break;
case -NFS4ERR_STALE_CLIENTID:
case -NFS4ERR_EXPIRED:
+ nfs4_schedule_lease_recovery(server->nfs_client);
+ break;
</pre>
</div>
</div>
Not sure why it is handled differently in nfs4_do_handle_exception().<br />
This was changed by:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">commit 272289a3df7297fd798a8ddf652c6d29f675de28
Author: Trond Myklebust
Date: Thu Sep 22 13:39:15 2016 -0400
NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid
If we're not yet sure that all state has expired or been revoked, we
should try to do a minimal recovery on just the one stateid.
Signed-off-by: Trond Myklebust
Tested-by: Oleg Drokin
Signed-off-by: Anna Schumaker
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 389089e..409b9fa 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -397,13 +397,23 @@ static int nfs4_do_handle_exception(struct nfs_server *server,
exception->delay = 0;
exception->recovering = 0;
exception->retry = 0;
+
+ if (stateid == NULL && state != NULL)
+ stateid = &state->stateid;</pre>
</div>
</div>
<br>
The new NFSv4 client behavior in Linux was introduced by <a href="https://github.com/torvalds/linux/commit/83ca7f5ab31f57506bb35482e4b0426be653196a">https://github.com/torvalds/linux/commit/83ca7f5ab31f57506bb35482e4b0426be653196a</a>.
This is the first time the Linux client assumes an implicit lease renewal when it calls fsinfo() (getattrs for mounts). The code was later refactored a little, but the latest upstream version still did it at the time of debugging the issue.<br>
<br>
<br>
The two patches below, which I contributed to the Linux kernel, are now also present in CentOS 8.3 with kernel 4.18.0-240.1.1.el8_3 and newer.
<a href="https://github.com/torvalds/linux/commit/924491f2e476f7234d722b24171a4daff61bbe13">https://github.com/torvalds/linux/commit/924491f2e476f7234d722b24171a4daff61bbe13</a><br>
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">
NFSv4: try lease recovery on NFS4ERR_EXPIRED
Currently, if an nfs server returns NFS4ERR_EXPIRED to open(),
we return EIO to applications without even trying to recover.
Fixes: 272289a ("NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid")
Signed-off-by: Robert Milkowski
Reviewed-by: Trond Myklebust
Signed-off-by: Anna Schumaker
</pre></div></div>
<a href="https://github.com/torvalds/linux/commit/7dc2993a9e51dd2eee955944efec65bef90265b7">https://github.com/torvalds/linux/commit/7dc2993a9e51dd2eee955944efec65bef90265b7</a><br>
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">
NFSv4.0: nfs4_do_fsinfo() should not do implicit lease renewals
Currently, each time nfs4_do_fsinfo() is called it will do an implicit
NFS4 lease renewal, which is not compliant with the NFS4 specification.
This can result in a lease being expired by an NFS server.
Commit 83ca7f5 ("NFS: Avoid PUTROOTFH when managing leases")
introduced implicit client lease renewal in nfs4_do_fsinfo(),
which can result in the NFSv4.0 lease to expire on a server side,
and servers returning NFS4ERR_EXPIRED or NFS4ERR_STALE_CLIENTID.
This can easily be reproduced by frequently unmounting a sub-mount,
then stat'ing it to get it mounted again, which will delay or even
completely prevent client from sending RENEW operations if no other
NFS operations are issued. Eventually nfs server will expire client's
lease and return an error on file access or next RENEW.
This can also happen when a sub-mount is automatically unmounted
due to inactivity (after nfs_mountpoint_expiry_timeout), then it is
mounted again via stat(). This can result in a short window during
which client's lease will expire on a server but not on a client.
This specific case was observed on production systems.
This patch removes the implicit lease renewal from nfs4_do_fsinfo().
Fixes: 83ca7f5 ("NFS: Avoid PUTROOTFH when managing leases")
Signed-off-by: Robert Milkowski
Signed-off-by: Anna Schumaker
</pre></div></div>
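The mechanism described in the second commit message can be illustrated with a toy timeline. The Python sketch below is a simplified model and not the kernel logic: the 30 second lease, the one-second ticks and the rule that the client sends RENEW once two thirds of the lease have passed since its last recorded renewal are all assumptions made for the example. The only behaviour taken from the text is that an fsinfo call refreshes the client's own renewal timestamp while the server's view of the lease stays unchanged.

```python
def simulate(lease_s, fsinfo_every_s, duration_s, fsinfo_renews_client_timer):
    """Walk a timeline in 1 s ticks and report when the server-side lease expires."""
    client_last = 0  # client's view: time of the last lease renewal
    server_last = 0  # server's view: time of the last lease renewal
    renews_sent = 0
    for t in range(1, duration_s + 1):
        if t - server_last > lease_s:
            # The server has not seen a renewal for longer than the lease time.
            return {"expired_at": t, "renews_sent": renews_sent}
        if fsinfo_renews_client_timer and t % fsinfo_every_s == 0:
            # Implicit renewal: only the client's timestamp moves forward.
            client_last = t
        if t - client_last >= (2 * lease_s) // 3:
            # Explicit RENEW: both sides agree the lease was renewed.
            renews_sent += 1
            client_last = t
            server_last = t
    return {"expired_at": None, "renews_sent": renews_sent}


# fsinfo every 10 s, as in the umount/ls reproduction loop.
with_implicit_renewal = simulate(30, 10, 120, fsinfo_renews_client_timer=True)
without_implicit_renewal = simulate(30, 10, 120, fsinfo_renews_client_timer=False)
print("with implicit renewal:   ", with_implicit_renewal)
print("without implicit renewal:", without_implicit_renewal)
```

With the implicit renewal enabled, the modelled client never reaches its RENEW threshold and the modelled server lease runs out; with it disabled, RENEW is sent periodically and the lease stays valid for the whole run.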
Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com2tag:blogger.com,1999:blog-9963485.post-33159730099596570502022-12-28T13:25:00.004+01:002022-12-28T13:25:40.026+01:00NFSv4 + Kerberos: 4 minutes of slowness<div>This is a blog entry I started writing over 3 years ago and never managed to finish and publish, until now.</div><div>I haven’t checked in a while whether the latest Linux kernel is still affected by the behaviour described here, but I suspect it is.
</div><div><br /></div><div>An application running on CentOS 6.9 often copies files (4-6GB in size on average) from a local SSD-based cache to an NFSv4 (+krb) mount. Developers complained that twice a day the copy takes significantly longer to complete, without any obvious reason. They even added logging to their code to record every occurrence of a 64MB chunk copy taking longer than 10,000ms.</div>
<br />After some quick testing I determined that reads were fast, but writes were sometimes very slow, despite being buffered writes on a server with lots of free memory. There were no obvious bottlenecks on the network, no packet drops, etc.<br />
<br />
Initially I suspected write back throttling, but quickly realised this was not the culprit.<br />
<br />
After tracing the Java code I determined that the application ends up calling sendfile() with 64MB chunks. Nothing unusual here, but in case there was some sendfile()-specific behaviour, I tried to reproduce the issue with a read()/write() loop with 64MB chunks and hit the same issue. The issue also occurred with smaller chunk sizes.<br />
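A read()/write() reproduction of this kind can be sketched in a few lines of Python. This is not the test program used for this post; the function name timed_copy, the default chunk size and the slow-chunk threshold are placeholders that mirror the numbers mentioned in the text (64MB chunks, 10,000ms).

```python
import time


def timed_copy(src, dst, chunk_size=64 * 1024 * 1024, slow_ms=10_000):
    """Copy src to dst with plain read()/write() calls and time each chunk.

    Returns (bytes_copied, slow_chunks), where slow_chunks holds a
    (position, length, elapsed_ms) tuple for every chunk slower than slow_ms.
    """
    slow_chunks = []
    position = 0
    # buffering=0 gives raw file objects, so each call maps to one syscall.
    with open(src, "rb", buffering=0) as fin, open(dst, "wb", buffering=0) as fout:
        while True:
            start = time.monotonic()
            buf = fin.read(chunk_size)
            if not buf:
                break
            view = memoryview(buf)
            while view:
                # write() may be partial, so keep going until the chunk is out.
                written = fout.write(view)
                view = view[written:]
            elapsed_ms = (time.monotonic() - start) * 1000
            if elapsed_ms > slow_ms:
                slow_chunks.append((position, len(buf), elapsed_ms))
            position += len(buf)
    return position, slow_chunks
```

Calling timed_copy() with a cached local file as the source and a file on the NFS mount as the destination gives a per-chunk timing comparable to the application's own logging.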
<br />A test version of the app (with only the relevant/affected file copy code):<br />
<br />
<pre style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; color: #172b4d; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; margin: 0px; max-height: 30em; orphans: 2; overflow-wrap: normal; overflow: auto; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: pre; widows: 2; word-break: normal; word-spacing: 0px;">$ strace -tT -v -f -o /tmp/a java CopyTest /.../f1 /mnt/.../tmp/f1
Copying a chunk of 67076096 bytes from position 0 in /…/f1 to /…/tmp/f1 took 28005 ms (2.3 MB/s)
Switching copy chunk size to 33554432 bytes (32 MB)
Copying a chunk of 33554432 bytes from position 67076096 in /…/f1 to /…/tmp/f1 took 14161 ms (2.3 MB/s)
Switching copy chunk size to 16777216 bytes (16 MB)
Copying a chunk of 16777216 bytes from position 100630528 in /…/f1 to /…/tmp/f1 took 7555 ms (2.1 MB/s)
Copying a chunk of 16777216 bytes from position 117407744 in /…/f1 to /…/tmp/f1 took 7111 ms (2.3 MB/s)
Copying a chunk of 16777216 bytes from position 134184960 in /…/f1 to /…/tmp/f1 took 6595 ms (2.4 MB/s)
^C</pre>
<br />
The app ends up calling sendfile() in a loop with 64MB blocks (or smaller if it detects that the copy is slow), which should be OK. However, while the issue was happening, each sendfile() took a long time to finish:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">15212 15:58:25 open("/…/f1", O_RDONLY) = 4 <0.000038>
...
15212 15:58:25 open("/…/tmp/f1", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 6 <0.003453>
...
15212 15:58:25 sendfile(6, 4, [0], 67076096 <unfinished ...>
15212 15:58:53 <... sendfile resumed> ) = 67076096 <28.004107>
...
15212 15:58:53 sendfile(6, 4, [67076096], 33554432 <unfinished ...>
15212 15:59:07 <... sendfile resumed> ) = 33554432 <14.160558>
...
15212 15:59:07 sendfile(6, 4, [100630528], 16777216 <unfinished ...>
15212 15:59:14 <... sendfile resumed> ) = 16777216 <7.555443>
...
15212 15:59:14 sendfile(6, 4, [117407744], 16777216 <unfinished ...>
15212 15:59:22 <... sendfile resumed> ) = 16777216 <7.110757>
...
</pre>
</div>
</div>
This is less than 3MB/s.
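The figure follows directly from the strace output: bytes returned by each sendfile() divided by the time shown in angle brackets. A short Python sketch of that arithmetic, assuming the strace -T line layout shown above; the helper name and the regular expression are illustrative only:

```python
import re

# Matches the "resumed" half of a sendfile() call as printed by strace -T, for example:
#   15212 15:58:53 <... sendfile resumed> ) = 67076096 <28.004107>
RESUMED = re.compile(r"sendfile resumed> \) = (\d+) <([\d.]+)>")


def sendfile_rates(lines):
    """Return (bytes, seconds, MiB_per_second) for each completed sendfile() call."""
    rates = []
    for line in lines:
        m = RESUMED.search(line)
        if m:
            nbytes = int(m.group(1))
            seconds = float(m.group(2))
            rates.append((nbytes, seconds, nbytes / seconds / 2**20))
    return rates


# Lines copied from the strace output above.
trace = [
    "15212 15:58:53 <... sendfile resumed> ) = 67076096 <28.004107>",
    "15212 15:59:07 <... sendfile resumed> ) = 33554432 <14.160558>",
    "15212 15:59:14 <... sendfile resumed> ) = 16777216 <7.555443>",
    "15212 15:59:22 <... sendfile resumed> ) = 16777216 <7.110757>",
]

for nbytes, seconds, rate in sendfile_rates(trace):
    print(f"{nbytes} bytes in {seconds:.3f} s = {rate:.2f} MiB/s")
```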
The iostat output while the above test was running:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rBlk_nor/s wBlk_nor/s rBlk_dir/s wBlk_dir/s rBlk_svr/s wBlk_svr/s ops/s rops/s wops/s
hostxxx:/mount/point 0.00 3968.00 0.00 0.00 0.00 3936.00 509.00 0.00 492.00
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Filesystem: rBlk_nor/s wBlk_nor/s rBlk_dir/s wBlk_dir/s rBlk_svr/s wBlk_svr/s ops/s rops/s wops/s
hostxxx:/mount/point 0.00 3968.00 0.00 0.00 0.00 4040.00 581.00 0.00 505.00
</pre>
</div>
</div>
There are no reads from local disks as the file is cached in memory.
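The NFS line of the iostat report can be cross-checked against the sendfile() timings. The Python sketch below converts the block rates into kB/s, assuming the 512-byte block size that the sysstat iostat manual gives for these columns; the helper name is made up for the example.

```python
BLOCK_SIZE_BYTES = 512  # assumption: block size documented for iostat's NFS report


def blocks_to_kb_per_s(blocks_per_s):
    """Convert an iostat block rate into kB/s (1 kB = 1024 bytes)."""
    return blocks_per_s * BLOCK_SIZE_BYTES / 1024


# wBlk_nor/s and wBlk_svr/s values from the two samples above.
for label, blocks in [("wBlk_nor/s", 3968.0), ("wBlk_svr/s", 3936.0), ("wBlk_svr/s", 4040.0)]:
    print(f"{label} {blocks:.2f} = {blocks_to_kb_per_s(blocks):.1f} kB/s")
```

Under that assumption the client was pushing roughly 2 MB/s to the server, in line with the per-call rates seen in strace.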
<br />
There was no network congestion, no increased load on the NFS servers during the slowdowns, etc.<br />
The nfsiostat output (while writes are slow):
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># nfsiostat /mnt/…/ 1
hostxxx:/mount/point mounted on /mnt/…/:
op/s rpc bklog
251.20 0.00
read: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
38.022 2419.958 63.647 320 (0.0%) 5.031 7.105
write: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
106.765 5446.704 51.016 40 (0.0%) 3.165 4799.198
hostxxx:/mount/point mounted on /mnt/…/:
op/s rpc bklog
476.00 0.00
read: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
0.000 0.000 0.000 0 (0.0%) 0.000 0.000
write: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
476.000 2076.922 4.363 0 (0.0%) 2.025 2.092
hostxxx:/mount/point mounted on /mnt/…/:
op/s rpc bklog
494.00 0.00
read: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
0.000 0.000 0.000 0 (0.0%) 0.000 0.000
write: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
494.000 2155.461 4.363 0 (0.0%) 1.996 2.026
hostxxx:/mount/point mounted on /mnt/…/:
op/s rpc bklog
367.00 0.00
read: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
0.000 0.000 0.000 0 (0.0%) 0.000 0.000
write: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
322.000 1404.977 4.363 0 (0.0%) 3.099 3.130
hostxxx:/mount/point mounted on /mnt/…/:
op/s rpc bklog
453.00 0.00
read: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
0.000 0.000 0.000 0 (0.0%) 0.000 0.000
write: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
453.000 1976.566 4.363 0 (0.0%) 2.179 2.212
<span style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; color: #172b4d; display: inline; float: none; font-family: monospace; font-size: 12px; font-style: normal; font-variant-alternates: normal; font-variant-caps: normal; font-variant-east-asian: normal; font-variant-ligatures: normal; font-variant-numeric: normal; font-variant-position: normal; font-variant: normal; font-weight: 400; letter-spacing: normal; orphans: 2; overflow-wrap: normal; text-align: left; text-decoration: none; text-indent: 0px; text-transform: none; white-space: pre; word-break: normal; word-spacing: 0px;">hostxxx:/mount/point</span> mounted on /mnt/…/:
op/s rpc bklog
437.00 0.00
read: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
0.000 0.000 0.000 0 (0.0%) 0.000 0.000
write: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
437.000 1906.754 4.363 0 (0.0%) 2.256 2.288
^C
</pre>
</div>
</div>
Let's compare to a fast run:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">hostxxx:/mount/point mounted on /mnt/…/:
      op/s    rpc bklog
   5178.00     33515.07
read:       ops/s         kB/s    kB/op    retrans   avg RTT (ms)   avg exe (ms)
            0.000        0.000    0.000   0 (0.0%)          0.000          0.000
write:      ops/s         kB/s    kB/op    retrans   avg RTT (ms)   avg exe (ms)
         5032.000   323876.031   64.363   0 (0.0%)          2.908      13164.891

hostxxx:/mount/point mounted on /mnt/…/:
      op/s    rpc bklog
   4830.00     28661.95
read:       ops/s         kB/s    kB/op    retrans   avg RTT (ms)   avg exe (ms)
            0.000        0.000    0.000   0 (0.0%)          0.000          0.000
write:      ops/s         kB/s    kB/op    retrans   avg RTT (ms)   avg exe (ms)
         4672.000   300705.250   64.363   0 (0.0%)          3.191      14171.388

hostxxx:/mount/point mounted on /mnt/.../:
      op/s    rpc bklog
   5030.00     23886.62
read:       ops/s         kB/s    kB/op    retrans   avg RTT (ms)   avg exe (ms)
            0.000        0.000    0.000   0 (0.0%)          0.000          0.000
write:      ops/s         kB/s    kB/op    retrans   avg RTT (ms)   avg exe (ms)
         4875.000   313770.996   64.363   0 (0.0%)          3.028      15096.526
</pre></div></div>
The RTT is about the same in the slow and fast runs. The avg exe time is completely different, but this seems to be due to the large RPC backlog in the fast case, while there is no queuing in the slow case. Also, the average write size is just 4KB in the slow case vs. 64KB in the fast case.
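As a rough cross-check of these numbers, the sketch below recomputes two things from the write lines of the nfsiostat samples: kB/op as kB/s divided by ops/s, and the ops/s one would expect if only a single write were ever in flight (1000 divided by avg exe in ms). The helper names are made up for illustration, and the one-request-at-a-time model is an assumption, not something nfsiostat reports.

```python
# Write-side samples copied from the nfsiostat output above:
# (ops/s, kB/s, avg exe in ms)
SLOW_WRITES = [
    (494.0, 2155.461, 2.026),
    (322.0, 1404.977, 3.130),
    (453.0, 1976.566, 2.212),
    (437.0, 1906.754, 2.288),
]
FAST_WRITES = [
    (5032.0, 323876.031, 13164.891),
    (4672.0, 300705.250, 14171.388),
    (4875.0, 313770.996, 15096.526),
]


def kb_per_op(ops_per_sec, kb_per_sec):
    """Average write size: throughput divided by operation rate."""
    return kb_per_sec / ops_per_sec


def serialized_ops_per_sec(avg_exe_ms):
    """ops/s expected if only one request is in flight at a time (assumed model)."""
    return 1000.0 / avg_exe_ms


def report(label, samples):
    """Print the derived values next to the observed ops/s for each sample."""
    for ops, kbs, exe_ms in samples:
        print(f"{label}: kB/op={kb_per_op(ops, kbs):.3f} "
              f"observed ops/s={ops:.0f} "
              f"one-at-a-time ops/s={serialized_ops_per_sec(exe_ms):.1f}")


if __name__ == "__main__":
    report("slow", SLOW_WRITES)
    report("fast", FAST_WRITES)
```

In the slow samples the two rates agree to within about 1%, which is what one would expect if the writes are issued strictly one after another. In the fast samples they are far apart, which fits many requests being queued concurrently.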
I managed to reproduce the issue with the cp command, which uses read()/write() syscalls.<br />
Fast/good case:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">64477 17:31:46 read(3, "..."..., 65536) = 65536 <0.000028>
64477 17:31:46 write(4, "..."..., 65536) = 65536 <0.000048>
64477 17:31:46 read(3, "..."..., 65536) = 65536 <0.000028>
64477 17:31:46 write(4, "..."..., 65536) = 65536 <0.000058>
64477 17:31:46 read(3, "..."..., 65536) = 65536 <0.000028>
64477 17:31:46 write(4, "..."..., 65536) = 65536 <0.000059>
64477 17:31:46 read(3, "..."..., 65536) = 65536 <0.000029>
64477 17:31:46 write(4, "..."..., 65536) = 65536 <0.000052>
</pre>
</div>
</div>
Slow/bad case:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">62376 17:27:02 read(3, "..."..., 65536) = 65536 <0.000060>
62376 17:27:02 write(4, "..."..., 65536) = 65536 <0.021412>
62376 17:27:02 read(3, "..."..., 65536) = 65536 <0.000046>
62376 17:27:02 write(4, "..."..., 65536) = 65536 <0.022443>
62376 17:27:02 read(3, "..."..., 65536) = 65536 <0.000050>
62376 17:27:02 write(4, "..."..., 65536) = 65536 <0.032568>
62376 17:27:02 read(3, "..."..., 65536) = 65536 <0.000054>
62376 17:27:02 write(4, "..."..., 65536) = 65536 <0.038464></pre></div></div>
These numbers are similar to the sendfile() case.
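To put the two strace excerpts side by side, here is a small sketch that sums the bytes and the time spent in write() and derives a throughput for each case. The regular expression and helper names are illustrative, and the 4 KiB page size used for the per-page figure is an assumption about the client.

```python
import re

# write() lines copied from the two strace excerpts above.
FAST_TRACE = """\
64477 17:31:46 write(4, "..."..., 65536) = 65536 <0.000048>
64477 17:31:46 write(4, "..."..., 65536) = 65536 <0.000058>
64477 17:31:46 write(4, "..."..., 65536) = 65536 <0.000059>
64477 17:31:46 write(4, "..."..., 65536) = 65536 <0.000052>
"""
SLOW_TRACE = """\
62376 17:27:02 write(4, "..."..., 65536) = 65536 <0.021412>
62376 17:27:02 write(4, "..."..., 65536) = 65536 <0.022443>
62376 17:27:02 write(4, "..."..., 65536) = 65536 <0.032568>
62376 17:27:02 write(4, "..."..., 65536) = 65536 <0.038464>
"""

# Captures the return value (bytes written) and the time strace shows in <...>.
WRITE_RE = re.compile(r"write\(\d+, .*\) = (\d+) <(\d+\.\d+)>")
PAGE_SIZE = 4096  # assumption: 4 KiB pages on the NFS client


def write_stats(trace):
    """Return (total bytes written, total seconds spent in write())."""
    total_bytes = 0
    total_secs = 0.0
    for line in trace.splitlines():
        match = WRITE_RE.search(line)
        if match:
            total_bytes += int(match.group(1))
            total_secs += float(match.group(2))
    return total_bytes, total_secs


def throughput_bytes_per_sec(trace):
    """Bytes written per second of time spent inside write()."""
    total_bytes, total_secs = write_stats(trace)
    return total_bytes / total_secs


def ms_per_page(trace):
    """Average time in write() per PAGE_SIZE bytes, in milliseconds."""
    total_bytes, total_secs = write_stats(trace)
    return total_secs * 1000.0 / (total_bytes / PAGE_SIZE)


if __name__ == "__main__":
    for name, trace in (("fast", FAST_TRACE), ("slow", SLOW_TRACE)):
        print(name, round(throughput_bytes_per_sec(trace)), "bytes/s,",
              round(ms_per_page(trace), 3), "ms per 4 KiB")
```

For the slow excerpt this works out to roughly 2 ms per 4 KiB, which is in the same range as the avg RTT reported by nfsiostat, so it is at least consistent with each page being written synchronously.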
While the issue is happening, let's collect kernel stacks:
<div class="code panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="codeContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre class="code-none" style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre-wrap;"># perf script -i perf.data.s3
Failed to open /tmp/perf-7054.map, continuing without symbols
java 7055 [040] 43655512.790932: writeback:wbc_writepage: bdi 0:484: towrt=9223372036854775807 skip=0 mode=1 kupd=0 bgrd=0 reclm=0 cyclic=0 more=0 older=0x0 start=0x0 end=0x7fffffffffffffff
ffffffff811c9046 ftrace_profile_wbc_writepage ([kernel.kallsyms])
ffffffff811441da write_cache_pages ([kernel.kallsyms])
199d2 nfs_writepages ([nfs])
ffffffff81144291 do_writepages ([kernel.kallsyms])
ffffffff811c7a7d writeback_single_inode ([kernel.kallsyms])
ffffffff811c7c63 sync_inode ([kernel.kallsyms])
18983 nfs_wb_all ([nfs])
8a8c nfs_write_end ([nfs])
ffffffff8112e210 pagecache_write_end ([kernel.kallsyms])
ffffffff811cb0a6 pipe_to_file ([kernel.kallsyms])
ffffffff811c9ce2 splice_from_pipe_feed ([kernel.kallsyms])
ffffffff811cadbe generic_file_splice_write_actor ([kernel.kallsyms])
ffffffff811cae9c splice_write_to_file ([kernel.kallsyms])
ffffffff811caf45 generic_file_splice_write ([kernel.kallsyms])
79c6 nfs_file_splice_write ([nfs])
ffffffff811ca96a do_splice_from ([kernel.kallsyms])
ffffffff811ca9c0 direct_splice_actor ([kernel.kallsyms])
ffffffff811cabf6 splice_direct_to_actor ([kernel.kallsyms])
ffffffff811cad3d do_splice_direct ([kernel.kallsyms])
ffffffff8119abc4 do_sendfile ([kernel.kallsyms])
ffffffff8119aca1 sys_sendfile64 ([kernel.kallsyms])
ffffffff81556951 tracesys ([kernel.kallsyms])
df81a sendfile (/lib64/libc-2.12.so)
7fb4bcbd07a7 [unknown] (/tmp/perf-7054.map)
7fb4bcbbfe50 [unknown] (/tmp/perf-7054.map)
7fb4bcbbfe50 [unknown] (/tmp/perf-7054.map)
7fb4bcbbfe50 [unknown] (/tmp/perf-7054.map)
7fb4bcbbfe50 [unknown] (/tmp/perf-7054.map)
7fb4bcbb87a7 [unknown] (/tmp/perf-7054.map)
697a76 JavaCalls::call_helper (/.../pkgs/jdk1.8.0_181/jre/lib/amd64/server/libjvm.so)
6d91c2 jni_invoke_static (/.../pkgs/jdk1.8.0_181/jre/lib/amd64/server/libjvm.so)
6f5a2a jni_CallStaticVoidMethod (/.../pkgs/jdk1.8.0_181/jre/lib/amd64/server/libjvm.so)
80ff JavaMain (/.../pkgs/jdk1.8.0_181/lib/amd64/jli/libjli.so)
7aa1 start_thread (/lib64/libpthread-2.12.so)
</pre>
</div>
</div>
What's interesting here is:
<div class="code panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="codeContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre class="code-none" style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre-wrap;">...
ffffffff811c7c63 sync_inode ([kernel.kallsyms])
18983 nfs_wb_all ([nfs])
8a8c nfs_write_end ([nfs])
...
</pre>
</div>
</div>
This is the corresponding source code in the kernel:
<div class="code panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="codeContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre class="code-none" style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre-wrap;">./fs/nfs/file.c:
417 static int nfs_write_end(struct file *file, struct address_space *mapping,
…
458 if (nfs_ctx_key_to_expire(ctx, mapping->host)) {
459 status = nfs_wb_all(mapping->host);
460 if (status < 0)
461 return status;
462 }
</pre>
</div>
</div>
nfs_wb_all() flushes all dirty pages of the inode, which makes the application issue synchronous 4KB I/O (the performance issue). It is only called when nfs_ctx_key_to_expire() returns a non-zero value. Let's check the code to see when that happens:
<div class="code panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="codeContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre class="code-none" style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre-wrap;">./fs/nfs/file.c:
417 static int nfs_write_end(struct file *file, struct address_space *mapping,
…
458 if (nfs_ctx_key_to_expire(ctx, mapping->host)) {
459 status = nfs_wb_all(mapping->host);
460 if (status < 0)
461 return status;
462 }
./fs/nfs/write.c:
bool nfs_ctx_key_to_expire(struct nfs_open_context *ctx, struct inode *inode)
{
    struct rpc_auth *auth = NFS_SERVER(inode)->client->cl_auth;
    return rpcauth_cred_key_to_expire(auth, ctx->cred);
}
./net/sunrpc/auth.c:
bool
rpcauth_cred_key_to_expire(struct rpc_auth *auth, struct rpc_cred *cred)
{
    if (auth->au_flags & RPCAUTH_AUTH_NO_CRKEY_TIMEOUT)
        return false;
    if (!cred->cr_ops->crkey_to_expire)
        return false;
    return cred->cr_ops->crkey_to_expire(cred);
}
./net/sunrpc/auth_generic.c: .crkey_to_expire = generic_key_to_expire,
./net/sunrpc/auth_generic.c:
static bool generic_key_to_expire(struct rpc_cred *cred)
{
    struct auth_cred *acred = &container_of(cred, struct generic_cred,
                                            gc_base)->acred;
    bool ret;
    get_rpccred(cred);
    ret = test_bit(RPC_CRED_KEY_EXPIRE_SOON, &acred->ac_flags);
    put_rpccred(cred);
    return ret;
}
static const struct rpc_credops generic_credops = {
    .cr_name = "Generic cred",
    .crdestroy = generic_destroy_cred,
    .crbind = generic_bind_cred,
    .crmatch = generic_match,
    .crkey_to_expire = generic_key_to_expire,
};
./net/sunrpc/auth_generic.c
202 generic_key_timeout(struct rpc_auth *auth, struct rpc_cred *cred)
224 ret = tcred->cr_ops->crkey_timeout(tcred);
225 if (ret != 0) {
226 set_bit(RPC_CRED_KEY_EXPIRE_SOON, &acred->ac_flags);
227 ret = 0;
228 } else {
229 /* In case underlying cred key has been reset */
230 if (test_and_clear_bit(RPC_CRED_KEY_EXPIRE_SOON,
231 &acred->ac_flags))
232 dprintk("RPC: UID %d Credential key reset\n",
233 tcred->cr_uid);
234 /* set up fasttrack for the normal case */
235 set_bit(RPC_CRED_NOTIFY_TIMEOUT, &acred->ac_flags);
236 }
./net/sunrpc/auth_gss/auth_gss.c: .crkey_timeout = gss_key_timeout,
./net/sunrpc/auth_gss/auth_gss.c:
#define GSS_KEY_EXPIRE_TIMEO 240
static unsigned int gss_key_expire_timeo = GSS_KEY_EXPIRE_TIMEO;
static int
gss_key_timeout(struct rpc_cred *rc)
{
    struct gss_cred *gss_cred = container_of(rc, struct gss_cred, gc_base);
    unsigned long now = jiffies;
    unsigned long expire;
    if (gss_cred->gc_ctx == NULL)
        return -EACCES;
    expire = gss_cred->gc_ctx->gc_expiry - (gss_key_expire_timeo * HZ);
    if (time_after(now, expire))
        return -EACCES;
    return 0;
}
</pre>
</div>
</div>
To hit this code path, the RPC_CRED_KEY_EXPIRE_SOON flag has to be set.
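The flag is set once gss_key_timeout() returns a non-zero value, i.e. once the current time is within gss_key_expire_timeo seconds of the credential expiry. Below is a minimal user-space model of that check, written in Python purely for illustration. It mirrors the kernel's wrap-safe time_after() comparison on 64-bit unsigned jiffies, and it leaves out the gc_ctx == NULL case. The sample values in the usage part are taken from the SystemTap output further down in this post, and HZ = 1000 is an assumption inferred from the two gc_expire values printed there differing by 240000.

```python
BITS = 64                  # assumption: 64-bit unsigned long, as on x86_64
MASK = (1 << BITS) - 1
EACCES = 13
GSS_KEY_EXPIRE_TIMEO = 240  # seconds, from auth_gss.c above


def to_signed(value):
    """Interpret a BITS-wide unsigned value as a signed (two's complement) one."""
    value &= MASK
    return value - (1 << BITS) if value >> (BITS - 1) else value


def time_after(a, b):
    """Model of the kernel macro: true if a is after b, even across wraparound."""
    return to_signed(b - a) < 0


def gss_key_timeout(now, gc_expiry, hz, expire_timeo=GSS_KEY_EXPIRE_TIMEO):
    """Return -EACCES once 'now' is inside the expire window, else 0."""
    expire = (gc_expiry - expire_timeo * hz) & MASK
    if time_after(now, expire):
        return -EACCES
    return 0


if __name__ == "__main__":
    # now and gc_expiry as printed by the stap script later in this post.
    print(gss_key_timeout(now=7509867697, gc_expiry=7510107517, hz=1000))
```

With those values the model returns -13, matching the gss_key_timeout()=-13 line in that output.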
Let's use kprobes to trace the relevant kernel functions and see what they return:
<div class="code panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="codeContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre class="code-none" style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre-wrap;"># ./kprobe -d 1 'r:nfs_ctx_key_to_expire $retval'
Tracing kprobe nfs_ctx_key_to_expire for 1 seconds (buffered)...
<...>-80656 [005] 21002.939170: nfs_ctx_key_to_expire: (nfs_write_end+0x18f/0x300 [nfs] <- nfs_ctx_key_to_expire) arg1=1
<...>-80656 [005] 21002.940730: nfs_ctx_key_to_expire: (nfs_write_end+0x18f/0x300 [nfs] <- nfs_ctx_key_to_expire) arg1=1
<...>-80656 [005] 21002.942151: nfs_ctx_key_to_expire: (nfs_write_end+0x18f/0x300 [nfs] <- nfs_ctx_key_to_expire) arg1=1
<...>-80656 [005] 21002.943556: nfs_ctx_key_to_expire: (nfs_write_end+0x18f/0x300 [nfs] <- nfs_ctx_key_to_expire) arg1=1
<...>-80656 [005] 21002.944987: nfs_ctx_key_to_expire: (nfs_write_end+0x18f/0x300 [nfs] <- nfs_ctx_key_to_expire) arg1=1
<...>-80656 [005] 21002.946327: nfs_ctx_key_to_expire: (nfs_write_end+0x18f/0x300 [nfs] <- nfs_ctx_key_to_expire) arg1=1
<...>-80656 [005] 21002.947668: nfs_ctx_key_to_expire: (nfs_write_end+0x18f/0x300 [nfs] <- nfs_ctx_key_to_expire) arg1=1
<...>-80656 [005] 21002.949157: nfs_ctx_key_to_expire: (nfs_file_splice_write+0xfd/0x140 [nfs] <- nfs_ctx_key_to_expire) arg1=1
<...>-80656 [005] 21002.949171: nfs_ctx_key_to_expire: (nfs_write_end+0x18f/0x300 [nfs] <- nfs_ctx_key_to_expire) arg1=1
<...>-80656 [005] 21002.950754: nfs_ctx_key_to_expire: (nfs_write_end+0x18f/0x300 [nfs] <- nfs_ctx_key_to_expire) arg1=1
<...>-80656 [005] 21002.952075: nfs_ctx_key_to_expire: (nfs_write_end+0x18f/0x300 [nfs] <- nfs_ctx_key_to_expire) arg1=1
<...>-80656 [005] 21002.953633: nfs_ctx_key_to_expire: (nfs_write_end+0x18f/0x300 [nfs] <- nfs_ctx_key_to_expire) arg1=1
…
# ./kprobe -d 1 'r:rpcauth_cred_key_to_expire $retval'
<...>-80656 [005] 21511.495978: rpcauth_cred_key_to_expire: (nfs_ctx_key_to_expire+0x2b/0x30 [nfs] <- rpcauth_cred_key_to_expire) arg1=1
<...>-80656 [005] 21511.497206: rpcauth_cred_key_to_expire: (nfs_ctx_key_to_expire+0x2b/0x30 [nfs] <- rpcauth_cred_key_to_expire) arg1=1
<...>-80656 [005] 21511.498429: rpcauth_cred_key_to_expire: (nfs_ctx_key_to_expire+0x2b/0x30 [nfs] <- rpcauth_cred_key_to_expire) arg1=1
<...>-80656 [005] 21511.498443: rpcauth_cred_key_to_expire: (nfs_ctx_key_to_expire+0x2b/0x30 [nfs] <- rpcauth_cred_key_to_expire) arg1=1
…
# ./kprobe -d 1 'r:generic_key_to_expire $retval'
<...>-80656 [005] 21663.712048: generic_key_to_expire: (rpcauth_cred_key_to_expire+0x21/0x30 [sunrpc] <- generic_key_to_expire) arg1=1
<...>-80656 [005] 21663.713661: generic_key_to_expire: (rpcauth_cred_key_to_expire+0x21/0x30 [sunrpc] <- generic_key_to_expire) arg1=1
<...>-80656 [005] 21663.715152: generic_key_to_expire: (rpcauth_cred_key_to_expire+0x21/0x30 [sunrpc] <- generic_key_to_expire) arg1=1
<...>-80656 [005] 21663.716583: generic_key_to_expire: (rpcauth_cred_key_to_expire+0x21/0x30 [sunrpc] <- generic_key_to_expire) arg1=1
…
</pre>
</div>
</div>
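As a side note, the timestamps in the first kprobe run give a feel for how often nfs_write_end() hits this check. The sketch below takes the eleven timestamps of the calls returning into nfs_write_end (the single nfs_file_splice_write caller is left out) and computes the gaps between them. The list and helper names are only for illustration.

```python
# Timestamps (seconds) of nfs_ctx_key_to_expire returning into nfs_write_end,
# copied from the first kprobe run above.
TIMESTAMPS = [
    21002.939170, 21002.940730, 21002.942151, 21002.943556,
    21002.944987, 21002.946327, 21002.947668, 21002.949171,
    21002.950754, 21002.952075, 21002.953633,
]


def intervals_ms(timestamps):
    """Gaps between consecutive timestamps, in milliseconds."""
    return [(b - a) * 1000.0 for a, b in zip(timestamps, timestamps[1:])]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


if __name__ == "__main__":
    gaps = intervals_ms(TIMESTAMPS)
    print("mean gap (ms):", round(mean(gaps), 3))
    print("calls per second:", round(1000.0 / mean(gaps)))
```

The gaps come out at roughly 1.3 to 1.6 ms each, i.e. one flushing nfs_write_end() call every millisecond or two in this sample.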
The generic_key_to_expire() function returns 1 while the issue is happening. All it does is test whether the RPC_CRED_KEY_EXPIRE_SOON bit is set:
<div class="code panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="codeContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre class="code-none" style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre-wrap;">static bool generic_key_to_expire(struct rpc_cred *cred)
{
    struct auth_cred *acred = &container_of(cred, struct generic_cred,
                                            gc_base)->acred;
    bool ret;
    get_rpccred(cred);
    ret = test_bit(RPC_CRED_KEY_EXPIRE_SOON, &acred->ac_flags);
    put_rpccred(cred);
    return ret;
}</pre></div></div>
Let's get the kernel stack for the cp case:
<div class="code panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="codeContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre class="code-none" style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre-wrap;">cp 34297 [001] 25235.714806: writeback:wbc_writepage: bdi 0:1479: towrt=9223372036854775807 skip=0 mode=1 kupd=0 bgrd=0 reclm=0 cyclic=0 more=0 older=0x0 start=0x0 end=0x7fffffffffffffff
7fff811ca046 ftrace_profile_wbc_writepage ([kernel.kallsyms])
7fff811451da write_cache_pages ([kernel.kallsyms])
7fffa041b962 nfs_writepages ([kernel.kallsyms])
7fff81145291 do_writepages ([kernel.kallsyms])
7fff811c8a7d writeback_single_inode ([kernel.kallsyms])
7fff811c8c63 sync_inode ([kernel.kallsyms])
7fffa041a913 nfs_wb_all ([kernel.kallsyms])
7fffa040aa1c nfs_write_end ([kernel.kallsyms])
7fff8113087a generic_file_buffered_write ([kernel.kallsyms])
7fff81132270 __generic_file_aio_write ([kernel.kallsyms])
7fff81132528 generic_file_aio_write ([kernel.kallsyms])
7fffa040a3f4 nfs_file_write ([kernel.kallsyms])
7fff8119b47a do_sync_write ([kernel.kallsyms])
7fff8119b778 vfs_write ([kernel.kallsyms])
7fff8119c2b1 sys_write ([kernel.kallsyms])
7fff81557951 tracesys ([kernel.kallsyms])
db7d0 __write_nocancel (/lib64/libc-2.12.so)
</pre>
</div>
</div>
This is interesting, as vfs_write() is calling do_sync_write(). The corresponding source code is:
<div class="code panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="codeContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre class="code-none" style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre-wrap;">ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_t *pos)
…
    if (file->f_op->write)
        ret = file->f_op->write(file, buf, count, pos);
    else
        ret = do_sync_write(file, buf, count, pos);</pre></div></div>
In more recent kernels the code around RPC_CRED_KEY_EXPIRE_SOON has changed, for example:
<div class="code panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="codeContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre class="code-none" style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre-wrap;">commit ddf529eeed59184c49dcad1633c11831f822bf6b
Author: NeilBrown <neilb@suse.com>
Date: Mon Dec 3 11:30:30 2018 +1100
NFS: move credential expiry tracking out of SUNRPC into NFS.
NFS needs to know when a credential is about to expire so that
it can modify write-back behaviour to finish the write inside the
expiry time.
It currently uses functions in SUNRPC code which make use of a
fairly complex callback scheme and flags in the generic credientials.
As I am working to discard the generic credentials, this has to change.
This patch moves the logic into NFS, in part by finding and caching
the low-level credential in the open_context. We then make direct
cred-api calls on that.
This makes the code much simpler and removes a dependency on generic
rpc credentials.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
</pre>
</div>
</div>
There are other changes as well, so the issue might be fixed in more recent kernels.
However, I reproduced the issue on Linux kernel 5.5.0-rc2, the latest at the time of this debugging.<br>
I wrote a SystemTap script which instruments the nfs_ctx_key_to_expire() function and prints some basic information, including whether the RPC_CRED_KEY_EXPIRE_SOON bit is set. It also instruments the gss_key_timeout() function and prints the current time (in jiffies), gc_expiry and the deltas between them.
<div class="code panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="codeContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre class="code-none" style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre-wrap;"># cat nfs_ctx_key_to_expire.stp
#!/usr/bin/env stap
global RPC_CRED_KEY_EXPIRE_SOON = 2; // bit 1
global RPC_CRED_NOTIFY_TIMEOUT = 4; // bit 2
global IN;
global AC_FLAGS;
@define AC_FLAGS_TO_STR(ac_flags) %(
    " " .
    (@ac_flags & RPC_CRED_KEY_EXPIRE_SOON ? "RPC_CRED_KEY_EXPIRE_SOON" : "") .
    (@ac_flags & RPC_CRED_NOTIFY_TIMEOUT ? "RPC_CRED_NOTIFY_TIMEOUT" : "") .
    " "
%)
probe module("nfs").function("nfs_ctx_key_to_expire") {
    IN[tid()] = 1;
}
probe module("nfs").function("nfs_ctx_key_to_expire").return {
    // probefunc() returns outside function in .return and pp() is too verbose, so let's hard-code string
    printf("%s %s()=%d proc: %s[%d] uid: %d ", ctime(gettimeofday_s()), "nfs_ctx_key_to_expire", $return, execname(), pid(), uid());
    printf("ac_flags: %d [%s] ", AC_FLAGS[tid()], @AC_FLAGS_TO_STR(AC_FLAGS[tid()]));
    if ($# > 0 && @1 == "inode_name") {
        printf("inode_name: %s", inode_name(@entry($inode)));
    }
    printf("\n");
    delete AC_FLAGS[tid()];
    delete IN[tid()];
}
probe module("sunrpc").function("generic_key_to_expire").return {
    if (IN[tid()]) {
        acred = & @container_of(@entry($cred), "struct generic_cred", gc_base)->acred;
        AC_FLAGS[tid()] = acred->ac_flags;
        /*
        flags_s = " ";
        flags_s = flags_s . (acred->ac_flags & RPC_CRED_KEY_EXPIRE_SOON ? "RPC_CRED_KEY_EXPIRE_SOON" : "")
        flags_s = flags_s . (acred->ac_flags & RPC_CRED_NOTIFY_TIMEOUT ? "RPC_CRED_NOTIFY_TIMEOUT" : "")
        printf("ac_flags: %d [%s ]\n", acred->ac_flags, flags_s);
        */
    }
}
probe module("auth_rpcgss").function("gss_key_timeout").return {
    printf("%s %s()=%d proc: %s[%d] uid: %d ", ctime(gettimeofday_s()), "gss_key_timeout", $return, execname(), pid(), uid());
    gss_cred = & @container_of(@entry($rc), "struct gss_cred", gc_base);
    printf("now: %d ", jiffies()); // @entry($now) is broken
    printf("gc_expire: %d %d ", gss_cred->gc_ctx->gc_expiry, gss_cred->gc_ctx->gc_expiry - 240 * HZ());
    printf("delta: %d %d", (gss_cred->gc_ctx->gc_expiry - jiffies())/HZ(), (gss_cred->gc_ctx->gc_expiry - 240 * HZ() - jiffies())/HZ());
    printf("\n");
    print_backtrace();
}
</pre>
</div>
</div>
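The ac_flags value printed by the script is a plain bitmask. For reading the output offline, a tiny decoder along the following lines may help. It is only a sketch: the masks are the same ones the script defines (2 for bit 1, 4 for bit 2), and the function name is made up for illustration.

```python
# Masks as defined at the top of the stap script above.
FLAG_MASKS = {
    "RPC_CRED_KEY_EXPIRE_SOON": 1 << 1,  # mask 2, bit 1
    "RPC_CRED_NOTIFY_TIMEOUT": 1 << 2,   # mask 4, bit 2
}


def decode_ac_flags(value):
    """Return the names of the known flags that are set in value."""
    return [name for name, mask in FLAG_MASKS.items() if value & mask]


if __name__ == "__main__":
    for value in (0, 2, 4, 6):
        print(value, decode_ac_flags(value))
```

Unlike the macro in the script, which concatenates the names without a separator, this returns a list, so a value with both bits set stays readable.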
Then I changed the key timeout used by rpc-gssd to 5 minutes for easier and quicker testing:
<div class="code panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="codeContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre class="code-none" style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre-wrap;"># vi /etc/sysconfig/nfs
#RPCGSSDARGS="-t 1800"
RPCGSSDARGS="-t 300"
# systemctl restart rpc-gssd</pre></div></div>
This makes the ticket valid for 5 minutes in the kernel.
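Combined with the GSS_KEY_EXPIRE_TIMEO of 240 seconds seen in the kernel source earlier, this implies a simple timeline, sketched below. It assumes the kernel context really lives for exactly the -t value and is not renewed earlier, which I have not verified; the function names are illustrative.

```python
GSS_KEY_EXPIRE_TIMEO = 240  # seconds, from auth_gss.c


def windows(context_lifetime_s, expire_timeo_s=GSS_KEY_EXPIRE_TIMEO):
    """Split a context lifetime into (normal seconds, expire-soon seconds)."""
    slow = min(expire_timeo_s, context_lifetime_s)
    fast = context_lifetime_s - slow
    return fast, slow


def slow_fraction(context_lifetime_s):
    """Share of the context lifetime spent inside the expire-soon window."""
    fast, slow = windows(context_lifetime_s)
    return slow / context_lifetime_s


if __name__ == "__main__":
    for lifetime in (300, 1800):
        fast, slow = windows(lifetime)
        print(f"-t {lifetime}: {fast}s normal, {slow}s expire-soon "
              f"({slow_fraction(lifetime):.0%})")
```

Under that assumption, with -t 300 only the first 60 seconds of each context are outside the expire-soon window, which is why the short timeout makes the problem quick to trigger.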
Now in another terminal I run:
<div class="code panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="codeContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre class="code-none" style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre-wrap;">$ while [ 1 ]; do /bin/echo aa >nethome/test ; rm nethome/test; sleep 5; done
</pre>
</div>
</div>
While also running:
<div class="code panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="codeContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre class="code-none" style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre-wrap;"># ./nfs_ctx_key_to_expire.stp -o /var/tmp/aa xx &
# tail -f /var/tmp/aa
...
Thu Aug 1 14:43:33 2019 nfs_ctx_key_to_expire()=0 proc: echo[31151] uid: 14100 ac_flags: 4 [ RPC_CRED_NOTIFY_TIMEOUT ]
Thu Aug 1 14:43:33 2019 nfs_ctx_key_to_expire()=0 proc: echo[31151] uid: 14100 ac_flags: 4 [ RPC_CRED_NOTIFY_TIMEOUT ]
Thu Aug 1 14:43:38 2019 nfs_ctx_key_to_expire()=0 proc: echo[31158] uid: 14100 ac_flags: 4 [ RPC_CRED_NOTIFY_TIMEOUT ]
Thu Aug 1 14:43:38 2019 nfs_ctx_key_to_expire()=0 proc: echo[31158] uid: 14100 ac_flags: 4 [ RPC_CRED_NOTIFY_TIMEOUT ]
Thu Aug 1 14:43:43 2019 nfs_ctx_key_to_expire()=0 proc: echo[31165] uid: 14100 ac_flags: 4 [ RPC_CRED_NOTIFY_TIMEOUT ]
Thu Aug 1 14:43:43 2019 nfs_ctx_key_to_expire()=0 proc: echo[31165] uid: 14100 ac_flags: 4 [ RPC_CRED_NOTIFY_TIMEOUT ]
Thu Aug 1 14:43:48 2019 gss_key_timeout()=-13 proc: echo[31172] uid: 14100 now: 7509867697 gc_expire: 7510107517 7509867517 delta: 239 0
Returning from: 0xffffffffc03ad000 : gss_key_timeout+0x0/0x50 [auth_rpcgss]
Returning to : 0xffffffffc0348e1e : generic_key_timeout+0x5e/0xd0 [sunrpc]
0xffffffffc0347051 : rpcauth_key_timeout_notify+0x21/0x30 [sunrpc]
0xffffffffc04fe9bf : nfs_key_timeout_notify+0x2f/0x40 [nfs]
0xffffffffc04ed939 : nfs_file_write+0x59/0x1e0 [nfs]
0xffffffff8cc1e763
0xffffffff8cc1f240
0xffffffff8cc2006f
0xffffffff8d12579b
Thu Aug 1 14:43:48 2019 nfs_ctx_key_to_expire()=1 proc: echo[31172] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:43:48 2019 nfs_ctx_key_to_expire()=1 proc: echo[31172] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:43:53 2019 gss_key_timeout()=-13 proc: echo[31179] uid: 14100 now: 7509872712 gc_expire: 7510107517 7509867517 delta: 234 -5
Returning from: 0xffffffffc03ad000 : gss_key_timeout+0x0/0x50 [auth_rpcgss]
Returning to : 0xffffffffc0348e1e : generic_key_timeout+0x5e/0xd0 [sunrpc]
0xffffffffc0347051 : rpcauth_key_timeout_notify+0x21/0x30 [sunrpc]
0xffffffffc04fe9bf : nfs_key_timeout_notify+0x2f/0x40 [nfs]
0xffffffffc04ed939 : nfs_file_write+0x59/0x1e0 [nfs]
0xffffffff8cc1e763
0xffffffff8cc1f240
0xffffffff8cc2006f
0xffffffff8d12579b
...
Thu Aug 1 14:47:34 2019 nfs_ctx_key_to_expire()=1 proc: echo[32009] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:34 2019 nfs_ctx_key_to_expire()=1 proc: echo[32009] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:39 2019 gss_key_timeout()=-13 proc: echo[32018] uid: 14100 now: 7510098709 gc_expire: 7510107517 7509867517 delta: 8 -231
Returning from: 0xffffffffc03ad000 : gss_key_timeout+0x0/0x50 [auth_rpcgss]
Returning to : 0xffffffffc0348e1e : generic_key_timeout+0x5e/0xd0 [sunrpc]
0xffffffffc0347051 : rpcauth_key_timeout_notify+0x21/0x30 [sunrpc]
0xffffffffc04fe9bf : nfs_key_timeout_notify+0x2f/0x40 [nfs]
0xffffffffc04ed939 : nfs_file_write+0x59/0x1e0 [nfs]
0xffffffff8cc1e763
0xffffffff8cc1f240
0xffffffff8cc2006f
0xffffffff8d125a1b
Thu Aug 1 14:47:39 2019 nfs_ctx_key_to_expire()=1 proc: echo[32018] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:39 2019 nfs_ctx_key_to_expire()=1 proc: echo[32018] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:44 2019 gss_key_timeout()=-13 proc: echo[32027] uid: 14100 now: 7510103729 gc_expire: 7510107517 7509867517 delta: 3 -236
Returning from: 0xffffffffc03ad000 : gss_key_timeout+0x0/0x50 [auth_rpcgss]
Returning to : 0xffffffffc0348e1e : generic_key_timeout+0x5e/0xd0 [sunrpc]
0xffffffffc0347051 : rpcauth_key_timeout_notify+0x21/0x30 [sunrpc]
0xffffffffc04fe9bf : nfs_key_timeout_notify+0x2f/0x40 [nfs]
0xffffffffc04ed939 : nfs_file_write+0x59/0x1e0 [nfs]
0xffffffff8cc1e763
0xffffffff8cc1f240
0xffffffff8cc2006f
0xffffffff8d125a1b
Thu Aug 1 14:47:44 2019 nfs_ctx_key_to_expire()=1 proc: echo[32027] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:44 2019 nfs_ctx_key_to_expire()=1 proc: echo[32027] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:49 2019 gss_key_timeout()=0 proc: echo[32037] uid: 14100 now: 7510108758 gc_expire: 7510408751 7510168751 delta: 299 59
Returning from: 0xffffffffc03ad000 : gss_key_timeout+0x0/0x50 [auth_rpcgss]
Returning to : 0xffffffffc0348e1e : generic_key_timeout+0x5e/0xd0 [sunrpc]
0xffffffffc0347051 : rpcauth_key_timeout_notify+0x21/0x30 [sunrpc]
0xffffffffc04fe9bf : nfs_key_timeout_notify+0x2f/0x40 [nfs]
0xffffffffc04ed939 : nfs_file_write+0x59/0x1e0 [nfs]
0xffffffff8cc1e763
0xffffffff8cc1f240
0xffffffff8cc2006f
0xffffffff8d125a1b
Thu Aug 1 14:47:49 2019 nfs_ctx_key_to_expire()=0 proc: echo[32037] uid: 14100 ac_flags: 4 [ RPC_CRED_NOTIFY_TIMEOUT ]
Thu Aug 1 14:47:49 2019 nfs_ctx_key_to_expire()=0 proc: echo[32037] uid: 14100 ac_flags: 4 [ RPC_CRED_NOTIFY_TIMEOUT ]
Thu Aug 1 14:47:54 2019 nfs_ctx_key_to_expire()=0 proc: echo[32046] uid: 14100 ac_flags: 4 [ RPC_CRED_NOTIFY_TIMEOUT ]
Thu Aug 1 14:47:54 2019 nfs_ctx_key_to_expire()=0 proc: echo[32046] uid: 14100 ac_flags: 4 [ RPC_CRED_NOTIFY_TIMEOUT ]
Thu Aug 1 14:48:00 2019 nfs_ctx_key_to_expire()=0 proc: echo[32069] uid: 14100 ac_flags: 4 [ RPC_CRED_NOTIFY_TIMEOUT ]
Thu Aug 1 14:48:00 2019 nfs_ctx_key_to_expire()=0 proc: echo[32069] uid: 14100 ac_flags: 4 [ RPC_CRED_NOTIFY_TIMEOUT ]
</pre>
</div>
</div>
But once we are within 240s of expiry (with a 5 minute lifetime, that is 60s after a renewal) the RPC_CRED_KEY_EXPIRE_SOON bit is set, which makes all writes synchronous (and slow). Once the key expires it gets renewed, and writes are then fine for another 1 minute (5m-4m). On some servers we had “-t 1800”, which resulted in 4 minutes of slowness every 30 minutes.<br>
Just to double check, I also measured the time it takes to complete each write:
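The arithmetic behind those numbers can be sketched in a few lines of Python. This is only an illustration: the 240s window is the value observed in the trace above, and the function name is made up for this example.

```python
# Sketch: how long writes stay buffered vs synchronous in one credential lifetime.
# EXPIRE_SOON_WINDOW_S is the 240s window observed in the trace above.
EXPIRE_SOON_WINDOW_S = 240


def write_phases(lifetime_s, window_s=EXPIRE_SOON_WINDOW_S):
    """Return (buffered_seconds, synchronous_seconds) for one lifetime."""
    slow = min(lifetime_s, window_s)
    return lifetime_s - slow, slow


for lifetime in (300, 1800):  # rpc.gssd "-t 300" and "-t 1800"
    fast, slow = write_phases(lifetime)
    share = 100 * slow / lifetime
    print(f"-t {lifetime}: {fast}s buffered, {slow}s synchronous ({share:.0f}%)")
```

With "-t 300" most of each lifetime is spent in the slow window, which is what makes the problem easy to reproduce in a test.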
<div class="code panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="codeContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre class="code-none" style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre-wrap;">$ while [ 1 ]; do strace -tT -v -etrace=write /bin/echo aa >nethome/test ; rm nethome/test; sleep 5; done
15:47:19 write(1, "aa\n", 3) = 3 <0.001536>
15:47:19 +++ exited with 0 +++
15:47:24 write(1, "aa\n", 3) = 3 <0.001501>
15:47:24 +++ exited with 0 +++
15:47:29 write(1, "aa\n", 3) = 3 <0.001427>
15:47:29 +++ exited with 0 +++
15:47:34 write(1, "aa\n", 3) = 3 <0.001455>
15:47:34 +++ exited with 0 +++
15:47:39 write(1, "aa\n", 3) = 3 <0.001503>
15:47:39 +++ exited with 0 +++
15:47:44 write(1, "aa\n", 3) = 3 <0.001542>
15:47:44 +++ exited with 0 +++
15:47:49 write(1, "aa\n", 3) = 3 <0.000132>
15:47:49 +++ exited with 0 +++
15:47:54 write(1, "aa\n", 3) = 3 <0.000069>
15:47:54 +++ exited with 0 +++
15:48:00 write(1, "aa\n", 3) = 3 <0.000338>
15:48:00 +++ exited with 0 +++
^C
</pre>
</div>
</div>
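To compare many such runs it may be easier to extract the per-call times from the strace output. A minimal sketch, assuming the "-tT" line format shown above; the 1ms threshold and the function name are arbitrary choices made for this illustration.

```python
import re

# Matches lines such as: 15:47:19 write(1, "aa\n", 3) = 3 <0.001536>
STRACE_WRITE = re.compile(r'^(\d\d:\d\d:\d\d) write\(.*\)\s+=\s+\d+ <(\d+\.\d+)>$')

SLOW_S = 0.001  # hypothetical threshold: 1ms


def write_latencies(lines):
    """Return a list of (timestamp, seconds) for every write() line."""
    result = []
    for line in lines:
        match = STRACE_WRITE.match(line.strip())
        if match:
            result.append((match.group(1), float(match.group(2))))
    return result


sample = [
    '15:47:44 write(1, "aa\\n", 3) = 3 <0.001542>',
    '15:47:44 +++ exited with 0 +++',
    '15:47:49 write(1, "aa\\n", 3) = 3 <0.000132>',
]

for ts, secs in write_latencies(sample):
    label = "slow" if secs >= SLOW_S else "fast"
    print(f"{ts} {secs * 1e6:.0f} us {label}")
```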
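Notice that initially each write() is slow (about 1.5ms) and then becomes fast (0.07-0.34ms) from :47:49 onwards, which matches when the bit is cleared (the strace timestamps are one hour ahead of the SystemTap log, presumably a timezone difference):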
<div class="code panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="codeContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre class="code-none" style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre-wrap;"># grep 'Thu Aug 1 14:47' /var/tmp/aa
Thu Aug 1 14:47:01 2019 gss_key_timeout()=-13 proc: echo[31942] uid: 14100 now: 7510060000 gc_expire: 7510107517 7509867517 delta: 47 -192
Thu Aug 1 14:47:01 2019 nfs_ctx_key_to_expire()=1 proc: echo[31942] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:01 2019 nfs_ctx_key_to_expire()=1 proc: echo[31942] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:06 2019 gss_key_timeout()=-13 proc: echo[31951] uid: 14100 now: 7510065018 gc_expire: 7510107517 7509867517 delta: 42 -197
Thu Aug 1 14:47:06 2019 nfs_ctx_key_to_expire()=1 proc: echo[31951] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:06 2019 nfs_ctx_key_to_expire()=1 proc: echo[31951] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:11 2019 gss_key_timeout()=-13 proc: echo[31960] uid: 14100 now: 7510070037 gc_expire: 7510107517 7509867517 delta: 37 -202
Thu Aug 1 14:47:11 2019 nfs_ctx_key_to_expire()=1 proc: echo[31960] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:11 2019 nfs_ctx_key_to_expire()=1 proc: echo[31960] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:16 2019 gss_key_timeout()=-13 proc: echo[31972] uid: 14100 now: 7510075056 gc_expire: 7510107517 7509867517 delta: 32 -207
Thu Aug 1 14:47:16 2019 nfs_ctx_key_to_expire()=1 proc: echo[31972] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:16 2019 nfs_ctx_key_to_expire()=1 proc: echo[31972] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:19 2019 gss_key_timeout()=-13 proc: echo[31981] uid: 14100 now: 7510078637 gc_expire: 7510107517 7509867517 delta: 28 -211
Thu Aug 1 14:47:19 2019 nfs_ctx_key_to_expire()=1 proc: echo[31981] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:19 2019 nfs_ctx_key_to_expire()=1 proc: echo[31981] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:24 2019 gss_key_timeout()=-13 proc: echo[31991] uid: 14100 now: 7510083655 gc_expire: 7510107517 7509867517 delta: 23 -216
Thu Aug 1 14:47:24 2019 nfs_ctx_key_to_expire()=1 proc: echo[31991] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:24 2019 nfs_ctx_key_to_expire()=1 proc: echo[31991] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:29 2019 gss_key_timeout()=-13 proc: echo[32000] uid: 14100 now: 7510088673 gc_expire: 7510107517 7509867517 delta: 18 -221
Thu Aug 1 14:47:29 2019 nfs_ctx_key_to_expire()=1 proc: echo[32000] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:29 2019 nfs_ctx_key_to_expire()=1 proc: echo[32000] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:34 2019 gss_key_timeout()=-13 proc: echo[32009] uid: 14100 now: 7510093690 gc_expire: 7510107517 7509867517 delta: 13 -226
Thu Aug 1 14:47:34 2019 nfs_ctx_key_to_expire()=1 proc: echo[32009] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:34 2019 nfs_ctx_key_to_expire()=1 proc: echo[32009] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:39 2019 gss_key_timeout()=-13 proc: echo[32018] uid: 14100 now: 7510098709 gc_expire: 7510107517 7509867517 delta: 8 -231
Thu Aug 1 14:47:39 2019 nfs_ctx_key_to_expire()=1 proc: echo[32018] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:39 2019 nfs_ctx_key_to_expire()=1 proc: echo[32018] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:44 2019 gss_key_timeout()=-13 proc: echo[32027] uid: 14100 now: 7510103729 gc_expire: 7510107517 7509867517 delta: 3 -236
Thu Aug 1 14:47:44 2019 nfs_ctx_key_to_expire()=1 proc: echo[32027] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:44 2019 nfs_ctx_key_to_expire()=1 proc: echo[32027] uid: 14100 ac_flags: 2 [ RPC_CRED_KEY_EXPIRE_SOON ]
Thu Aug 1 14:47:49 2019 gss_key_timeout()=0 proc: echo[32037] uid: 14100 now: 7510108758 gc_expire: 7510408751 7510168751 delta: 299 59
Thu Aug 1 14:47:49 2019 nfs_ctx_key_to_expire()=0 proc: echo[32037] uid: 14100 ac_flags: 4 [ RPC_CRED_NOTIFY_TIMEOUT ]
Thu Aug 1 14:47:49 2019 nfs_ctx_key_to_expire()=0 proc: echo[32037] uid: 14100 ac_flags: 4 [ RPC_CRED_NOTIFY_TIMEOUT ]
Thu Aug 1 14:47:54 2019 nfs_ctx_key_to_expire()=0 proc: echo[32046] uid: 14100 ac_flags: 4 [ RPC_CRED_NOTIFY_TIMEOUT ]
Thu Aug 1 14:47:54 2019 nfs_ctx_key_to_expire()=0 proc: echo[32046] uid: 14100 ac_flags: 4 [ RPC_CRED_NOTIFY_TIMEOUT ]</pre></div></div>
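The numbers in the gss_key_timeout() lines can be cross-checked with a short Python sketch. Two things here are assumptions inferred from the values rather than taken from the tracing script: HZ is 1000, and the second value after gc_expire is the expiry minus the 240s window.

```python
import re

HZ = 1000            # assumption: inferred from the logged values
EXPIRE_SOON_S = 240  # window observed in the trace

LOG_LINE = re.compile(
    r'now: (\d+) gc_expire: (\d+) (\d+) delta: (-?\d+) (-?\d+)')


def trunc_div(a, b):
    """Integer division truncating toward zero, as C does."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def recompute(line):
    """Recompute both deltas from the raw jiffies values in one log line."""
    match = LOG_LINE.search(line)
    if not match:
        return None
    now, expiry, threshold, d_expiry, d_window = map(int, match.groups())
    return {
        "threshold_matches": threshold == expiry - EXPIRE_SOON_S * HZ,
        "to_expiry_s": trunc_div(expiry - now, HZ),
        "to_window_s": trunc_div(threshold - now, HZ),
        "logged": (d_expiry, d_window),
    }


slow = ("gss_key_timeout()=-13 proc: echo[31942] uid: 14100 "
        "now: 7510060000 gc_expire: 7510107517 7509867517 delta: 47 -192")
fast = ("gss_key_timeout()=0 proc: echo[32037] uid: 14100 "
        "now: 7510108758 gc_expire: 7510408751 7510168751 delta: 299 59")

for line in (slow, fast):
    print(recompute(line))
```

A negative second delta means the credential is already inside the 240s window, which lines up with the -13 (EACCES) return value; after the refresh the second delta is positive again and the function returns 0.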
I couldn’t find anything else in the source code which would end up calling the above functions or call_refresh() – such as an extra thread which periodically wakes up and refreshes all keys before they end up with RPC_CRED_KEY_EXPIRE_SOON set. It looks like the current behaviour is that once a cred/key gets into the 240s period before it is set to expire, writes are no longer buffered and are written synchronously on a page-by-page (4KB) basis; additionally, all dirty pages are flushed. Once the cred expires it gets refreshed and, if successful, everything switches back to normal until the next 240s window before expiry.<br>
<br>
I tried a quick “fix” which works ok on 5.5.0-rc2:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;">diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index d75fddc..6d36ae6 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -1478,6 +1478,8 @@ static void gss_pipe_free(struct gss_pipe *p)
 	ctx = rcu_dereference(gss_cred->gc_ctx);
 	if (!ctx || time_after(timeout, ctx->gc_expiry))
 		ret = -EACCES;
+	if (ctx && time_after(timeout + (60 * HZ), ctx->gc_expiry))
+		clear_bit(RPCAUTH_CRED_UPTODATE, &rc->cr_flags);
 	rcu_read_unlock();
 	return ret;
</pre>
</div>
</div>
This is NOT a final solution and is wrong in a general case.<br>
<br>
I also wrote a SystemTap script to modify a running kernel on the fly, which essentially does the same as the patch above (again, this is not a correct solution and is potentially risky).
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># cat nfs_ctx_key_to_expire_workaround3.stp
#!/usr/bin/env stap
global RPC_CRED_KEY_EXPIRE_SOON = 2; // bit 1
global RPC_CRED_NOTIFY_TIMEOUT = 4; // bit 2
global IN;
global AC_FLAGS;
global LOG = 0;
@define AC_FLAGS_TO_STR(ac_flags) %(
    " " .
    (@ac_flags & RPC_CRED_KEY_EXPIRE_SOON ? "RPC_CRED_KEY_EXPIRE_SOON" : "") .
    (@ac_flags & RPC_CRED_NOTIFY_TIMEOUT ? "RPC_CRED_NOTIFY_TIMEOUT" : "") .
    " "
%)
probe begin {
    if ($# > 0 && @1 == "log")
        LOG = 1;
}
probe module("nfs").function("nfs_ctx_key_to_expire") {
    IN[tid()] = 1;
}
probe module("nfs").function("nfs_ctx_key_to_expire").return {
    if ($return == 0) {
        delete IN[tid()];
        delete AC_FLAGS[tid()];
        next;
    }
    if (LOG) {
        // probefunc() returns outside function in .return and pp() is too verbose, so let's hard-code string
        printf("%s %s()=%d proc: %s[%d] uid: %d ", ctime(gettimeofday_s()), "nfs_ctx_key_to_expire", $return, execname(), pid(), uid());
        printf("ac_flags: %d [%s]\n", AC_FLAGS[tid()], @AC_FLAGS_TO_STR(AC_FLAGS[tid()]));
    }
    //print_backtrace();
    //printf("\n");
%( guru_mode == 1 %?
    if (LOG)
        printf("%s %s PATCHING return code to 0\n", ctime(gettimeofday_s()), "nfs_ctx_key_to_expire");
    $return = 0;
%:
    printf("%s %s NON-GURU MODE, would have patched rc=0\n", ctime(gettimeofday_s()), "nfs_ctx_key_to_expire");
%)
    delete AC_FLAGS[tid()];
    delete IN[tid()];
}
probe module("sunrpc").function("generic_key_to_expire").return {
    if (IN[tid()]) {
        acred = & @container_of(@entry($cred), "struct generic_cred", gc_base)->acred;
        AC_FLAGS[tid()] = acred->ac_flags;
        /*
        flags_s = " ";
        flags_s = flags_s . (acred->ac_flags & RPC_CRED_KEY_EXPIRE_SOON ? "RPC_CRED_KEY_EXPIRE_SOON" : "")
        flags_s = flags_s . (acred->ac_flags & RPC_CRED_NOTIFY_TIMEOUT ? "RPC_CRED_NOTIFY_TIMEOUT" : "")
        printf("ac_flags: %d [%s ]\n", acred->ac_flags, flags_s);
        */
    }
}
probe module("auth_rpcgss").function("gss_key_timeout") {
    gss_cred = & @container_of($rc, "struct gss_cred", gc_base);
    if (gss_cred == 0 || gss_cred->gc_ctx == 0) {
        if (LOG)
            printf("%s %s NOT PATCHING %s[%d], cr_uid: %d, cr_flags: %d, gss_cred: %x, gss_cred->gc_ctx: %x\n",
                ctime(gettimeofday_s()),
                "gss_key_timeout",
                execname(), pid(),
                $rc->cr_uid,
                $rc->cr_flags,
                gss_cred,
                gss_cred ? gss_cred->gc_ctx : 0);
        next;
    }
    dt = (gss_cred->gc_ctx->gc_expiry - jiffies())/HZ();
%( guru_mode == 1 %?
    // define RPCAUTH_CRED_UPTODATE 1
    // this is a bit number, so it is 2 in decimal
    if (dt < 300 && $rc->cr_flags & 2) {
        // new_flags = $rc->cr_flags - 2;
        new_flags = $rc->cr_flags & ~(1 << 1);
        if (LOG)
            printf("%s %s patching %s[%d], cr_uid: %d, dt: %d cr_flags: %d new cr_flags: %d\n",
                ctime(gettimeofday_s()),
                "gss_key_timeout",
                execname(), pid(),
                $rc->cr_uid,
                dt,
                $rc->cr_flags,
                new_flags);
        //$rc->cr_flags = new_flags;
        $rc->cr_flags = $rc->cr_flags & ~(1 << 1);
        //gss_cred->gc_ctx->gc_expiry = 360 * HZ();
    }
%:
    // just print some info in non-guru mode
    if (!(dt < 300 && $rc->cr_flags & 2))
        next;
    printf("%s %s[%d], cr_uid: %d dt: %d cr_flags: %d\n",
        ctime(gettimeofday_s()),
        execname(), pid(),
        $rc->cr_uid,
        dt,
        $rc->cr_flags);
%)
}
</pre>
</div>
</div>
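The flag handling in the script relies on the difference between a bit number and a bit mask (RPCAUTH_CRED_UPTODATE is bit number 1, so its mask is 2). A small Python sketch of the same operations; the helper names are made up for this illustration and the flag values are the ones used in the script above.

```python
RPCAUTH_CRED_UPTODATE = 1  # bit number, as noted in the script's comment

# ac_flags masks as declared at the top of the script
AC_FLAG_NAMES = {
    2: "RPC_CRED_KEY_EXPIRE_SOON",
    4: "RPC_CRED_NOTIFY_TIMEOUT",
}


def bit_mask(bit):
    """Convert a bit number into its mask, e.g. bit 1 -> 2."""
    return 1 << bit


def clear_bit(flags, bit):
    """Return flags with the given bit number cleared."""
    return flags & ~bit_mask(bit)


def decode_ac_flags(ac_flags):
    """List the names of all known flags set in ac_flags."""
    return [name for mask, name in AC_FLAG_NAMES.items() if ac_flags & mask]


cr_flags = 0b0110  # hypothetical value with bits 1 and 2 set
print(cr_flags, "->", clear_bit(cr_flags, RPCAUTH_CRED_UPTODATE))
print(decode_ac_flags(2), decode_ac_flags(4))
```

Clearing a bit with a mask is idempotent, which is why the script uses `& ~(1 << 1)` rather than the commented-out subtraction.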
The script contains some left-overs from initial debugging and some of it is not necessary for the workaround to work.<br>
To run it:
<div class="preformatted panel" style="-webkit-text-stroke-width: 0px; background-color: #f4f5f7; background: rgb(244, 245, 247); border-radius: 3px; border: 1px solid rgb(193, 199, 208); color: #172b4d; font-family: monospace; font-size: 12px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.33333; margin: 9px 0px; orphans: 2; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div class="preformattedContent panelContent" style="margin: 0px; padding: 9px 12px;">
<pre style="margin: 0px; max-height: 30em; overflow-wrap: normal; overflow: auto; padding: 0px; white-space: pre; word-break: normal;"># ./nfs_ctx_key_to_expire_workaround3.stp -g --suppress-time-limits nolog &</pre>
</div>
</div>
<br>
Note that running the workaround means dirty pages are not destaged in the last 240s before key expiry and all writes are still buffered - if for any reason the NFS client does not manage to refresh the key, the dirty pages might never be destaged.<br><br>
The current behaviour is a rather lazy approach to fixing the issue - I believe a much better solution would be to try to refresh the key a minute earlier - if that is unsuccessful, keep using the previous key and fall back to the current behaviour; otherwise reset the timer and keep using the new key. This would mean that in the vast majority of cases the 4 minutes of slowness would never happen.
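The proposed policy can be written down as a small decision function. This is a purely hypothetical Python sketch of the idea, not kernel code: the names, the returned strings, and the `try_refresh` callback are all invented for illustration.

```python
EXPIRE_SOON_S = 240                   # current window before expiry
EARLY_REFRESH_S = EXPIRE_SOON_S + 60  # "a minute earlier"


def write_mode(seconds_to_expiry, try_refresh):
    """Decide how writes should be handled under the proposed policy.

    try_refresh is a callable returning True if the key was renewed.
    """
    if seconds_to_expiry > EARLY_REFRESH_S:
        return "buffered"            # nothing to do yet
    if try_refresh():
        return "buffered (new key)"  # timer reset, slow window avoided
    if seconds_to_expiry > EXPIRE_SOON_S:
        return "buffered"            # refresh failed, old key still fine
    return "synchronous"             # fall back to the current behaviour


print(write_mode(290, lambda: True))
print(write_mode(100, lambda: False))
```

A real implementation would also have to rate-limit refresh attempts; that is left out to keep the sketch short.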
<br><br>
Unfortunately I haven’t got time to work on a proper fix (and for different reasons the issue became less noticeable for us, so it ended up being a low priority).<br>
<br>
btw: the kprobe script was obtained from <a href="https://github.com/brendangregg/perf-tools">https://github.com/brendangregg/perf-tools</a>
Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com2tag:blogger.com,1999:blog-9963485.post-28414832556225676642022-03-11T03:45:00.000+01:002022-03-11T03:45:27.064+01:00TrueNAS Scale and ZFS Write Throttling<div style="text-align: justify;">By default on OpenZFS/Linux the max amount of dirty data is capped at 4GB. This means that if your NAS server is connected over a network with higher throughput than your pool can sustain (writes), you will quickly end up being throttled when writing lots of data (>>4GB). This can easily happen if your home NAS server is connected over 10GbE and its internal pool configuration can sustain much less (e.g. 2 mirrored disks will sustain less than 200MB/s of writes) and from time to time you write large amounts of data.</div><div style="text-align: justify;"><br /></div><div><div style="text-align: justify;">In such a case you might potentially improve performance significantly by tuning ZFS write throttling.</div>
<div style="text-align: justify;"><br /></div><div style="text-align: justify;">An example workflow: you are editing largish video files on your workstation (also connected over 10GbE) on your local NVMe drives. Once you are done, you want to copy them to your NAS server - let's say the files are 50GB in total.
You will not be able to write at full network speed for the whole transfer due to write throttling.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">If your NAS server has plenty of RAM, you could increase the dirty data threshold - if you can raise it above the maximum amount of data you will write in one session, you should sustain very high throughput over the network and, from your perspective, the whole transfer will be significantly quicker, while the NAS server destages the data to disks in the background.</div><div style="text-align: justify;"><br /></div>
Let's say you want to increase it to 100GB, to do so:
<pre> # echo 107374182400 >/sys/module/zfs/parameters/zfs_dirty_data_max</pre>
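The value above is simply 100 GiB in bytes. The Python sketch below shows that conversion plus a deliberately simplified model of how long a transfer can run before the dirty data cap is reached. The model is an assumption made for illustration: it treats the cap as a hard limit and ignores the gradual delay ZFS applies before reaching it, and the 1.25 GB/s and 200 MB/s rates are rough figures for 10GbE and a two-disk mirror.

```python
GIB = 1024 ** 3


def gib_to_bytes(gib):
    """Value to write to zfs_dirty_data_max for a cap given in GiB."""
    return gib * GIB


def seconds_until_cap(dirty_max_bytes, ingest_bps, drain_bps):
    """Simplified model: dirty data grows at (ingest - drain) until the cap."""
    if ingest_bps <= drain_bps:
        return None  # the pool keeps up, the cap is never reached
    return dirty_max_bytes / (ingest_bps - drain_bps)


print(gib_to_bytes(100))  # 107374182400

ingest = 1.25e9  # assumed: 10GbE upper bound, bytes/s
drain = 200e6    # assumed: what the pool can sustain, bytes/s
for cap_gib in (4, 100):
    t = seconds_until_cap(gib_to_bytes(cap_gib), ingest, drain)
    received_gb = ingest * t / 1e9
    print(f"{cap_gib} GiB cap: ~{t:.0f}s at full speed, ~{received_gb:.0f} GB received")
```

Under these assumed rates the default cap is reached within seconds, while a 100 GiB cap is not reached during a 50GB copy.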
There are obviously pros and cons, depending on the specific situation.
For example, if you have multiple clients writing to your NAS server then increasing the threshold might not be such a good idea. Generally, if you only have one client doing heavy writes from time to time, you should benefit from this tuning.<br />
<br />
There is an excellent blog on <a href="https://www.delphix.com/blog/delphix-engineering/tuning-openzfs-write-throttle" target="_blank">tuning zfs write throttle</a> if you want to understand it in more detail.
<br />
<br />
How do you make the setting persistent in TrueNAS Scale in a supported way?<br />
Go to System Settings -> Advanced -> Init/Shutdown Scripts -> Add</div><div>and populate it with:<br />
<pre> Type: Command
Command: echo 107374182400 >/sys/module/zfs/parameters/zfs_dirty_data_max
When: Post Init
</pre>
After reboot it will pick up the new value.<br />
The advantage of setting it up this way is that it will be included in exported TrueNAS config file.<br />
</div>Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com0tag:blogger.com,1999:blog-9963485.post-27550256827576888132022-02-09T13:50:00.000+01:002022-02-09T13:50:27.964+01:00TrueNAS Scale & ZFS Wrapping Key<p> I've been playing with TrueNAS Scale recently, and while the BUI allows you to export/download the ZFS wrapping key, I wanted to know how to get the key manually.
After a quick look at the code I found that the key is stored in an SQLite database kept on the root file system.
<pre>
root@truenas[~]# sqlite3 /data/freenas-v1.db
SQLite version 3.34.1 2021-01-20 14:10:07
Enter ".help" for usage hints.
sqlite> select * from storage_encrypteddataset;
1|backup|ioI/B72PEllUJjumWpWHkdhDDCd2l2eopFEJgWYIpcAcTT1v0NyYicjzKiHfuoncL2Mklfa45pUJIyxzGFGobr17b1HtprjSth/X9yyfsnROCK/xQL+SVmO/5fT/KabfSSiz8+IfDH8=|
</pre>
The key itself is encrypted, so you need to decrypt it before it can be used with ZFS.
A simple Python script to do this is attached below.
<pre>
root@truenas[~]# ./decode_key.py
dataset: backup
key: 16f7677b514ef39bc162312274c76da24221ecc5a2f01e6ba0bhfeec054d9162
</pre>
(both the encrypted and decrypted keys above have been modified for this blog entry)
<pre>
root@truenas[~]# cat decode_key.py
#!/usr/bin/python3
# based on /usr/lib/migrate113/freenasUI/system/migrations/0022_cloud_sync.py
import sys
import base64
from Cryptodome.Cipher import AES
import sqlite3

PWENC_BLOCK_SIZE = 32
PWENC_FILE_SECRET = '/data/pwenc_secret'
PWENC_PADDING = b'{'


def pwenc_get_secret():
    with open(PWENC_FILE_SECRET, 'rb') as f:
        secret = f.read()
    return secret


def pwenc_decrypt(encrypted=None):
    if not encrypted:
        return ""
    from Cryptodome.Util import Counter
    encrypted = base64.b64decode(encrypted)
    # the first 8 bytes are the nonce, the rest is the AES-CTR ciphertext
    nonce = encrypted[:8]
    encrypted = encrypted[8:]
    cipher = AES.new(
        pwenc_get_secret(),
        AES.MODE_CTR,
        counter=Counter.new(64, prefix=nonce),
    )
    return cipher.decrypt(encrypted).rstrip(PWENC_PADDING).decode('utf8')


# decrypt a single value passed on the command line
if len(sys.argv) == 2:
    print(pwenc_decrypt(sys.argv[1]))
    exit(0)

# otherwise decrypt the keys of all encrypted datasets
dbcon = sqlite3.connect('/data/freenas-v1.db')
dbcur = dbcon.cursor()
for row in dbcur.execute('select * from storage_encrypteddataset'):
    ds_id, ds_name, ds_enc_key, kmip_enc_key = row
    #print(ds_id, ds_name, ds_enc_key, pwenc_decrypt(ds_enc_key))
    print(f'dataset: {ds_name}\n key: {pwenc_decrypt(ds_enc_key)}\n')
</pre>Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com2tag:blogger.com,1999:blog-9963485.post-74550741446087424372021-05-04T17:05:00.001+02:002021-05-04T17:05:54.543+02:00Linux Tickless and CPU Isolation - microbenchmark<p> How much can be gained when it comes to low latency and jitter by isolating CPUs on Linux?</p><p>Let's do a microbenchmark by using a tool called <a href="https://github.com/rigtorp/hiccups" target="_blank">hiccups</a>.</p><p><br /></p><div>A single CPU server (overclocked).</div>
Kernel booted with the following arguments:
<pre> isolcpus=nohz,domain,managed_irq,15-17 irqaffinity=0-13 nohz_full=15-17 intel_idle.max_cstate=0 idle=poll
nmi_watchdog=0 nowatchdog nosoftlockup intel_pstate=passive
</pre><div><br /></div>
First let's run the hiccups tool on CPU 14 (not isolated, etc.):
<pre> # taskset -c 14 chrt --other 0 ./hiccups -r 60 | column -t -R 1,2,3,4,5,6
cpu threshold_ns hiccups <b>pct99_ns</b> pct999_ns max_ns
14 120 60077 <b>2578 </b>3219 6932
</pre><div><br /></div>
Now let's run it on CPU 15:
<pre> # taskset -c 15 chrt --other 0 ./hiccups -r 60 | column -t -R 1,2,3,4,5,6
cpu threshold_ns hiccups <b>pct99_ns</b> pct999_ns max_ns
15 120 14654 <b>248 </b>1177 7651
</pre><div><br /></div>
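The two runs above can be compared column by column with a few lines of Python. The figures are copied from the hiccups output; only the dictionary layout and the function name are made up for this sketch.

```python
# Results copied from the two hiccups runs above, keyed by CPU number.
runs = {
    14: {"hiccups": 60077, "pct99_ns": 2578, "pct999_ns": 3219, "max_ns": 6932},
    15: {"hiccups": 14654, "pct99_ns": 248, "pct999_ns": 1177, "max_ns": 7651},
}


def improvement(metric, base=14, isolated=15):
    """Ratio of the non-isolated value to the isolated one (>1 is better)."""
    return runs[base][metric] / runs[isolated][metric]


for metric in runs[14]:
    print(f"{metric}: {improvement(metric):.1f}x")
```

The gain is concentrated in the 99th percentile; the single worst-case value (max_ns) is not improved in this particular run.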
This is about a 10x difference at the 99th percentile (2578 ns vs 248 ns).<div>It can be optimized even further.</div><div><br /></div>Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com0tag:blogger.com,1999:blog-9963485.post-77240485488292567012021-04-30T03:51:00.014+02:002021-04-30T15:12:31.157+02:00Linux Tickless Kernel & CPU Usage Statistics<p>
I ran into an interesting problem yesterday. After changing some kernel boot parameters to enable the tickless kernel, an application which is supposed to essentially
"spin" 100% on a CPU was being reported as mostly idle. This was rather puzzling.</p><p>After some debugging and reading docs I've managed to reproduce the issue with a simple test case.
The test is reproducible on CentOS 8.3 x86_64.
</p><pre>$ uname -r
4.18.0-240.1.1.el8_3.x86_64
$ cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos2)/vmlinuz-4.18.0-240.1.1.el8_3.x86_64 root=/dev/mapper/sysvg-root ro crashkernel=auto
resume=/dev/mapper/sysvg-swap rd.lvm.lv=sysvg/root rd.lvm.lv=sysvg/swap biosdevname=1 ipv6.disable=1
net.ifnames=1 rhgb quiet intel_pstate=passive
intel_idle.max_cstate=0 idle=poll
isolcpus=domain,managed_irq,15-17 irqaffinity=0-13 nohz_full=15
</pre><div><br /></div>
Let's run a single-threaded tight loop bound to CPU 16, running 100% in user space (note that CPU 16 is otherwise isolated):
<pre># time taskset <b>-c 16</b> sh -c "i=1; while [ \$i -lt 10000000 ]; do true; ((i++)); done"
real <b>0m30.713s</b>
user <b>0m30.688s</b>
sys 0m0.001s
</pre>
In another terminal I run mpstat showing CPU 16 usage statistics:
<pre>$ mpstat -P 16 1
...
00:55:14 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
00:55:15 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
00:55:15 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
00:55:16 16 68.32 0.00 0.00 0.00 0.99 0.00 0.00 0.00 0.00 30.69
00:55:16 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
00:55:17 16 <span style="color: #04ff00;">100.00</span> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
00:55:17 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
00:55:18 16 <span style="color: #04ff00;">100.00</span> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
00:55:18 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
00:55:19 16 <span style="color: #04ff00;">100.00</span> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
00:55:19 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
00:55:20 16 <span style="color: #04ff00;">100.00</span> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
00:55:20 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
00:55:21 16 <span style="color: #04ff00;">100.00</span> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
...
00:55:45 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
00:55:46 16 <span style="color: #04ff00;">100.00</span> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
00:55:46 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
00:55:47 16 2.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 98.00
00:55:47 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
00:55:48 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
^C
</pre>
As expected CPU 16 was 100% in USR for about 30s.<br />
So far so good.<div>
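For reference, mpstat derives these percentages from the per-CPU counters the kernel exposes in /proc/stat. A minimal Python sketch of that calculation follows; it is simplified (guest time is ignored) and the two sample lines are synthetic, not taken from the test server.

```python
# First eight per-CPU counters in /proc/stat, in order.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse a 'cpuN ...' line from /proc/stat into (name, counters)."""
    parts = line.split()
    values = map(int, parts[1:1 + len(FIELDS)])
    return parts[0], dict(zip(FIELDS, values))


def usage_percent(before, after):
    """Percentage of time spent in each state between two samples."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in deltas.items()}


# Synthetic samples taken one second apart (values are made up).
_, t0 = parse_cpu_line("cpu16 1000 0 50 90000 0 10 0 0 0 0")
_, t1 = parse_cpu_line("cpu16 1100 0 50 90000 0 10 0 0 0 0")

pct = usage_percent(t0, t1)
print(f"%usr {pct['user']:.2f}  %idle {pct['idle']:.2f}")
```

On a live system the two samples would be read from /proc/stat a second apart; the reported numbers are only as good as the kernel's accounting of those counters.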
<br />I then modified kernel boot arguments:</div><div><ul style="text-align: left;"><li>added nohz option to isolcpus</li><li>updated nohz_full variable to include CPUs 15-17 (which includes the CPU 16 we are testing here)</li></ul>
After reboot I repeated the above test.
<pre># cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos2)/vmlinuz-4.18.0-240.1.1.el8_3.x86_64 root=/dev/mapper/sysvg-root ro crashkernel=auto
resume=/dev/mapper/sysvg-swap rd.lvm.lv=sysvg/root rd.lvm.lv=sysvg/swap biosdevname=1 ipv6.disable=1
net.ifnames=1 rhgb quiet intel_pstate=passive
intel_idle.max_cstate=0 idle=poll
<b>isolcpus=nohz</b>,domain,managed_irq,15-17 irqaffinity=0-13 <b>nohz_full=15-17</b>
</pre>
<pre># date;time taskset -c 16 sh -c "i=1; while [ \$i -lt 10000000 ]; do true; ((i++)); done"
Fri 30 Apr 01:11:15 BST 2021
real <b>0m30.849s</b>
user <b>0m30.849s</b>
sys 0m0.000s
</pre>
It took about the same time for it to finish (as expected). However mpstat output is now different:</div><div><div style="height: 400px; overflow-y: scroll; width: 800px;"><pre>$ mpstat -P 16 1
...
01:11:13 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:14 16 0.00 0.00 0.00 0.00 0.99 0.00 0.00 0.00 0.00 99.01
01:11:14 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:15 16 23.94 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 76.06
01:11:15 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:16 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:11:16 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:17 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 <span style="color: red;">100.00</span>
01:11:17 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:18 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 <span style="color: red;">100.00</span>
01:11:18 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:19 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 <span style="color: red;">100.00</span>
01:11:19 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:20 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 <span style="color: red;">100.00</span>
01:11:20 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:21 16 <span style="color: #04ff00;">100.00</span> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:11:21 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:22 16 <span style="color: #04ff00;">100.00</span> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:11:22 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:23 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 <span style="color: red;">100.00</span>
01:11:23 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:24 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 <span style="color: red;">100.00</span>
01:11:24 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:25 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 <span style="color: red;">100.00</span>
01:11:25 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:26 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 <span style="color: red;">100.00</span>
01:11:26 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:27 16 <span style="color: #04ff00;">100.00</span> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:11:27 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:28 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 <span style="color: red;">100.00</span>
01:11:28 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:29 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 <span style="color: red;">100.00</span>
01:11:29 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:30 16 <span style="color: #04ff00;">100.00</span> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:11:30 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:31 16 <span style="color: #04ff00;">100.00</span> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:11:31 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:32 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 <span style="color: red;">100.00</span>
01:11:32 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:33 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 <span style="color: red;">100.00</span>
01:11:33 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:34 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 <span style="color: red;">100.00</span>
01:11:34 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:35 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 <span style="color: red;">100.00</span>
01:11:35 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:36 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 <span style="color: red;">100.00</span>
01:11:36 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:37 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 <span style="color: red;">100.00</span>
01:11:37 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:38 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 <span style="color: red;">100.00</span>
01:11:38 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:39 16 <span style="color: #04ff00;">100.00</span> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:11:39 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:40 16 <span style="color: #04ff00;">100.00</span> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:11:40 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:41 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 <span style="color: red;">100.00</span>
01:11:41 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:42 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 <span style="color: red;">100.00</span>
01:11:42 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:43 16 <span style="color: #04ff00;">100.00</span> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:11:43 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:44 16 <span style="color: #04ff00;">100.00</span> 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:11:44 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:45 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 <span style="color: red;">100.00</span>
01:11:45 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:46 16 76.17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 23.83
01:11:46 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:47 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:11:47 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:48 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:11:48 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:49 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:11:49 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:11:50 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
^C
Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
Average: 16 83.36 0.00 0.00 0.00 0.03 0.00 0.00 0.00 0.00 16.62
</pre>
</div>
<br />Notice that most of the time while the above loop was spinning on the CPU, mpstat reported the CPU as idle.</div><div><br /></div><div>
Given that the loop finished in about the same time as before, it was not taken off the CPU and had to spin constantly, just as before. The real and user times reported by time are also nearly identical. This means that mpstat reporting the CPU as idle most of the time can't be true; during the 30s run the CPU had to be spinning 100% in USR.
<br /><br />
There is an excellent <a href="https://www.suse.com/c/cpu-isolation-introduction-part-1/"><b>overview of CPU isolation and the tickless kernel</b></a> by SUSE Labs which sheds some light on what is happening here and why.</div><div>Specifically, check <a href="https://www.suse.com/c/cpu-isolation-full-dynticks-part2/"><b>the Part 2 -> 3.1 Cputime accounting</b></a> section.
<br /><br /></div><div>
Essentially, what seems to be happening here is that CPU times are only updated when the CPU enters or exits user space. This results in more accurate usage numbers and, more importantly, does not require interrupting
a running application, which is the whole point of a tickless kernel.
<br /><br /></div><div>
But what if an application does not issue any system calls and ends up spinning 100% in user space?</div><div>The CPU stats would never get updated... but they actually do every few seconds.</div><div>I'm not entirely sure why - probably there is some mechanism to gather them at some interval or there is an IRQ happening (shouldn't be) or... not that important at the moment.<br /><br />
To further confirm that the cpu usage stats are not being updated due to the application not switching to kernel mode and then back to user mode, let's modify the loop so it now calls some syscalls as well (write()):<pre># date;time taskset -c 16 sh -c "i=1; while [ \$i -lt 10000000 ]; do <b>echo \$i >/dev/null</b>; ((i++)); done"
...
</pre>
<pre>$ mpstat -P 16 1
...
01:47:34 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:47:35 16 <span style="color: #04ff00;">81.00</span> 0.00 <span style="color: #04ff00;">19.00</span> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:47:35 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:47:36 16 <span style="color: #04ff00;">80.20</span> 0.00 <span style="color: #04ff00;">19.80</span> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:47:36 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:47:37 16 <span style="color: #04ff00;">80.81</span> 0.00 <span style="color: #04ff00;">19.19</span> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:47:37 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:47:38 16 <span style="color: #04ff00;">80.20</span> 0.00 <span style="color: #04ff00;">19.80</span> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:47:38 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:47:39 16 <span style="color: #04ff00;">80.81</span> 0.00 <span style="color: #04ff00;">19.19</span> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:47:39 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:47:40 16 <span style="color: #04ff00;">80.20</span> 0.00 <span style="color: #04ff00;">19.80</span> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:47:40 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:47:41 16 <span style="color: #04ff00;">80.81</span> 0.00 <span style="color: #04ff00;">19.19</span> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
...
</pre>
As suspected the CPU stats are now constantly updated and mpstat is reporting expected values.</div><div>
<br /><i>Notice that the cpu usage reporting issue is not specific to mpstat and other tools behave in a similar manner as they all get stats from /proc/stat.</i></div><div>
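To make the /proc/stat dependency concrete, here is a minimal sketch of how a tool can turn two /proc/stat samples into percentages. It is a simplification rather than mpstat's actual code (guest time is ignored), and both samples are made up to mimic the situation above: the user counter did not move between the samples, so the CPU looks idle no matter what was really running on it.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def read_cpu_ticks(stat_text, cpu):
    """Return the cumulative tick counters of one CPU from /proc/stat text."""
    label = "cpu%d" % cpu
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == label:
            return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))
    raise KeyError(label)


def usage_percent(before, after):
    """Convert the difference between two samples into percentages."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in deltas.items()}


# Made-up samples taken one second apart; only the idle counter advanced.
sample1 = "cpu16 1000 0 50 90000 0 0 0 0 0 0\n"
sample2 = "cpu16 1000 0 50 90100 0 0 0 0 0 0\n"

pct = usage_percent(read_cpu_ticks(sample1, 16), read_cpu_ticks(sample2, 16))
print("%%usr=%.2f %%idle=%.2f" % (pct["user"], pct["idle"]))
```

Any tool built this way can only be as accurate as the counters the kernel exposes, which is why they all show the same effect.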
<br />Another interesting observation is that if you start tracing such an application with tools like strace or bpftrace (e.g. bpftrace -e 'profile:hz:99/cpu==16/{@=count();}', etc.), the cpu stats get constantly updated and the issue "miraculously" goes away...</div><div> <br />
Mystery solved.
<br /><p></p></div><div><br /></div><div>btw: in case you wonder how likely it is that an actual real application wouldn't issue any syscalls (or would issue them only very rarely) - well, with network kernel bypass this definitely can be, and is, the case with some apps</div><div><br /></div><div><br /></div><div>Update1: if a spinning process is running under SCHED_RR then the cpu accounting issue goes away, but it does happen under SCHED_OTHER and SCHED_FIFO - although less so with the latter.</div>Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com1tag:blogger.com,1999:blog-9963485.post-48667342173682419882020-09-25T11:05:00.001+02:002020-09-25T11:05:17.414+02:00A new GitHub repository for the Oracle Solaris 11.4 OS: oraclesolaris-contrib<p>If you use StatsStore/Analytics on Solaris 11 you should check out <a href="https://github.com/oracle/oraclesolaris-contrib">the new GitHub oraclesolaris-contrib repository</a>. For more details see <a href="https://blogs.oracle.com/solaris/a-new-github-repository-for-the-oracle-solaris-114-os:-oraclesolaris-contrib">Tanmay's blog post</a>.</p><p><br /></p>Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com0tag:blogger.com,1999:blog-9963485.post-5620490053134089342019-12-03T15:44:00.004+01:002019-12-03T15:44:46.657+01:00Checking/modifying file permissions on an underlying directory<br />
If you have a file system (nfs, etc.) mounted on top of a directory and you need to see the file permissions, ACLs, etc. of the underlying directory and not of the mounted fs, then:<br />
<br />
Solaris:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;"># mount -F lofs -o nosub /some/path/ /mnt/fix</span><br />
<br />
Linux:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;"># mount -B /some/path/ /mnt/fix</span><br />
<br />
If you access /mnt/fix now you won't see any mounted filesystems on top of any directories, just the underlying fs.<br />
<br />
<br />
I used it in the past and forgot about it, needed it yesterday and found it <a href="https://www.linuxquestions.org/questions/solaris-opensolaris-20/why-loopback-filesystem-lofs-153697/">here</a>.Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com0tag:blogger.com,1999:blog-9963485.post-91809587243467684842019-10-21T14:07:00.000+02:002019-10-21T14:07:09.727+02:00ONTAP 9.6+: REST API: netapp_ontap: performance of get_collection() vs. get()NetApp's ONTAP 9.6 introduced a <a href="https://blog.netapp.com/restful-apis-now-natively-available-with-netapp-ontap-9-6-software/">new REST API</a> along with a new Python module <a href="https://pypi.org/project/netapp-ontap/">netapp_ontap</a>.<br />
<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">#!/bin/python</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">import time</span><br />
<span style="font-family: Courier New, Courier, monospace;">import getpass</span><br />
<span style="font-family: Courier New, Courier, monospace;">import netapp_ontap</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">from netapp_ontap import config</span><br />
<span style="font-family: Courier New, Courier, monospace;">from netapp_ontap.host_connection import HostConnection</span><br />
<span style="font-family: Courier New, Courier, monospace;">from netapp_ontap.resources import QuotaRule</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
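<span style="font-family: Courier New, Courier, monospace;">host = 'filer.example.com'  # placeholder: your ONTAP cluster</span><br />
<span style="font-family: Courier New, Courier, monospace;">user = 'admin'  # placeholder: your API user</span><br />
<span style="font-family: Courier New, Courier, monospace;">password = getpass.getpass()</span><br />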
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">config.CONNECTION = HostConnection(host, user, password)</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">start = time.time()</span><br />
<span style="font-family: Courier New, Courier, monospace;">quota_rules = QuotaRule.get_collection(fields='type')</span><br />
<span style="font-family: Courier New, Courier, monospace;"># quota_rules = QuotaRule.get_collection()</span><br />
<span style="font-family: "Courier New", Courier, monospace;">total = 0</span><br />
<span style="font-family: Courier New, Courier, monospace;">for l in quota_rules:</span><br />
<span style="font-family: Courier New, Courier, monospace;"> # l.get(fields='type')</span><br />
<span style="font-family: Courier New, Courier, monospace;"> total = total + 1</span><br />
<span style="font-family: Courier New, Courier, monospace;">end = time.time()</span><br />
<span style="font-family: Courier New, Courier, monospace;">print(total)</span><br />
<span style="font-family: Courier New, Courier, monospace;">print(end-start)</span><br />
<div>
<br /></div>
<div>
With 471 quota rules on my filer it takes about 4s with get_collection(fields='type') vs. ~40s when calling l.get(fields='type') for each rule being processed. So if you are after all the entries, it is much quicker to pass the required fields to get_collection() than to call get() on each returned resource.</div>
<div>
<br /></div>
<div>
I did a little bit of debugging and it seems that each get() results in a new TCP/HTTPS connection being established, which is likely the main reason for the much worse performance. Also, get_collection() gets all 471 results in a single HTTP GET.</div>
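A rough back-of-the-envelope calculation of what each extra get() costs, using only the approximate timings quoted above (about 4s and ~40s for 471 rules), so treat the result as an order of magnitude rather than a measurement:

```python
rules = 471
bulk_seconds = 4.0       # get_collection(fields='type') only
per_rule_seconds = 40.0  # get_collection() plus l.get(fields='type') per rule

# Extra wall-clock time attributable to each individual get() call.
extra_per_get = (per_rule_seconds - bulk_seconds) / rules
print("approx. %.0f ms extra per get()" % (extra_per_get * 1000))
```

Tens of milliseconds per call would be in line with setting up a fresh connection for every request, but that is an interpretation of the numbers and not something I measured separately.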
<div>
<br /></div>
<div>
There seems to be a bug with regard to reusing connections though, as it shouldn't have to establish a new session for each get().</div>
Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com0tag:blogger.com,1999:blog-9963485.post-22011028529691232372019-06-13T12:52:00.002+02:002019-06-13T12:52:58.263+02:00DTrace: nfsv4 provider and utf8stringThe nfsv4 provider provides some structures with component4 type which is <a href="http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/nfs/nfs4_kprot.h#169">defined</a> as:<br />
<br />
<pre>typedef struct {
uint_t utf8string_len;
char *utf8string_val;
} utf8string;
typedef utf8string component4;
</pre>
<br />
So for example, to print NFSv4 file renames you have to do:
<br />
<br />
<pre>nfsv4:::op-rename-start
{
this->a = (char *)alloca(args[2]->oldname.utf8string_len + 1);
bcopy(args[2]->oldname.utf8string_val, this->a, args[2]->oldname.utf8string_len);
this->a[args[2]->oldname.utf8string_len] = '\0';
this->b = (char *)alloca(args[2]->newname.utf8string_len + 1);
bcopy(args[2]->newname.utf8string_val, this->b, args[2]->newname.utf8string_len);
this->b[args[2]->newname.utf8string_len] = '\0';
printf("NFSv4 rename: %s\n", strjoin(this->a, strjoin(" -> ", this->b)));
}
</pre>
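The copy-and-terminate pattern is easy to get wrong by one, so here is a minimal sketch of it in Python. This is only an illustration of the indexing, not DTrace code; to_c_string() and the sample bytes are made up. A buffer of len + 1 bytes has valid indices 0..len, so the terminator belongs at index len:

```python
def to_c_string(val, length):
    """Copy 'length' bytes into a length+1 byte buffer and NUL-terminate it."""
    buf = bytearray(length + 1)   # like alloca(len + 1): valid indices 0..length
    buf[:length] = val[:length]   # like bcopy(val, buf, len)
    buf[length] = 0               # terminator goes into the last valid slot
    return buf


# utf8string_val is not NUL-terminated, so whatever follows it in memory
# must not leak into the result; only the first 7 bytes belong to the name.
name = to_c_string(b"old.txtGARBAGE", 7)
print(name)
```

Writing the terminator at index len + 1 instead would land one byte past the end of the buffer and leave the byte at index len unset.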
<br />
Ideally DTrace (strjoin(), etc.) should deal with utf8string type automatically.Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com0tag:blogger.com,1999:blog-9963485.post-89590019307478187402019-06-13T10:11:00.000+02:002019-06-13T10:11:56.769+02:00Linux Load AveragesLinux measures load average differently than other OSes. In a nutshell it includes both CPU and disk i/o and more. Brendan has an excellent <a href="http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html">blog entry on this</a> explaining in much more detail how it works.Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com0tag:blogger.com,1999:blog-9963485.post-40145377502297940202019-05-03T11:45:00.004+02:002019-05-03T11:54:19.832+02:00Testing ZFS/L2ARC Solaris 11.4 - setting zfs_arc_collect_check=0 via mdb (takes immediate effect) or via /etc/system makes ZFS start feeding l2arc immediately. Notice that this can negatively impact ARC performance so use it with care. This is useful for testing if you want to push some data into L2ARC quicker/sooner, especially on large memory systems. The variable is checked by arc_can_collect() function (if it returns 1 then l2arc can be fed, if zero it can't).Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com0tag:blogger.com,1999:blog-9963485.post-60808290211626288682019-03-11T23:18:00.002+01:002019-03-12T01:46:55.636+01:00DTrace stop() actionThe stop() action in DTrace stops an entire process... well, actually it doesn't. It stops a single thread in a multi-threaded process, which surprised me as I always thought it did stop an entire process. Now, this is actually very useful, though a stopall() action which would stop all threads could now be useful as well :)<br />
<br />
Update: this is getting more complicated: the way the stop() action behaves depends on the type of probe it is called from. For example, if called from a syscall provider probe it will stop only the thread which called the syscall, but if called from a PID provider probe it will stop the entire process with all its threads. This is getting confusing...<br />
<br />
btw: pstop PID stops entire process while pstop PID/LWPID stops a single thread<br />
<br />
<br />
<br />Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com0tag:blogger.com,1999:blog-9963485.post-31832609799633406532019-03-01T13:00:00.001+01:002019-03-01T13:00:22.906+01:00DTrace %Y print format with nanosecondsA small but useful extension to DTrace is now available in Solaris 11.4.SRU6. It allows you to easily print the current date with optional nanosecond resolution. It is disabled by default for backward compatibility.<br />
<br />
To enable it you need to add timedecimals option to dtrace:<br />
<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;"># dtrace -q <span style="background-color: yellow;">-x timedecimals</span> -n syscall::open*:entry \</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> '{printf("%Y %s called %s()\n", walltimestamp, execname, probefunc);}'<br />2019 Mar 1 11:50:48<span style="background-color: yellow;">.774114445</span> firefox called openat64()</span><br />
<span style="font-family: "Courier New",Courier,monospace;">2019 Mar 1 11:50:49<span style="background-color: yellow;">.149290513</span> dtrace called openat()<br />2019 Mar 1 11:50:49<span style="background-color: yellow;">.149283375</span> dtrace called openat()<br />2019 Mar 1 11:50:50<span style="background-color: yellow;">.030217373</span> firefox called openat64()<br />2019 Mar 1 11:50:49<span style="background-color: yellow;">.974253263</span> firefox called openat64()</span><br />
<span style="font-family: "Courier New",Courier,monospace;">2019 Mar 1 11:50:50<span style="background-color: yellow;">.114684381</span> VBoxService called openat()<br />^C</span><br />
<br />
You can also specify the number of decimal places to be printed, for example:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;"># dtrace -q <span style="background-color: yellow;">-x timedecimals=2</span> -n syscall::open*:entry \</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> '{printf("%Y %s called %s()\n", walltimestamp, execname, probefunc);}' <br />2019 Mar 1 11:56:51<span style="background-color: yellow;">.09</span> VBoxService called openat()<br />2019 Mar 1 11:56:51<span style="background-color: yellow;">.09</span> VBoxService called openat()<br />2019 Mar 1 11:56:51<span style="background-color: yellow;">.45</span> dtrace called openat()<br />^C</span><br />
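For comparison, a minimal Python sketch that formats a nanosecond wall-clock value in a similar way. It is not DTrace's implementation: it assumes UTC, a day of month printed without padding (as the output appears above), and that extra digits are truncated rather than rounded, none of which I have verified against DTrace. The function name is made up.

```python
from datetime import datetime, timezone

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_walltimestamp(ns, decimals=0):
    """Format nanoseconds since the epoch as 'YYYY Mon D HH:MM:SS[.fraction]'."""
    secs, frac_ns = divmod(ns, 1_000_000_000)
    t = datetime.fromtimestamp(secs, tz=timezone.utc)
    text = "%d %s %d %02d:%02d:%02d" % (
        t.year, MONTHS[t.month - 1], t.day, t.hour, t.minute, t.second)
    if decimals > 0:
        # Nine zero-padded digits, cut to the requested length (assumption).
        text += "." + ("%09d" % frac_ns)[:decimals]
    return text


ns = 1551441048774114445
print(format_walltimestamp(ns, 9))
print(format_walltimestamp(ns, 2))
```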
<br />
<br />Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com0tag:blogger.com,1999:blog-9963485.post-73287601036998074632018-11-23T10:57:00.000+01:002018-11-23T10:57:31.667+01:00RAID-Z improvements and cloud device supportSolaris 11.4 introduced a few new ZFS pool versions with interesting new features or enhancements:<br />
<br />
# zpool upgrade -v<br />
...<br />
38 Xcopy with encryption<br /> 39 Resilver restart enhancements<br /> 40 New deduplication support<br /> 41 Asynchronous dataset destroy<br /> 42 Reguid: ability to change the pool guid<br /><span style="background-color: yellow;"> 43 RAID-Z improvements and cloud device support</span><br /> 44 Device removal<br />...<br />
<br />
The RAID-Z improvements mean that data is written more efficiently - in some cases it can now store more data in a pool than before. But even more importantly the performance (both throughput and IOPS) of RAIDZ is now close to RAID10!Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com1tag:blogger.com,1999:blog-9963485.post-70667315461403852542018-11-09T17:21:00.002+01:002018-11-09T17:21:54.523+01:00Spectre and Kernel ModulesOn Linux one needs to recompile kernel modules to get protection, while on Solaris this is not necessary. Once you are on Solaris 11.4 with Spectre fixes enabled, all kernel modules, even compiled on older Solaris releases, just work and are protected. Nothing special to do there.Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com1tag:blogger.com,1999:blog-9963485.post-90882892238495609282018-10-12T12:01:00.003+02:002018-10-12T12:01:40.461+02:00bpftraceRight, finally Linux is getting something similar and useful to DTrace, see <a href="http://www.brendangregg.com/blog/2018-10-08/dtrace-for-linux-2018.html">bpftrace</a>. However for it to be useful in enterprise it has to be included in RedHat - I wonder how long it will take though... but maybe around 2020 this will finally happen and then Linux will truly have an equivalent of DTrace, even if 15 years later.<br />
<br />Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com0tag:blogger.com,1999:blog-9963485.post-53147751690018268082018-10-02T14:24:00.000+02:002018-10-02T17:52:59.748+02:00Solaris: Spectre v2 & Meltdown fixes<div style="text-align: justify;">
Solaris 11.4 includes fixes for Meltdown and Spectre v2 (fixes for v1 were delivered a few months ago for 11.3 via SRU and are also included in 11.4). What I really like about them is that you can turn them on or off via <u>sxadm</u>. The sxadm command will also report whether your HW requires the fixes and whether they are enabled. Additionally, an FMA alert is generated if your HW should have the fixes enabled but, due to old microcode, they can't be - so this way you also get alerting. Very nice integration indeed.<br />
<br /></div>
Example output with Solaris running in Virtual Box:<br />
<br />
<pre># sxadm status
EXTENSION STATUS FLAGS
aslr enabled (tagged-files) u-c--
nxstack enabled (all) u-c--
nxheap enabled (tagged-files) u-c--
<b>kpti enabled -kcr-
ibpb not supported -----
ibrs not supported -----</b>
smap not supported -----
</pre>
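If you want to check this across many hosts, the output is easy to parse. A minimal sketch, assuming the column layout is exactly as in the sample above (extension name first, flags last, status in between); I have not checked this against any documented output format and the function name is made up:

```python
def parse_sxadm_status(text):
    """Map each extension name to its status string."""
    result = {}
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 3 or tokens[0] == "EXTENSION":
            continue  # skip the header and anything that is not a data row
        # First token is the name, last token the flags, the rest the status.
        result[tokens[0]] = " ".join(tokens[1:-1])
    return result


# Copy of the sxadm status output shown above (without the prompt line).
sample = """\
EXTENSION STATUS FLAGS
aslr enabled (tagged-files) u-c--
nxstack enabled (all) u-c--
nxheap enabled (tagged-files) u-c--
kpti enabled -kcr-
ibpb not supported -----
ibrs not supported -----
smap not supported -----
"""

status = parse_sxadm_status(sample)
not_enabled = [name for name, s in status.items() if not s.startswith("enabled")]
print(not_enabled)
```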
<br />
<div style="text-align: justify;">
The <b>kpti </b>is the fix for Meltdown and it is active, while <b>ibpb </b>and <b>ibrs </b>are mitigations for Spectre v2 and are not enabled as they are not supported on this HW.
Let's see how FMA is reporting an old version of microcode: </div>
<br />
<pre># fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Oct 02 14:19:24 383538f1-9268-4a07-9ff8-86be48c02e72 SUNOS-8000-LG Major
Problem Status : open
Diag Engine : software-diagnosis / 0.2
System
Manufacturer : unknown
Name : unknown
Part_Number : unknown
Serial_Number : unknown
System Component
Manufacturer : innotek GmbH
Name : VirtualBox
Part_Number :
Serial_Number : 0
Firmware_Manufacturer : innotek GmbH
Firmware_Version : (BIOS)VirtualBox
Firmware_Release : (BIOS)12.01.2006
Host_ID : 00482293
Server_Name : solaris
----------------------------------------
Suspect 1 of 1 :
Problem class : <b>alert.oracle.solaris.cpu.firmware.security</b>
Certainty : 100%
FRU
Status : Active
Location : "/SYS/MB"
Manufacturer : unknown
Name : unknown
Part_Number : unknown
Revision : unknown
Serial_Number : unknown
Chassis
Manufacturer : Oracle Corporation
Name : VirtualBox
Part_Number :
Serial_Number : 0
Resource
Status : Active
Response : No automated response available
<b>Impact : Oracle Solaris is not running with Spectre Vulnerability
Mitigation Enabled</b>
<b>Action : Update the CPU with Spectre capable microcode. Please refer to
the associated reference document at
http://support.oracle.com/msg/SUNOS-8000-LG for the latest
service procedures and policies regarding this diagnosis.</b>
</pre>
Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com1tag:blogger.com,1999:blog-9963485.post-47338752543795103392018-08-28T18:42:00.001+02:002018-08-28T18:42:08.637+02:00Solaris 11.4 GASolaris 11.4 GA is finally out and available for download. <br />
See <a href="https://docs.oracle.com/cd/E37838_01/html/E60974/index.html">What’s New</a>Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com0tag:blogger.com,1999:blog-9963485.post-8003884643416325762018-07-06T12:22:00.003+02:002018-07-06T12:22:32.981+02:00dumpadm -d none<div style="text-align: justify;">
Solaris 11.3 still doesn't support dumpadm -d none. This would be useful in some scenarios, for example when troubleshooting a failed AI installation and trying to restart it without rebooting. The restart will generally fail as the installer can't destroy rpool while a dump device is still configured there. </div>
<br />
There is a workaround though.<br />
<div style="text-align: justify;">
Edit the <i>/etc/dumpadm.conf</i> file and comment out the line containing <i>DUMPADM_DEVICE</i>, then run <i>dumpadm -u</i>.
This will unconfigure the dump device entirely. Then just run <i>zpool destroy rpool</i> and now you can <i>svcadm clear auto-installer.</i>
</div>
Robert Milkowskihttp://www.blogger.com/profile/13188276898947228975noreply@blogger.com0