In the previous post I wrote about building Hangar Board, the flying-group app, with Codex doing the coding. Here is the usage so far:
In the previous post I wrote about building Hangar Board, the flying-group app, with Codex doing the coding. Here is the usage so far:
I've been having some more spare time recently. Since I have been using coding agents at work for some time now, and in recent months have switched mainly to Codex, I wanted to see whether I could write a fully functional app from scratch without writing a single line of code and without looking at the code whatsoever.
I own a share in an airplane. Our group uses a rather old piece of software for bookings. It does its job, but... it feels very much like old software. So the idea was simple: build something more modern for our group, but do not force everyone to move at once.
For the time being, bookings in the new app work as a wrapper around the existing booking service. The old system remains the source of truth, and anyone who wants to keep using it directly can still do so. The new app just gives us a nicer, group-specific front end and adds the things around bookings that we actually care about.
The app is called Hangar Board. At this point it is no longer just a toy experiment. It is fully working.
The main pieces are:
One of the most interesting parts of this experiment was that all of the third-party integration work was done by Codex. Aviation weather, aircraft tracking, maps, Dropbox, WhatsApp, email - Codex suggested which services to use, went and read the API documentation, and then wrote the integration code.
My role was basically to choose from the options it recommended. I did not read the API specs. I did not write the request/response handling. I did not wire the service layers. I just guided the product decisions and checked whether the resulting app behaved correctly.
Another part that still feels slightly magical is that Codex could test the UI on devices by controlling Xcode simulators. It would start a simulator, open a browser, navigate the app, take screenshots, inspect what was wrong, adjust the UI, and repeat.
This was especially useful for the checklist PWA and iPad layouts. I did not have to manually resize windows, copy screenshots around, or describe every alignment issue. Codex could see enough of the rendered result to iterate.
Codex also wrote the test suite. There are now over 800 tests, and Codex runs them every time a code change is made. That part matters a lot, because once the app grew beyond a few simple pages, manual checking would not have been enough.
For this work I have been exclusively using GPT-5.5 xhigh. The whole process has also been a lot of fun. Sometimes I was not even at my computer: I would use ChatGPT with remote Codex access, ask Codex to implement something, and then come back later to review the result. That still feels really cool.
The app is still deliberately small-scale. It uses SQLite and a dependency-light Python web app. It probably would not scale well for some larger deployment, but that is not the problem I am solving. There are six of us. For that, it works very well.
The main goal has been achieved: I did not write a single line of code, and I did not read the code. The app is fully working and working well.
Frankly, although I did expect Codex to succeed, I also expected that I would need to intervene in small ways here and there. I thought I would need to fix some code manually, read a few awkward parts, or step in when an integration got messy. That did not happen.
That said, I do think my technical background helped. I did not write the code, and I did not read the code, but knowing how software, backends, APIs, authentication, persistence, testing, and deployment usually fit together made it much easier to ask the right questions, judge trade-offs, and notice when something needed another iteration.
Still, it feels like we are now very close to the point where an AI system can generate a fully working service like this for someone without that kind of technical background. Not a toy demo, but a real, useful, integrated application built around the way a small group actually works.
When I think about it, just several months ago this was not possible. And yet now it already feels natural, almost mundane, and I find myself taking it for granted.
Truly marvelous times we live in.
On a low-latency Linux host, it is easy to assume that nohz_full (full dynticks / adaptive ticks) means "this CPU will not get ticks." In practice, tick behavior depends on both idle policy and runqueue state.
This post gives a brief summary of how this works on a Rocky Linux 9.x / 5.14.x kernel, with easy-to-reproduce examples.
The /api/private/cli family is useful, but the “standard”/structured passthrough style (where the CLI maps onto /api/private/cli/<command-path> and parameters become fields/query params) only covers a subset of CLI commands and behaviours.
POST /api/private/cli with a JSON body containing an input string. In effect, you hand ONTAP a whole CLI command line to execute. The trade-off is that you typically get CLI-style text output back (not nice JSON records), but it can still be extremely handy for one-off automation and troubleshooting.input)curl -u admin -k -X POST 'https://<cluster>/api/private/cli' \ -H 'accept: application/json' \ -H 'content-type: application/json' \ --data '{"input": "system node run -node * -command \"priv set diag; wafl scan speed\""}'Enter host password for user 'admin':{
"output": "2 entries were acted on.\n\nNode: XXX-01\nWarning: These diagnostic commands are for use by NetApp\n personnel only.\nWAFL scan speed is 1300\n\nNode: XXX-02\nWarning: These diagnostic commands are for use by NetApp\n personnel only.\nWAFL scan speed is 1300\n\n"
}
The -B option in bpftrace (none / line / full) now applies not only to terminal output but also when writing to files with -o.
This ensures the same predictable flushing behavior everywhere, making it easier to use tools like tail -f on trace files.
Change landed in PR #4637.
In early February 2023, I built a Slack bot to make OpenAI LLMs readily accessible within our company for experimentation. It wasn’t planned, budgeted, or part of any official initiative—just something I hacked together out of excitement for large language models. I called it HAL, after the computer in 2001: A Space Odyssey. What started as a fun experiment quickly became a core tool that many of us now use every single day. HAL originally ran on OpenAI’s GPT-3.5-turbo model, and since then we’ve been quick to adopt every major new model release as they’ve become available. After that initial development to get core functionality working and deployed to all employees, HAL has been extended organically and mostly in my spare time as a hobby—guided by user feedback, the release of new models and features, and whatever seemed fun or useful to implement.
Slack was a natural choice. It was already the company’s preferred communication platform, and by embedding HAL there:
Fun fact: our CEO, David, was one of the first to try HAL—and of course, one of his very first requests was to ask it to open the bay door. True to its 2001: A Space Odyssey roots, HAL politely refused.
The main way people interact with HAL is through direct messages. The following Slack features are also utilised by HAL:
One underrated feature is HAL’s ability to summarise recent discussions in a channel—perfect for catching up after being away.
While HAL started with OpenAI models, it was designed to be model-agnostic. Over time, we’ve run it against on-prem LLaMA, DeepSeek, and soon other models like Claude. HAL itself is written in Python and built on the Slack Bolt API, while the HAL CLI is written in Go for easy distribution as a single binary.
Currently:
HAL has a concept of plugins that add additional capabilities beyond standard chat. Some are enabled by default, while others are opt-in depending on user needs. Here are some of the main ones:
Other plugins have been more experimental, such as Wolfram Alpha or PagerDuty integration, but they’ve demonstrated just how flexible the architecture is.
Spock is HAL’s “specialist mode.” It uses the heaviest model available, with web + code execution, to tackle deep technical queries. HAL auto-routes to Spock when users ask about physics, biology, or advanced coding—but users can also explicitly request it.
Yes, the answers may take minutes, but they’re often worth the wait.
HAL can also help employees manage their virtual desktops. It allows users to perform actions like reboot, reset, and similar operations on their VDI workstations.
This feature has greatly reduced the number of callouts to our Helpdesk for requests to restart hung VDI workstations (researchers always find a way to hang theirs while experimenting). It’s also very easy for users to remember—no need to learn the name or syntax of yet another tool; you just ask HAL in whatever language you like, even from your mobile phone while away from your desk, and it handles it.
A humorous anecdote: early on, someone asked HAL if it could restart a specific application on their workstation. HAL responded that it couldn’t—but since it could restart the entire workstation (which would restart the application too), it went ahead and did just that. While the potential for this behaviour was known, it hasn’t actually happened for quite some time. Since that incident, the plugin was modified so that it now requires the user to confirm the requested action before it will be executed—HAL can no longer run the action without explicit confirmation by the user.
HAL can also search and process selected internal Confluence pages. This works in two ways:
Force-feeding technique – A given documentation dump is exposed to HAL as a plugin/function it can call. When triggered, the entire dataset (with some metadata and URLs) is preloaded into a model with a large context window, along with instructions and the user query. HAL is instructed when to use it and can retrieve or synthesize information based on the plugin's reply without polluting the main chat history.
Download URL approach – For certain topics, HAL is instructed to fetch relevant Confluence pages directly at query time, pulling them into the conversation context to answer questions or for analysis or summarisation.
Together these techniques make Confluence search surprisingly effective, even without embedding databases (which we'll most likely add in the future).
Despite usage growing, our actual cost of using HAL (API calls) has been going down. Model prices are dropping faster than usage is rising, and features like input token caching help further. Given the productivity boost HAL brings, the costs are negligible compared to the value delivered.
The LLM landscape is moving at lightning speed. On HAL’s roadmap:
HAL has recently been extended with several new features:
Anthropic/Claude models added, broadening the choice of LLMs available.
User customisations introduced, allowing individuals to add their own instructions—either per channel or globally.
On-call lookup plugin added, enabling users to query who is currently on-call from different internal teams.
Employee directory plugin introduced, so users can query HAL for the contact details of other employees.
While having an LLM assistant directly in Slack is extremely convenient, it also comes with significant limitations. Some of the more notable ones include:
Providing an LLM bot via Slack is always going to be constrained compared to a dedicated client. However, if your company already uses Slack, having the LLM available there is still very useful—even if for certain tasks it makes more sense to switch to another tool. In our case, many of us use both HAL in Slack and GitHub Copilot inside VS Code, depending on the situation. Sometimes one is more convenient, sometimes the other.
![]() |
| Our headquarters in London... :) |
$ torchrun --nproc_per_node 1 test2_text_completion.py \
--tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4 --ckpt_dir llama-2-7b/ \
--prompt "Hello everyone, I'm LLAMA-2"
...
Loaded in 4.32 seconds
Hello everyone, I'm LLAMA-2 000. I'm a new member here. I'm from China. I'm a girl, I'm 16. I like animals, especially lamas. I like to play games, especially the games on the internet. I like to make friends. I hope I can make
The latest version (0.27.8) of the pypi/openai library now supports Azure/DALLE-2. To ensure compatibility, set openai.api_version = '2023-06-01-preview'.
> Not sure what problem you're trying to solve and thus what exactly you
> need... but you can probably get something very very close with 1 or 2
> of clock_gettime(CLOCK_{BOOTTIME,MONOTONIC,REALTIME}) possibly in a
> triple vdso call sandwich and iterated a few times and picking the one
> with smallest delta between 1st and 3rd calls. And then assuming the
> avg of 1st and 3rd matches the 2nd.
> ie.
>
> 5 times do:
> t1[i] = clock_gettime(REALTIME)
> t2[i] = clock_gettime(BOOTTIME)
> t3[i] = clock_gettime(REALTIME)
>
> pick i so t3[i] - t1[i] is smallest
>
> t2[i] is near equivalent to (t1[i] + t3[i]) / 2
>
> which basically gives you a REAL to BOOT offset.
Let's quickly implement it:#include <time.h>
#include <stdio.h>
#include <stdint.h>
#define ITERATIONS 5
int main(int argc, char **argv) {
struct timespec ts1[ITERATIONS], ts2[ITERATIONS], ts3[ITERATIONS];
uint64_t t1, t2, t3, t4, t5, smallest_dt = 0;
int ret, i, smallest_dt_i;
for (i = 0; i < ITERATIONS; i++) {
ret = clock_gettime(CLOCK_REALTIME, &ts1[i]);
ret = clock_gettime(CLOCK_BOOTTIME, &ts2[i]);
ret = clock_gettime(CLOCK_REALTIME, &ts3[i]);
}
for (i = 0; i < ITERATIONS; i++) {
t1 = ts1[i].tv_sec * (uint64_t)1000000000 + ts1[i].tv_nsec;
t3 = ts3[i].tv_sec * (uint64_t)1000000000 + ts3[i].tv_nsec;
printf("i: %d dt: %lu\n", i, t3-t1);
if (!smallest_dt || ((t3 - t1) < smallest_dt)) {
smallest_dt = t3 - t1;
smallest_dt_i = i;
}
}
t1 = ts1[smallest_dt_i].tv_sec * (uint64_t)1000000000 + ts1[smallest_dt_i].tv_nsec;
t2 = ts2[smallest_dt_i].tv_sec * (uint64_t)1000000000 + ts2[smallest_dt_i].tv_nsec;
t3 = ts3[smallest_dt_i].tv_sec * (uint64_t)1000000000 + ts3[smallest_dt_i].tv_nsec;
t4 = (uint64_t)(t1+t3)/2;
t5 = t4 - t2;
printf("\n");
printf("t5=t4-t2: %lu\n", t5);
}The given blog post can be improved by making some modifications and clarifications. Below is the improved version of the post:
Siri is an amazing tool that can automate various tasks and make life much easier. However, sometimes it can be frustrating when it responds with the message "I found some web results, I sent them to your iPhone." To make Siri even more useful and fun, we can integrate Siri/Shortcuts with OpenAI/ChatGPT, which can work with HomePod too.
To get started, we need to define a variable with our OpenAI token. Then, we can use the Dictate Text action to convert speech to text and store the result in another variable. This allows Siri to understand our speech and perform the necessary actions.
Next, we need to set another variable with the parameters for the OpenAI query. These parameters include the text we want to generate, the model we want to use, and the temperature of the output. Temperature refers to the randomness of the generated text. Higher temperatures will result in more creative and unexpected output, but lower temperatures will produce more accurate and predictable text.
Once we have set the necessary variables, we can issue a REST call to the OpenAI API. This will send our query to the OpenAI server and retrieve the generated text. We can then use the Text action to speak the generated text aloud or display it on the screen.
It's important to note that using OpenAI/ChatGPT with Siri requires some programming skills and knowledge of APIs. However, with some practice and experimentation, anyone can create their own Siri shortcuts and integrate them with OpenAI. With this powerful combination, we can create a wide range of useful and entertaining applications, from generating creative writing prompts to answering trivia questions.
# for i in $(seq 1 3); do ./threads-loop-test-1 4 1000000; done Total time: 0.982799 Total time: 0.977269 Total time: 0.980996
# for i in $(seq 1 3); do bpftrace -qe 't:syscalls:sys_enter_newstat
/pid==cpid/{@[probe]=count();}'
-c "./threads-loop-test-1 4 1000000";
done
Total time: 1.192433
@[tracepoint:syscalls:sys_enter_newstat]: 4000019
Total time: 1.204695
@[tracepoint:syscalls:sys_enter_newstat]: 4000019
Total time: 1.143513
@[tracepoint:syscalls:sys_enter_newstat]: 4000019
# for i in $(seq 1 3); do strace -qfc -e trace=stat ./threads-loop-test-1 4 1000000; done Total time: 49.478656 % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 87.720184 21 4000019 18 stat ------ ----------- ----------- --------- --------- ---------------- 100.00 87.720184 21 4000019 18 total Total time: 49.336942 % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 87.463778 21 4000019 18 stat ------ ----------- ----------- --------- --------- ---------------- 100.00 87.463778 21 4000019 18 total Total time: 49.250562 % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 87.923629 21 4000019 18 stat ------ ----------- ----------- --------- --------- ---------------- 100.00 87.923629 21 4000019 18 total
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
void *thread_handler(void *arg) {
int N = (long)arg;
struct stat file_stat;
while (N--) {
stat("/tmp", &file_stat);
}
pthread_exit(NULL);
}
int main(int argc, char *argv[]) {
struct timespec start, end;
int num_threads;
long N;
if (argc != 3) {
printf("Usage: %s \n", argv[0]);
exit(-1);
}
num_threads = atoi(argv[1]);
N = abs(atol(argv[2]));
pthread_t threads[num_threads];
int rc;
long t;
clock_gettime(CLOCK_MONOTONIC, &start);
for(t=0; t<num_threads; t++){
rc = pthread_create(&threads[t], NULL, thread_handler, (void *)N);
if (rc){
printf("ERROR; return code from pthread_create() is %d\n", rc);
exit(-1);
}
}
/* Wait for all threads to complete */
for(t=0; t<num_threads; t++) {
pthread_join(threads[t], NULL);
}
clock_gettime(CLOCK_MONOTONIC, &end);
double time_taken = (end.tv_sec - start.tv_sec) + (double)(end.tv_nsec - start.tv_nsec) / (double)1000000000;
printf("Total time: %lf\n", time_taken);
return 0;
}
Caused by: java.io.FileNotFoundException: /.../latest.link (Input/output error)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
...
$ cat Test.java
import java.io.*;
...
public class Test {
public static void main(String[] args) {
FileUtils fu = new FileUtils();
File dir = new File("/mnt/.../indexweights");
File ft = new File("/tmp/foo");
try {
ft = fu.resolveLinkFile(dir);
} catch (IOException e) {
System.out.println("IOException: " + e.getMessage());
e.printStackTrace();
} finally {
System.out.println("Destination path: " + ft.getAbsolutePath());
}
}
}
$ strace -tTfv -o /tmp/a -e trace=file,close,read java Test
Destination path: /mnt/...
$ grep -A2 latest.link /tmp/a
4422 15:33:38 open("/mnt/.../indexweights/latest.link", O_RDONLY) = 5 <0.069349>
4422 15:33:38 read(5, "...", 8192) = 11 <0.000010>
4422 15:33:38 close(5) = 0 <0.069149>
$ egrep ' open\(|open resume' .../XXX.out.20190625220935.26281
...
[pid 133595] 22:10:17 open("/mnt/.../latest.link", O_RDONLY <unfinished ...>
[pid 133595] 22:10:17 <... open resumed> ) = -1 EIO (Input/output error) <0.069362>
...
[pid 133595] 22:10:21 open("/mnt/.../latest.link", O_RDONLY <unfinished ...>
[pid 133595] 22:10:21 <... open resumed> ) = -1 EIO (Input/output error) <0.069204>
...
[pid 133595] 22:10:29 open("/mnt/.../latest.link", O_RDONLY <unfinished ...>
[pid 133595] 22:10:29 <... open resumed> ) = 192 <0.487662>
# tshark -r tcpdump.out.201906252205 -tad nfs ... 160040 2019-06-25 22:10:17 10.2.1.142 -> 10.2.6.11 NFS 242 V4 Call ACCESS FH: 0x18703b49, [Check: RD LU MD XT DL] 160044 2019-06-25 22:10:17 10.2.6.11 -> 10.2.1.142 NFS 222 V4 Reply (Call In 160040) ACCESS, [Access Denied: MD XT DL], [Allowed: RD LU] 160066 2019-06-25 22:10:17 10.2.1.142 -> 10.2.6.11 NFS 322 V4 Call OPEN DH: 0x18703b49/latest.link 160074 2019-06-25 22:10:17 10.2.6.11 -> 10.2.1.142 NFS 150 V4 Reply (Call In 160066) OPEN Status: NFS4ERR_EXPIRED 160793 2019-06-25 22:10:18 10.2.1.142 -> 10.2.6.11 NFS 322 V4 Call OPEN DH: 0x18703b49/latest.link 160799 2019-06-25 22:10:18 10.2.6.11 -> 10.2.1.142 NFS 150 V4 Reply (Call In 160793) OPEN Status: NFS4ERR_EXPIRED 161368 2019-06-25 22:10:19 10.2.1.142 -> 10.2.6.11 NFS 322 V4 Call OPEN DH: 0x18703b49/latest.link 161394 2019-06-25 22:10:19 10.2.6.11 -> 10.2.1.142 NFS 150 V4 Reply (Call In 161368) OPEN Status: NFS4ERR_EXPIRED 162036 2019-06-25 22:10:21 10.2.1.142 -> 10.2.6.11 NFS 234 V4 Call GETATTR FH: 0x7a1ffedc 162112 2019-06-25 22:10:21 10.2.6.11 -> 10.2.1.142 NFS 338 V4 Reply (Call In 162036) GETATTR 162114 2019-06-25 22:10:21 10.2.1.142 -> 10.2.6.11 NFS 322 V4 Call OPEN DH: 0x18703b49/latest.link 162194 2019-06-25 22:10:21 10.2.6.11 -> 10.2.1.142 NFS 150 V4 Reply (Call In 162114) OPEN Status: NFS4ERR_EXPIRED 162803 2019-06-25 22:10:22 10.2.1.142 -> 10.2.6.6 NFS 226 V4 Call RENEW CID: 0x8dbc 162811 2019-06-25 22:10:22 10.2.6.6 -> 10.2.1.142 NFS 182 V4 Reply (Call In 162803) RENEW Status: NFS4ERR_EXPIRED 162813 2019-06-25 22:10:22 10.2.1.142 -> 10.2.6.6 NFS 226 V4 Call RENEW CID: 0x8dbc 162825 2019-06-25 22:10:22 10.2.6.6 -> 10.2.1.142 NFS 182 V4 Reply (Call In 162813) RENEW Status: NFS4ERR_EXPIRED 162826 2019-06-25 22:10:22 10.2.1.142 -> 10.2.6.6 NFS 318 V4 Call SETCLIENTID 162850 2019-06-25 22:10:23 10.2.6.6 -> 10.2.1.142 NFS 198 V4 Reply (Call In 162826) SETCLIENTID 162851 2019-06-25 22:10:23 10.2.1.142 -> 10.2.6.6 NFS 234 V4 Call SETCLIENTID_CONFIRM 162858 2019-06-25 22:10:23 10.2.6.6 -> 10.2.1.142 NFS 182 V4 Reply (Call In 162851) SETCLIENTID_CONFIRM 163272 2019-06-25 22:10:24 10.2.1.142 -> 10.2.6.11 NFS 226 V4 Call RENEW CID: 0x0a7f 163302 2019-06-25 22:10:25 10.2.6.11 -> 10.2.1.142 NFS 182 V4 Reply (Call In 163272) RENEW Status: NFS4ERR_EXPIRED 163304 2019-06-25 22:10:25 10.2.1.142 -> 10.2.6.11 NFS 226 V4 Call RENEW CID: 0x0a7f 163332 2019-06-25 22:10:25 10.2.6.11 -> 10.2.1.142 NFS 182 V4 Reply (Call In 163304) RENEW Status: NFS4ERR_EXPIRED 163333 2019-06-25 22:10:25 10.2.1.142 -> 10.2.6.11 NFS 318 V4 Call SETCLIENTID 163343 2019-06-25 22:10:25 10.2.6.11 -> 10.2.1.142 NFS 198 V4 Reply (Call In 163333) SETCLIENTID 163344 2019-06-25 22:10:25 10.2.1.142 -> 10.2.6.11 NFS 234 V4 Call SETCLIENTID_CONFIRM 163382 2019-06-25 22:10:25 10.2.6.11 -> 10.2.1.142 NFS 182 V4 Reply (Call In 163344) SETCLIENTID_CONFIRM 164083 2019-06-25 22:10:27 10.2.1.142 -> 10.2.6.5 NFS 1606 V4 NULL Call 164109 2019-06-25 22:10:27 10.2.6.5 -> 10.2.1.142 NFS 302 V4 NULL Reply (Call In 164083) 164289 2019-06-25 22:10:28 10.2.1.142 -> 10.2.6.6 NFS 1606 V4 NULL Call 164301 2019-06-25 22:10:28 10.2.6.6 -> 10.2.1.142 NFS 302 V4 NULL Reply (Call In 164289) 164303 2019-06-25 22:10:28 10.2.1.142 -> 10.2.6.6 NFS 226 V4 Call RENEW CID: 0x9d10 164326 2019-06-25 22:10:28 10.2.6.6 -> 10.2.1.142 NFS 182 V4 Reply (Call In 164303) RENEW 164417 2019-06-25 22:10:28 10.2.1.142 -> 10.2.6.11 NFS 242 V4 Call ACCESS FH: 0x118824df, [Check: RD LU MD XT DL] 164427 2019-06-25 22:10:28 10.2.6.11 -> 10.2.1.142 NFS 222 V4 Reply (Call In 164417) ACCESS, [NOT Supported: MD XT DL], [Allowed: RD LU] 164429 2019-06-25 22:10:28 10.2.1.142 -> 10.2.6.11 NFS 234 V4 Call GETATTR FH: 0x7a1ffedc 164440 2019-06-25 22:10:28 10.2.6.11 -> 10.2.1.142 NFS 338 V4 Reply (Call In 164429) GETATTR 164441 2019-06-25 22:10:28 10.2.1.142 -> 10.2.6.11 NFS 322 V4 Call OPEN DH: 0x18703b49/latest.link 164458 2019-06-25 22:10:28 10.2.6.11 -> 10.2.1.142 NFS 466 V4 Reply (Call In 164441) OPEN StateID: 0xae70 164459 2019-06-25 22:10:28 10.2.1.142 -> 10.2.6.11 NFS 242 V4 Call OPEN_CONFIRM 164473 2019-06-25 22:10:29 10.2.6.11 -> 10.2.1.142 NFS 166 V4 Reply (Call In 164459) OPEN_CONFIRM 164474 2019-06-25 22:10:29 10.2.1.142 -> 10.2.6.11 NFS 250 V4 Call READ StateID: 0xa087 Offset: 0 Len: 11 164498 2019-06-25 22:10:29 10.2.6.11 -> 10.2.1.142 NFS 170 V4 Reply (Call In 164474) READ 164499 2019-06-25 22:10:29 10.2.1.142 -> 10.2.6.11 NFS 234 V4 Call GETATTR FH: 0xedefdae0 164519 2019-06-25 22:10:29 10.2.6.11 -> 10.2.1.142 NFS 338 V4 Reply (Call In 164499) GETATTR 164520 2019-06-25 22:10:29 10.2.1.142 -> 10.2.6.11 NFS 258 V4 Call CLOSE StateID: 0xa087 164545 2019-06-25 22:10:29 10.2.6.11 -> 10.2.1.142 NFS 230 V4 Reply (Call In 164520) CLOSE ...
$ sudo /var/tmp/syscall-open-stat-filename-regex.stp -g --suppress-time-limits -o /var/tmp/failed_stats_and_opens.log 'XXX.*zip' & [5] 21855
# grep '= -' /var/tmp/failed_stats_and_opens.log
Tue Nov 12 17:11:02 2019 java[16043] SyS_open("XXX") = -5 (EIO) : -5
Tue Nov 12 17:11:02 2019 java[16078] SyS_open("XXX") = -5 (EIO) : -5
...
# grep 16043 /var/tmp/failed_stats_and_opens.log
Tue Nov 12 17:11:02 2019 java[16043] SyS_newlstat("/mnt/.../XXX.zip") = 0 : 0
Tue Nov 12 17:11:02 2019 java[16043] SyS_newstat("/mnt/.../XXX.zip") = 0 : 0
Tue Nov 12 17:11:02 2019 java[16043] SyS_newstat("/mnt/.../XXX.zip") = 0 : 0
Tue Nov 12 17:11:02 2019 java[16043] SyS_newstat("/mnt/.../XXX.zip") = 0 : 0
Tue Nov 12 17:11:02 2019 java[16043] SyS_open("/mnt/.../XXX.zip") = -5 (EIO) : -5
# tcpdump -n -r /var/tmp/tcpdump.XXX.out ... 17:11:02.766890 IP 10.210.16.12.cisco-tdp > 10.210.6.6.nfs: Flags [P.], seq 75637865:75638065, ack 2022930725, win 12288, options [nop,nop,TS val 12657161 ecr 997162631], length 200: NFS request xid 2489184212 196 getattr fh 0,0/22 17:11:02.767043 IP 10.210.6.6.nfs > 10.210.16.12.cisco-tdp: Flags [P.], seq 2022930725:2022931065, ack 75638065, win 32806, options [nop,nop,TS val 997162631 ecr 12657161], length 340: NFS reply xid 2489184212 reply ok 336 getattr NON 4 ids 0/15 sz 0 17:11:02.769335 IP 10.210.16.12.cisco-tdp > 10.210.6.6.nfs: Flags [P.], seq 75638065:75638329, ack 2022931065, win 12288, options [nop,nop,TS val 12657164 ecr 997162631], length 264: NFS request xid 2505961428 260 getattr fh 0,0/22 17:11:02.769477 IP 10.210.6.6.nfs > 10.210.16.12.cisco-tdp: Flags [P.], seq 2022931065:2022931149, ack 75638329, win 32806, options [nop,nop,TS val 997162631 ecr 12657164], length 84: NFS reply xid 2505961428 reply ok 80 getattr ERROR: unk 10011 17:11:02.769544 IP 10.210.16.12.cisco-tdp > 10.210.6.6.nfs: Flags [P.], seq 75638329:75638593, ack 2022931149, win 12288, options [nop,nop,TS val 12657164 ecr 997162631], length 264: NFS request xid 2522738644 260 getattr fh 0,0/22 17:11:02.769638 IP 10.210.6.6.nfs > 10.210.16.12.cisco-tdp: Flags [P.], seq 2022931149:2022931233, ack 75638593, win 32806, options [nop,nop,TS val 997162631 ecr 12657164], length 84: NFS reply xid 2522738644 reply ok 80 getattr ERROR: unk 10011 17:11:02.809052 IP 10.210.16.12.cisco-tdp > 10.210.6.6.nfs: Flags [.], ack 2022931233, win 12288, options [nop,nop,TS val 12657204 ecr 997162631], length 0 17:11:02.821772 IP 10.210.16.12.cisco-tdp > 10.210.6.6.nfs: Flags [P.], seq 75638593:75638789, ack 2022931233, win 12288, options [nop,nop,TS val 12657216 ecr 997162631], length 196: NFS request xid 2539515860 192 getattr fh 0,0/22 17:11:02.822228 IP 10.210.6.6.nfs > 10.210.16.12.cisco-tdp: Flags [P.], seq 2022931233:2022931573, ack 75638789, win 32806, options [nop,nop,TS val 997162636 ecr 12657216], length 340: NFS reply xid 2539515860 reply ok 336 getattr NON 4 ids 0/15 sz 0 17:11:02.822261 IP 10.210.16.12.cisco-tdp > 10.210.6.6.nfs: Flags [.], ack 2022931573, win 12288, options [nop,nop,TS val 12657217 ecr 997162636], length 0 17:11:02.823327 IP 10.210.16.12.cisco-tdp > 10.210.6.6.nfs: Flags [P.], seq 75638789:75638957, ack 2022931573, win 12288, options [nop,nop,TS val 12657218 ecr 997162636], length 168: NFS request xid 2556293076 164 getattr fh 0,0/22 17:11:02.823436 IP 10.210.6.6.nfs > 10.210.16.12.cisco-tdp: Flags [P.], seq 2022931573:2022931857, ack 75638957, win 32806, options [nop,nop,TS val 997162636 ecr 12657218], length 284: NFS reply xid 2556293076 reply ok 280 getattr NON 2 ids 0/9 sz 0 ...
# tcpdump -n -r /var/tmp/tcpdump.YYY.out ... 17:10:08.447238 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [.], ack 174377189, win 11471, options [nop,nop,TS val 12602842 ecr 997157199], length 0 17:11:08.645057 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [.], ack 174377189, win 11471, options [nop,nop,TS val 12663040 ecr 997157199], length 0 17:11:08.645147 IP 10.210.6.5.nfs > 10.210.16.12.841: Flags [.], ack 314027332, win 32806, options [nop,nop,TS val 997163219 ecr 12602842], length 0 17:11:08.645173 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [P.], seq 314027332:314027492, ack 174377189, win 11471, options [nop,nop,TS val 12663040 ecr 997163219], length 160: NFS request xid 398604749 156 getattr fh 0,28/0 17:11:08.645249 IP 10.210.6.5.nfs > 10.210.16.12.841: Flags [.], ack 314027492, win 32806, options [nop,nop,TS val 997163219 ecr 12663040], length 0 17:11:08.692687 IP 10.210.6.5.nfs > 10.210.16.12.841: Flags [P.], seq 174377189:174377305, ack 314027492, win 32806, options [nop,nop,TS val 997163223 ecr 12663040], length 116: NFS reply xid 398604749 reply ok 112 getattr ERROR: unk 24 17:11:08.692796 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [P.], seq 314027492:314027652, ack 174377305, win 11471, options [nop,nop,TS val 12663087 ecr 997163223], length 160: NFS request xid 415381965 156 getattr fh 0,28/0 17:11:08.693123 IP 10.210.6.5.nfs > 10.210.16.12.841: Flags [P.], seq 174377305:174377421, ack 314027652, win 32806, options [nop,nop,TS val 997163223 ecr 12663087], length 116: NFS reply xid 415381965 reply ok 112 getattr ERROR: unk 24 17:11:08.693164 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [P.], seq 314027652:314027916, ack 174377421, win 11471, options [nop,nop,TS val 12663088 ecr 997163223], length 264: NFS request xid 432159181 260 getattr fh 0,132/0 17:11:08.693592 IP 10.210.6.5.nfs > 10.210.16.12.841: Flags [P.], seq 174377421:174377553, ack 314027916, win 32806, options [nop,nop,TS val 997163223 ecr 12663088], length 132: NFS reply xid 432159181 reply ok 128 getattr ERROR: unk 40 17:11:08.693680 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [P.], seq 314027916:314028084, ack 174377553, win 11471, options [nop,nop,TS val 12663088 ecr 997163223], length 168: NFS request xid 448936397 164 getattr fh 0,36/0 17:11:08.701988 IP 10.210.6.5.nfs > 10.210.16.12.841: Flags [P.], seq 174377553:174377669, ack 314028084, win 32806, options [nop,nop,TS val 997163224 ecr 12663088], length 116: NFS reply xid 448936397 reply ok 112 getattr ERROR: unk 24 17:11:08.702090 IP 10.210.6.5.63021 > 10.210.16.12.33663: Flags [S], seq 1851114974, win 32804, options [mss 1460,sackOK,TS val 997163224 ecr 0,nop,wscale 5], length 0 17:11:08.702122 IP 10.210.16.12.33663 > 10.210.6.5.63021: Flags [S.], seq 2641182587, ack 1851114975, win 28960, options [mss 1460,sackOK,TS val 12663097 ecr 997163224,nop,wscale 8], length 0 17:11:08.702206 IP 10.210.6.5.63021 > 10.210.16.12.33663: Flags [.], ack 1, win 32806, options [nop,nop,TS val 997163224 ecr 12663097], length 0 17:11:08.704470 IP 10.210.6.5.63021 > 10.210.16.12.33663: Flags [P.], seq 1:77, ack 1, win 32806, options [nop,nop,TS val 997163225 ecr 12663097], length 76 17:11:08.704487 IP 10.210.16.12.33663 > 10.210.6.5.63021: Flags [.], ack 77, win 114, options [nop,nop,TS val 12663099 ecr 997163225], length 0 17:11:08.704534 IP 10.210.16.12.33663 > 10.210.6.5.63021: Flags [P.], seq 1:29, ack 77, win 114, options [nop,nop,TS val 12663099 ecr 997163225], length 28 17:11:08.704620 IP 10.210.6.5.63021 > 10.210.16.12.33663: Flags [.], ack 29, win 32806, options [nop,nop,TS val 997163225 ecr 12663099], length 0 17:11:08.741063 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [.], ack 174377669, win 11471, options [nop,nop,TS val 12663136 ecr 997163224], length 0 17:11:13.717133 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [P.], seq 314028084:314028244, ack 174377669, win 11471, options [nop,nop,TS val 12668112 ecr 997163224], length 160: NFS request xid 465713613 156 getattr fh 0,28/0 17:11:13.717704 IP 10.210.6.5.nfs > 10.210.16.12.841: Flags [P.], seq 174377669:174377785, ack 314028244, win 32806, options [nop,nop,TS val 997163726 ecr 12668112], length 116: NFS reply xid 465713613 reply ok 112 getattr ERROR: unk 24 17:11:13.717715 IP 10.210.16.12.841 > 10.210.6.5.nfs: Flags [.], ack 174377785, win 11471, options [nop,nop,TS val 12668112 ecr 997163726], length 0 17:12:02.981295 IP 10.210.6.5.57869 > 10.210.16.12.33663: Flags [S], seq 2961022330, win 32804, options [mss 1460,sackOK,TS val 997168652 ecr 0,nop,wscale 5], length 0 ...
# dtrace -q -n nfsv4:::op-getattr-done,nfsv4:::op-readdir-done,nfsv4:::op-access-done,
nfsv4:::op-link-done,nfsv4:::op-open-done,nfsv4:::op-create-done,
nfsv4:::op-delegreturn-done,nfsv4:::op-verify-done,nfsv4:::op-getfh-done,
nfsv4:::op-readlink-done,nfsv4:::op-renew-done,nfsv4:::op-lookup-done,
nfsv4:::op-remove-done,nfsv4:::op-rename-done'
/args[2]->status != 0/
{
printf("%Y %s %s %d:%d %s %d\n", walltimestamp, probename, args[0]->ci_remote,
args[1]->noi_cred->cr_uid, args[1]->noi_cred->cr_gid,
args[1]->noi_curpath, args[2]->status);}'
-o /var/tmp/nfs-ops-with-errno.log
... 2019 Nov 12 17:10:42 op-lookup-done 10.250.3.29 10006:5000 /.../XXX 2 2019 Nov 12 17:10:41 op-renew-done 10.210.13.49 60001:60001 <unknown> 10022 2019 Nov 12 17:10:46 op-renew-done 10.210.13.49 60001:60001 <unknown> 10022 2019 Nov 12 17:10:46 op-lookup-done 10.250.3.29 10006:5000 <unknown> 2 2019 Nov 12 17:10:49 op-lookup-done 10.210.10.152 10006:5000 <unknown> 2 2019 Nov 12 17:10:50 op-lookup-done 10.250.3.29 10006:5000 /.../XXX 2 ... 2019 Nov 12 17:10:55 op-lookup-done 10.250.3.29 10006:5000 <unknown> 2 2019 Nov 12 17:10:56 op-renew-done 10.210.13.49 60001:60001 <unknown> 10022 2019 Nov 12 17:10:58 op-renew-done 10.210.136.15 60001:60001 <unknown> 10011 2019 Nov 12 17:10:59 op-renew-done 10.210.136.15 60001:60001 <unknown> 10011 2019 Nov 12 17:10:59 op-renew-done 10.210.136.15 60001:60001 <unknown> 10011 2019 Nov 12 17:10:58 op-renew-done 10.210.136.15 60001:60001 <unknown> 10011 ...
# ggrep -E '10011|10022' /usr/include/nfs/nfs4_kprot.h
NFS4ERR_EXPIRED = 10011,
NFS4ERR_STALE_CLIENTID = 10022,
# ggrep 'Nov 12' /var/tmp/nfs-ops-with-errno.log | ggrep 10.210.16.12 | ggrep op-renew 2019 Nov 12 15:10:45 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011 2019 Nov 12 15:10:45 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011 2019 Nov 12 15:40:57 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011 2019 Nov 12 15:40:56 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011 2019 Nov 12 16:04:58 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011 2019 Nov 12 16:04:58 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011 2019 Nov 12 16:10:58 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011 2019 Nov 12 16:10:58 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011 2019 Nov 12 16:40:29 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011 2019 Nov 12 16:40:29 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011 2019 Nov 12 17:11:08 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011 2019 Nov 12 17:11:08 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011 ...
2019 Nov 12 16:40:29 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011 2019 Nov 12 16:40:29 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011 2019 Nov 12 17:11:08 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011 2019 Nov 12 17:11:08 op-renew-done 10.210.16.12 60001:60001 <unknown> 10011
102126 17:11:02.769335 10.210.16.12 10.210.6.6 NFS 330 V4 Call (Reply In 102127) PUTFH | OPEN DH: 0xcbbed605/XXX.zip | GETFH | ACCESS FH: 0x00000000, [Check: RD MD XT XE] | GETATTR 102127 17:11:02.769477 10.210.6.6 10.210.16.12 NFS 150 V4 Reply (Call In 102126) PUTFH | OPEN Status: NFS4ERR_EXPIRED 102128 17:11:02.769544 10.210.16.12 10.210.6.6 NFS 330 V4 Call (Reply In 102129) PUTFH | OPEN DH: 0xcbbed605/XXX.zip | GETFH | ACCESS FH: 0x00000000, [Check: RD MD XT XE] | GETATTR 102129 17:11:02.769638 10.210.6.6 10.210.16.12 NFS 150 V4 Reply (Call In 102128) PUTFH | OPEN Status: NFS4ERR_EXPIRED
102831 17:12:02.917119 10.210.16.12 10.210.6.6 NFS 226 V4 Call (Reply In 102832) RENEW CID: 0xab20 102832 17:12:02.964745 10.210.6.6 10.210.16.12 NFS 182 V4 Reply (Call In 102831) RENEW Status: NFS4ERR_EXPIRED 102834 17:12:02.967293 10.210.16.12 10.210.6.6 NFS 226 V4 Call (Reply In 102835) RENEW CID: 0xab20 102835 17:12:02.967972 10.210.6.6 10.210.16.12 NFS 182 V4 Reply (Call In 102834) RENEW Status: NFS4ERR_EXPIRED 102836 17:12:02.968021 10.210.16.12 10.210.6.6 NFS 330 V4 Call (Reply In 102837) SETCLIENTID 102837 17:12:02.968332 10.210.6.6 10.210.16.12 NFS 198 V4 Reply (Call In 102836) SETCLIENTID 102838 17:12:02.968365 10.210.16.12 10.210.6.6 NFS 234 V4 Call (Reply In 102839) SETCLIENTID_CONFIRM 102839 17:12:02.981174 10.210.6.6 10.210.16.12 NFS 182 V4 Reply (Call In 102838) SETCLIENTID_CONFIRM
102874 17:12:04.838647 10.210.16.12 10.210.6.6 NFS 330 V4 Call (Reply In 102875) PUTFH | OPEN DH: 0x6030bc46/XXX.zip | GETFH | ACCESS FH: 0x00000000, [Check: RD MD XT XE] | GETATTR 102875 17:12:04.838844 10.210.6.6 10.210.16.12 NFS 478 V4 Reply (Call In 102874) PUTFH | OPEN StateID: 0x9aaf | GETFH | ACCESS, [Access Denied: XE], [Allowed: RD MD XT] | GETATTR
101689 17:06:16.805156 10.210.16.12 10.210.6.6 NFS 226 V4 Call (Reply In 101691) RENEW CID: 0xab20 101691 17:06:16.838027 10.210.6.6 10.210.16.12 NFS 182 V4 Reply (Call In 101689) RENEW 101694 17:07:16.965131 10.210.16.12 10.210.6.6 NFS 226 V4 Call (Reply In 101697) RENEW CID: 0xab20 101697 17:07:16.997082 10.210.6.6 10.210.16.12 NFS 182 V4 Reply (Call In 101694) RENEW 101701 17:08:17.125137 10.210.16.12 10.210.6.6 NFS 226 V4 Call (Reply In 101703) RENEW CID: 0xab20 101703 17:08:17.157783 10.210.6.6 10.210.16.12 NFS 182 V4 Reply (Call In 101701) RENEW 101706 17:09:17.285124 10.210.16.12 10.210.6.6 NFS 226 V4 Call (Reply In 101709) RENEW CID: 0xab20 101709 17:09:17.320893 10.210.6.6 10.210.16.12 NFS 182 V4 Reply (Call In 101706) RENEW ... no more RENEW operations until: 102831 17:12:02.917119 10.210.16.12 10.210.6.6 NFS 226 V4 Call (Reply In 102832) RENEW CID: 0xab20
# tcpdump -i aggr1 -w /var/tmp/tcpdump.20191119.out host 10.210.16.12 &
# dtrace -q -n fbt::rfs4_lease_expired:entry' {self->cp=args[0];}'
-n fbt::rfs4_lease_expired:return'
/arg1 && self->cp/
{this->s=(struct sockaddr_in *)&self->cp->rc_addr;
printf("%Y %s client_ip: %s client_id: %x forced_expire: %d last_access: %u dt: %d rfs4_lease_time: %d ret: %d\n",
walltimestamp, probefunc, inet_ntop(this->s->sin_family, (void *)&this->s->sin_addr),
self->cp->rc_clientid, self->cp->rc_forced_expire, self->cp->rc_last_access,
(walltimestamp/1000000000)-self->cp->rc_last_access, `rfs4_lease_time, arg1);
self->cp=0;stack()}'
-n fbt::rfs4_lease_expired:return'/self->cp/{self->cp=0;}' -o /var/tmp/rfs4_lease_expired_failed.out &
# dtrace -q -n nfsv4:::op-renew-start'
{self->t=timestamp;
printf("%Y %s client_id: 0x%x noi_xid: %u %s %d:%d %s\n", walltimestamp, probename, args[2]->clientid,
args[1]->noi_xid, args[0]->ci_remote, args[1]->noi_cred->cr_uid, args[1]->noi_cred->cr_gid, args[1]->noi_curpath);}'
-n nfsv4:::op-renew-done'
/self->t/
{printf("%Y %s noi_xid: %u %s %s %d:%d %s ret: %d took: %d seconds\n", walltimestamp, probename,
args[1]->noi_xid, args[0]->ci_remote, args[0]->ci_local, args[1]->noi_cred->cr_uid,
args[1]->noi_cred->cr_gid, args[1]->noi_curpath, args[2]->status,
(timestamp-self->t)/1000000000);
self->t=0;}'
-o /var/tmp/nfs4-renew.log &
# cat /var/tmp/ss
#!/bin/bash
dtrace -q -n nfsv4:::op-open-downgrade-start,nfsv4:::op-secinfo-start,nfsv4:::op-readdir-start,nfsv4:::op-setclientid-start,
nfsv4:::op-remove-start,nfsv4:::cb-recall-start,nfsv4:::op-rename-start,nfsv4:::op-commit-start,nfsv4:::op-savefh-start,
nfsv4:::op-delegpurge-start,nfsv4:::op-openattr-start,nfsv4:::op-delegreturn-start,nfsv4:::op-putpubfh-start,
nfsv4:::op-nverify-start,nfsv4:::op-verify-start,nfsv4:::op-setattr-start,nfsv4:::op-getattr-start,
nfsv4:::op-open-confirm-start,nfsv4:::null-start,nfsv4:::op-readlink-start,nfsv4:::op-lockt-start,nfsv4:::op-locku-start,
nfsv4:::op-putrootfh-start,nfsv4:::op-close-start,nfsv4:::op-access-start,nfsv4:::op-getfh-start,nfsv4:::op-read-start,
nfsv4:::op-lock-start,nfsv4:::op-open-start,nfsv4:::op-link-start,nfsv4:::op-renew-start,nfsv4:::op-write-start,
nfsv4:::op-restorefh-start,nfsv4:::op-lookupp-start,nfsv4:::op-release-lockowner-start,nfsv4:::compound-start,
nfsv4:::op-putfh-start,nfsv4:::op-setclientid-confirm-start,nfsv4:::op-lookup-start,nfsv4:::op-create-start'
/args[0]->ci_remote=="10.210.16.12"/
{self->t=timestamp;
printf("%Y %s %s %s\n", walltimestamp, probename, args[0]->ci_remote, args[0]->ci_local);}' \
-n nfsv4:::op-open-downgrade-done,nfsv4:::op-readdir-done,nfsv4:::op-access-done,nfsv4:::op-lock-done,nfsv4:::op-link-done,
nfsv4:::op-setclientid-done,nfsv4:::op-open-confirm-done,nfsv4:::op-open-done,nfsv4:::op-create-done,
nfsv4:::op-secinfo-done,nfsv4:::op-locku-done,nfsv4:::op-close-done,nfsv4:::op-write-done,
nfsv4:::op-delegreturn-done,nfsv4:::op-getattr-done,nfsv4:::op-verify-done,nfsv4:::op-getfh-done,
nfsv4:::op-putfh-done,nfsv4:::op-nverify-done,nfsv4:::op-readlink-done,nfsv4:::op-renew-done,
nfsv4:::op-savefh-done,nfsv4:::op-read-done,nfsv4:::compound-done,nfsv4:::cb-recall-done,
nfsv4:::op-lookup-done,nfsv4:::op-restorefh-done,nfsv4:::op-release-lockowner-done,nfsv4:::op-lookupp-done,
nfsv4:::op-openattr-done,nfsv4:::op-putpubfh-done,nfsv4:::op-putrootfh-done,
nfsv4:::op-setclientid-confirm-done,nfsv4:::op-remove-done,nfsv4:::op-rename-done,nfsv4:::op-commit-done,
nfsv4:::op-delegpurge-done,nfsv4:::op-setattr-done,nfsv4:::op-lockt-done'
/self->t/
{printf("%Y %s %s %s took: %d ret: %d\n", walltimestamp, probename, args[0]->ci_remote, args[0]->ci_local,
(timestamp-self->t)/1000000000, args[2]->status);
self->t=0;}' \
-n nfsv4:::null-done'/self->t/
{printf("%Y %s %s %s took: %d\n", walltimestamp, probename, args[0]->ci_remote, args[0]->ci_local,
(timestamp-self->t)/1000000000);
self->t=0;}' \
-o /var/tmp/nfs4_all_ops.log
# /var/tmp/ss &
# jobs
[2] Running tcpdump -i aggr1 -w /var/tmp/tcpdump.20191119.out host 10.210.16.12 &
[3] Running /var/tmp/ss &
[4]- Running dtrace -q -n nfsv4:::op-renew-start'{self->t=timestamp; printf("%Y %s client_id: 0x%x noi_xid: %u %s %d:%d %s\n", walltimestamp, probename, args[2]->clientid, args[1]->noi_xid, args[0]->ci_remote, args[1]->noi_cred->cr_uid, args[1]->noi_cred->cr_gid, args[1]->noi_curpath);}' -n nfsv4:::op-renew-done'/self->t/{printf("%Y %s noi_xid: %u %s %s %d:%d %s ret: %d took: %d seconds\n", walltimestamp, probename, args[1]->noi_xid, args[0]->ci_remote, args[0]->ci_local, args[1]->noi_cred->cr_uid, args[1]->noi_cred->cr_gid, args[1]->noi_curpath, args[2]->status, (timestamp-self->t)/1000000000);self->t=0;}' -o /var/tmp/nfs4-renew.log & (wd: /var/log/smblog)
[5]+ Running dtrace -q -n fbt::rfs4_lease_expired:entry'{self->cp=args[0];}' -n fbt::rfs4_lease_expired:return'/arg1 && self->cp/{this->s=(struct sockaddr_in *)&self->cp->rc_addr; printf("%Y %s client_ip: %s client_id: %x forced_expire: %d last_access: %u dt: %d rfs4_lease_time: %d ret: %d\n", walltimestamp, probefunc, inet_ntop(this->s->sin_family, (void *)&this->s->sin_addr), self->cp->rc_clientid, self->cp->rc_forced_expire, self->cp->rc_last_access, (walltimestamp/1000000000)-self->cp->rc_last_access, `rfs4_lease_time, arg1);self->cp=0;stack()}' -n fbt::rfs4_lease_expired:return'/self->cp/{self->cp=0;}' -o /var/tmp/rfs4_lease_expired_failed.out &
# grep "op-renew-done.*10.210.16.12" /var/tmp/nfs4-renew.log | less ... 2019 Nov 19 17:30:01 op-renew-done noi_xid: 807596244 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 19 17:31:01 op-renew-done noi_xid: 824373460 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 19 17:31:01 op-renew-done noi_xid: 2357017293 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 19 17:32:02 op-renew-done noi_xid: 841150676 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 19 17:32:02 op-renew-done noi_xid: 2373794509 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 19 17:33:02 op-renew-done noi_xid: 857927892 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 19 17:33:02 op-renew-done noi_xid: 2390571725 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 19 17:34:02 op-renew-done noi_xid: 2407348941 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 19 17:34:02 op-renew-done noi_xid: 874705108 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 19 17:35:02 op-renew-done noi_xid: 2424126157 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 19 17:35:02 op-renew-done noi_xid: 891482324 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 19 17:36:02 op-renew-done noi_xid: 2440903373 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 19 17:36:02 op-renew-done noi_xid: 908259540 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 19 17:37:02 op-renew-done noi_xid: 925036756 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 19 17:37:02 op-renew-done noi_xid: 2457680589 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 19 17:38:02 op-renew-done noi_xid: 2474457805 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 19 17:38:02 op-renew-done noi_xid: 941813972 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 19 17:39:03 op-renew-done noi_xid: 958591188 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 19 17:39:03 op-renew-done noi_xid: 2491235021 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 19 17:40:57 op-renew-done noi_xid: 796801741 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 17:40:57 op-renew-done noi_xid: 813578957 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 17:40:57 op-renew-done noi_xid: 3491950804 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 17:40:57 op-renew-done noi_xid: 3508728020 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 17:41:02 op-renew-done noi_xid: 863910605 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 19 17:41:02 op-renew-done noi_xid: 3559059668 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds ...
# less /var/tmp/nfs4_all_ops.log ... 2019 Nov 19 17:38:02 op-renew-start 10.210.16.12 10.210.6.6 2019 Nov 19 17:38:02 op-renew-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 19 17:39:03 compound-start 10.210.16.12 10.210.6.6 2019 Nov 19 17:39:03 op-renew-start 10.210.16.12 10.210.6.6 2019 Nov 19 17:39:03 op-renew-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 19 17:39:03 compound-start 10.210.16.12 10.210.6.5 2019 Nov 19 17:39:03 op-renew-start 10.210.16.12 10.210.6.5 2019 Nov 19 17:39:03 op-renew-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 ... many getattr, access, etc. operations 2019 Nov 19 17:40:57 op-renew-start 10.210.16.12 10.210.6.5 2019 Nov 19 17:40:57 op-renew-done 10.210.16.12 10.210.6.5 took: 0 ret: 10011 2019 Nov 19 17:40:57 compound-start 10.210.16.12 10.210.6.5 2019 Nov 19 17:40:57 op-renew-start 10.210.16.12 10.210.6.5 2019 Nov 19 17:40:57 op-renew-done 10.210.16.12 10.210.6.5 took: 0 ret: 10011 2019 Nov 19 17:40:57 compound-start 10.210.16.12 10.210.6.6 2019 Nov 19 17:40:57 op-setclientid-confirm-start 10.210.16.12 10.210.6.6 2019 Nov 19 17:40:57 compound-start 10.210.16.12 10.210.6.5 2019 Nov 19 17:40:57 op-setclientid-confirm-start 10.210.16.12 10.210.6.5 2019 Nov 19 17:40:57 op-setclientid-confirm-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 19 17:40:57 compound-start 10.210.16.12 10.210.6.6 2019 Nov 19 17:40:57 op-renew-start 10.210.16.12 10.210.6.6 2019 Nov 19 17:40:57 op-renew-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 19 17:40:57 compound-start 10.210.16.12 10.210.6.6 2019 Nov 19 17:40:57 op-renew-start 10.210.16.12 10.210.6.6 2019 Nov 19 17:40:57 op-renew-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 19 17:40:57 compound-start 10.210.16.12 10.210.6.6 2019 Nov 19 17:40:57 op-setclientid-start 10.210.16.12 10.210.6.6 2019 Nov 19 17:40:57 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 19 17:40:57 op-setclientid-confirm-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 19 17:40:57 compound-start 10.210.16.12 10.210.6.5 2019 Nov 19 17:40:57 op-setclientid-start 10.210.16.12 10.210.6.5 2019 Nov 19 17:40:57 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 19 17:41:02 compound-start 10.210.16.12 10.210.6.5 2019 Nov 19 17:41:02 op-renew-start 10.210.16.12 10.210.6.5 2019 Nov 19 17:41:02 op-renew-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 19 17:41:02 compound-start 10.210.16.12 10.210.6.6 2019 Nov 19 17:41:02 op-renew-start 10.210.16.12 10.210.6.6 2019 Nov 19 17:41:02 op-renew-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 19 17:42:03 compound-start 10.210.16.12 10.210.6.5 2019 Nov 19 17:42:03 op-renew-start 10.210.16.12 10.210.6.5 2019 Nov 19 17:42:03 op-renew-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 ...
# ggrep op-setclientid-done /var/tmp/nfs4_all_ops.log 2019 Nov 19 13:11:00 op-setclientid-done 10.210.16.12 took: 0 ret: 0 2019 Nov 19 13:40:55 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 19 14:40:56 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 19 15:11:05 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 19 15:40:54 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 19 15:40:54 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 19 16:40:48 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 19 16:40:48 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 19 17:40:57 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 19 17:40:57 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 19 18:40:55 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 19 18:40:55 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 19 19:40:57 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 19 19:40:57 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 19 20:11:03 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 19 20:11:03 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 19 20:40:51 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 19 20:40:51 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 19 21:10:49 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 19 21:10:49 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 19 22:40:59 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 19 22:40:59 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 19 23:11:08 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 19 23:11:08 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 19 23:40:58 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 19 23:40:58 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 20 00:41:06 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 20 00:41:06 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 20 01:11:02 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 20 01:11:02 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 20 02:10:54 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 20 02:10:54 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 20 03:10:59 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 20 03:10:59 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 20 03:41:06 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 20 03:41:06 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 20 04:11:15 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 20 04:11:15 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 20 04:41:00 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 20 04:41:00 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 20 05:11:03 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 20 05:11:03 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 20 05:40:59 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 20 05:40:59 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 20 06:11:09 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 20 06:11:09 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 20 06:41:05 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 20 06:41:05 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 20 07:11:05 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 20 07:11:05 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 20 07:40:58 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 20 07:40:58 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 20 08:10:49 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 20 08:10:49 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 20 10:41:18 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 20 10:41:18 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 20 11:11:23 op-setclientid-done 10.210.16.12 10.210.6.5 took: 0 ret: 0 2019 Nov 20 11:11:23 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
# grep "op-renew-done.*10.210.16.12.*10011" /var/tmp/nfs4-renew.log ... 2019 Nov 19 13:11:00 op-renew-done noi_xid: 3380034253 10.210.16.12 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 13:11:00 op-renew-done noi_xid: 3363257037 10.210.16.12 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 13:40:55 op-renew-done noi_xid: 2205694669 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 13:40:55 op-renew-done noi_xid: 2222471885 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 14:40:56 op-renew-done noi_xid: 3866704589 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 14:40:56 op-renew-done noi_xid: 3883481805 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 15:11:05 op-renew-done noi_xid: 2709142221 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 15:11:05 op-renew-done noi_xid: 2725919437 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 15:40:54 op-renew-done noi_xid: 1232812749 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 15:40:54 op-renew-done noi_xid: 1249589965 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 15:40:54 op-renew-done noi_xid: 4196397268 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 15:40:54 op-renew-done noi_xid: 4213174484 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 16:40:48 op-renew-done noi_xid: 1713500372 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 16:40:48 op-renew-done noi_xid: 1696723156 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 16:40:48 op-renew-done noi_xid: 3162258125 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 16:40:48 op-renew-done noi_xid: 3179035341 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 17:40:57 op-renew-done noi_xid: 796801741 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 17:40:57 op-renew-done noi_xid: 813578957 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 17:40:57 op-renew-done noi_xid: 3491950804 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 17:40:57 op-renew-done noi_xid: 3508728020 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 18:40:55 op-renew-done noi_xid: 2407480013 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 18:40:55 op-renew-done noi_xid: 2424257229 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 18:40:55 op-renew-done noi_xid: 690286804 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 18:40:55 op-renew-done noi_xid: 673509588 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 19:40:57 op-renew-done noi_xid: 4018158285 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 19:40:57 op-renew-done noi_xid: 2149970132 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 19:40:57 op-renew-done noi_xid: 2166747348 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 19 19:40:57 op-renew-done noi_xid: 4034935501 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds ...
# less /var/tmp/failed_stats_and_opens.log
...
Wed Nov 20 16:41:16 2019 java[10562] uid: 10203 SyS_newlstat("/mnt/.../XXX.zip") = 0 : 0
Wed Nov 20 16:41:16 2019 java[10562] uid: 10203 SyS_newstat("/mnt/.../XXX.zip") = 0 : 0
Wed Nov 20 16:41:16 2019 java[10562] uid: 10203 SyS_newstat("/mnt/.../XXX.zip") = 0 : 0
Wed Nov 20 16:41:16 2019 java[10562] uid: 10203 SyS_newstat("/mnt/.../XXX.zip") = 0 : 0
Wed Nov 20 16:41:16 2019 java[10562] uid: 10203 SyS_open("/mnt/.../.zip") = -5 (EIO) : -5
...
Nov 20 16:41:51 xxx kernel: [702405.312031] NFS: open file(.../XXX.zip)
Nov 20 16:41:51 xxx kernel: [702405.312033] --> nfs_put_client({23})
Nov 20 16:41:51 xxx kernel: [702405.312051] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=1024
Nov 20 16:41:51 xxx kernel: [702405.312053] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Nov 20 16:41:51 xxx kernel: [702405.312212] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Nov 20 16:41:51 xxx kernel: [702405.312227] nfs4_map_errors could not handle NFSv4 error 10011
Nov 20 16:41:51 xxx kernel: [702405.312233] NFS: atomic_open(0:109/65275), XXX.zip
Nov 20 16:41:51 xxx kernel: [702405.312268] --> nfs_put_client({23})
Nov 20 16:41:51 xxx kernel: [702405.312297] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=1024
Nov 20 16:41:51 xxx kernel: [702405.312299] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Nov 20 16:41:51 xxx kernel: [702405.312468] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Nov 20 16:41:51 xxx kernel: [702405.312478] nfs4_map_errors could not handle NFSv4 error 10011
# grep '20 16:[34].*-done.*10.210.6.6' /var/tmp/nfs4_all_ops.log | less ... 2019 Nov 20 16:41:51 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 ... 2019 Nov 20 16:41:51 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 ...
# ggrep "op-renew-done.*10.210.16.12 10.210.6.6" /var/tmp/nfs4-renew.log | ggrep '20 16:[234]' 2019 Nov 20 16:20:46 op-renew-done noi_xid: 2640048340 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 20 16:21:46 op-renew-done noi_xid: 2656825556 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 20 16:22:46 op-renew-done noi_xid: 2673602772 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 20 16:23:46 op-renew-done noi_xid: 2690379988 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 20 16:26:18 op-renew-done noi_xid: 3479630036 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 20 16:27:19 op-renew-done noi_xid: 3496407252 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 20 16:28:19 op-renew-done noi_xid: 3513184468 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 20 16:29:19 op-renew-done noi_xid: 3529961684 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 20 16:30:19 op-renew-done noi_xid: 3546738900 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 20 16:31:19 op-renew-done noi_xid: 3563516116 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 20 16:32:19 op-renew-done noi_xid: 3580293332 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 20 16:33:19 op-renew-done noi_xid: 3597070548 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 20 16:34:20 op-renew-done noi_xid: 3613847764 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 20 16:35:20 op-renew-done noi_xid: 3630624980 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 20 16:36:20 op-renew-done noi_xid: 3647402196 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 20 16:37:20 op-renew-done noi_xid: 3664179412 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 20 16:38:20 op-renew-done noi_xid: 3680956628 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 20 16:42:16 op-renew-done noi_xid: 3932811476 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 20 16:42:16 op-renew-done noi_xid: 3949588692 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 20 16:44:46 op-renew-done noi_xid: 1438445780 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 20 16:45:46 op-renew-done noi_xid: 1455222996 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 20 16:46:46 op-renew-done noi_xid: 1472000212 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 20 16:47:46 op-renew-done noi_xid: 1488777428 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 20 16:48:46 op-renew-done noi_xid: 1505554644 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds 2019 Nov 20 16:49:46 op-renew-done noi_xid: 1522331860 10.210.16.12 10.210.6.6 60001:60001 <unknown> ret: 0 took: 0 seconds # grep '20 1[56]:.*setclie.*10.210.6.6' /var/tmp/nfs4_all_ops.log 2019 Nov 20 15:11:09 op-setclientid-start 10.210.16.12 10.210.6.6 2019 Nov 20 15:11:09 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 20 15:11:09 op-setclientid-confirm-start 10.210.16.12 10.210.6.6 2019 Nov 20 15:11:09 op-setclientid-confirm-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 20 15:40:49 op-setclientid-start 10.210.16.12 10.210.6.6 2019 Nov 20 15:40:49 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 20 15:40:49 op-setclientid-confirm-start 10.210.16.12 10.210.6.6 2019 Nov 20 15:40:49 op-setclientid-confirm-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 20 16:42:16 op-setclientid-start 10.210.16.12 10.210.6.6 2019 Nov 20 16:42:16 op-setclientid-done 10.210.16.12 10.210.6.6 took: 0 ret: 0 2019 Nov 20 16:42:16 op-setclientid-confirm-start 10.210.16.12 10.210.6.6 2019 Nov 20 16:42:16 op-setclientid-confirm-done 10.210.16.12 10.210.6.6 took: 0 ret: 0
Nov 20 16:42:16 xxx kernel: [702430.126654] nfs4_renew_state: start
Nov 20 16:42:16 xxx kernel: [702430.126677] nfs4_renew_state: done
Nov 20 16:42:16 xxx kernel: [702430.245186] nfs4_schedule_lease_recovery: scheduling lease recovery for server xxx
Nov 20 16:42:16 xxx kernel: [702430.247174] nfs4_schedule_state_renewal: requeueing work. Lease period = 30
Nov 20 16:42:16 xxx kernel: [702430.247177] --> nfs_put_client({24})
Nov 20 16:42:16 xxx kernel: [702430.247817] nfs4_recovery_handle_error: handled error -10011 for server xxx
Nov 20 16:42:16 xxx kernel: [702430.247821] NFS call setclientid auth=RPCSEC_GSS, 'Linux NFSv4.0 xxx/10.210.6.6'
Nov 20 16:42:16 xxx kernel: [702430.248233] NFS reply setclientid: 0
Nov 20 16:42:16 xxx kernel: [702430.248235] NFS call setclientid_confirm auth=RPCSEC_GSS, (client ID ad23bf2032c90f)
Nov 20 16:42:16 xxx kernel: [702430.257473] NFS reply setclientid_confirm: 0
# grep '20 1[56]:.*open-done.*10.210.6.6' /var/tmp/nfs4_all_ops.log | grep 10011 2019 Nov 20 16:40:36 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 20 16:40:36 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 20 16:41:15 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 20 16:41:15 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 20 16:41:16 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 20 16:41:16 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 20 16:41:17 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 20 16:41:16 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 20 16:41:17 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 20 16:41:16 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 20 16:41:16 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 20 16:41:16 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 20 16:41:17 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 20 16:41:17 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 20 16:41:22 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 20 16:41:22 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 20 16:41:22 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 20 16:41:22 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 20 16:41:30 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 20 16:41:30 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 20 16:41:32 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 20 16:41:31 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011 2019 Nov 20 16:41:32 op-open-done 10.210.16.12 10.210.6.6 took: 0 ret: 10011
static struct nfs4_state *nfs4_do_open(struct inode *dir,
struct nfs_open_context *ctx,
int flags,
struct iattr *sattr,
struct nfs4_label *label,
int *opened)
{
struct nfs_server *server = NFS_SERVER(dir);
struct nfs4_exception exception = { };
struct nfs4_state *res;
int status;
do {
status = _nfs4_do_open(dir, ctx, flags, sattr, label, opened);
res = ctx->state;
trace_nfs4_open_file(ctx, flags, status);
if (status == 0)
break;
…
return res;
}
static int _nfs4_do_open(struct inode *dir,
struct nfs_open_context *ctx,
int flags,
struct iattr *sattr,
struct nfs4_label *label,
int *opened)
{
…
status = nfs4_recover_expired_lease(server);
if (status != 0)
goto err_put_state_owner;
…
err_put_state_owner:
nfs4_put_state_owner(sp);
out_err:
return status;
}
static int nfs4_recover_expired_lease(struct nfs_server *server)
{
return nfs4_client_recover_expired_lease(server->nfs_client);
}
nfs4state.c:
int nfs4_client_recover_expired_lease(struct nfs_client *clp)
{
unsigned int loop;
int ret;
for (loop = NFS4_MAX_LOOP_ON_RECOVER; loop != 0; loop--) {
ret = nfs4_wait_clnt_recover(clp);
if (ret != 0)
break;
if (!test_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state) &&
!test_bit(NFS4CLNT_CHECK_LEASE,&clp->cl_state))
break;
nfs4_schedule_state_manager(clp);
ret = -EIO;
}
return ret;
}
$ grep -r NFS4_MAX_LOOP_ON_RECOVER . | grep define ./nfs4_fs.h:#define NFS4_MAX_LOOP_ON_RECOVER (10)
nfs4renewd.c:
nfs4_renew_state(struct work_struct *work)
{
const struct nfs4_state_maintenance_ops *ops;
struct nfs_client *clp =
container_of(work, struct nfs_client, cl_renewd.work);
struct rpc_cred *cred;
long lease;
unsigned long last, now;
unsigned renew_flags = 0;
ops = clp->cl_mvops->state_renewal_ops;
dprintk("%s: start\n", __func__);
if (test_bit(NFS_CS_STOP_RENEW, &clp->cl_res_state))
goto out;
spin_lock(&clp->cl_lock);
lease = clp->cl_lease_time;
last = clp->cl_last_renewal;
now = jiffies;
/* Are we close to a lease timeout? */
if (time_after(now, last + lease/3))
renew_flags |= NFS4_RENEW_TIMEOUT;
if (nfs_delegations_present(clp))
renew_flags |= NFS4_RENEW_DELEGATION_CB;
if (renew_flags != 0) {
cred = ops->get_state_renewal_cred_locked(clp);
spin_unlock(&clp->cl_lock);
if (cred == NULL) {
if (!(renew_flags & NFS4_RENEW_DELEGATION_CB)) {
set_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
goto out;
}
nfs_expire_all_delegations(clp);
} else {
int ret;
/* Queue an asynchronous RENEW. */
ret = ops->sched_state_renewal(clp, cred, renew_flags);
put_rpccred(cred);
switch (ret) {
default:
goto out_exp;
case -EAGAIN:
case -ENOMEM:
break;
}
}
} else {
dprintk("%s: failed to call renewd. Reason: lease not expired \n",
__func__);
spin_unlock(&clp->cl_lock);
}
nfs4_schedule_state_renewal(clp);
out_exp:
nfs_expire_unreferenced_delegations(clp);
out:
dprintk("%s: done\n", __func__);
}
void
nfs4_schedule_state_renewal(struct nfs_client *clp)
{
long timeout;
spin_lock(&clp->cl_lock);
timeout = (2 * clp->cl_lease_time) / 3 + (long)clp->cl_last_renewal
- (long)jiffies;
if (timeout < 5 * HZ)
timeout = 5 * HZ;
dprintk("%s: requeueing work. Lease period = %ld\n",
__func__, (timeout + HZ - 1) / HZ);
mod_delayed_work(system_wq, &clp->cl_renewd, timeout);
set_bit(NFS_CS_RENEWD, &clp->cl_res_state);
spin_unlock(&clp->cl_lock);
}
/**
* nfs4_set_lease_period - Sets the lease period on a nfs_client
*
* @clp: pointer to nfs_client
* @lease: new value for lease period
* @lastrenewed: time at which lease was last renewed
*/
void nfs4_set_lease_period(struct nfs_client *clp,
unsigned long lease,
unsigned long lastrenewed)
{
spin_lock(&clp->cl_lock);
clp->cl_lease_time = lease;
clp->cl_last_renewal = lastrenewed;
spin_unlock(&clp->cl_lock);
/* Cap maximum reconnect timeout at 1/2 lease period */
rpc_cap_max_reconnect_timeout(clp->cl_rpcclient, lease >> 1);
}
static int nfs4_do_fsinfo(struct nfs_server *server, struct nfs_fh *fhandle, struct nfs_fsinfo *fsinfo)
{
struct nfs4_exception exception = { };
unsigned long now = jiffies;
int err;
do {
err = _nfs4_do_fsinfo(server, fhandle, fsinfo);
trace_nfs4_fsinfo(server, fhandle, fsinfo->fattr, err);
if (err == 0) {
nfs4_set_lease_period(server->nfs_client,
fsinfo->lease_time * HZ,
now);
break;
}
err = nfs4_handle_exception(server, err, &exception);
} while (exception.retry);
return err;
}
static int _nfs4_do_fsinfo(struct nfs_server *server, struct nfs_fh *fhandle,
struct nfs_fsinfo *fsinfo)
{
struct nfs4_fsinfo_arg args = {
.fh = fhandle,
.bitmask = server->attr_bitmask,
};
struct nfs4_fsinfo_res res = {
.fsinfo = fsinfo,
};
struct rpc_message msg = {
.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_FSINFO],
.rpc_argp = &args,
.rpc_resp = &res,
};
return nfs4_call_sync(server->client, server, &msg, &args.seq_args, &res.seq_res, 0);
}
static void do_renew_lease(struct nfs_client *clp, unsigned long timestamp)
{
spin_lock(&clp->cl_lock);
if (time_before(clp->cl_last_renewal,timestamp))
clp->cl_last_renewal = timestamp;
spin_unlock(&clp->cl_lock);
}
static void renew_lease(const struct nfs_server *server, unsigned long timestamp)
{
struct nfs_client *clp = server->nfs_client;
if (!nfs4_has_session(clp))
do_renew_lease(clp, timestamp);
}
}
static void nfs4_open_confirm_done(struct rpc_task *task, void *calldata)
{
struct nfs4_opendata *data = calldata;
nfs40_sequence_done(task, &data->c_res.seq_res);
data->rpc_status = task->tk_status;
if (data->rpc_status == 0) {
nfs4_stateid_copy(&data->o_res.stateid, &data->c_res.stateid);
nfs_confirm_seqid(&data->owner->so_seqid, 0);
renew_lease(data->o_res.server, data->timestamp);
data->rpc_done = 1;
}
}
static void nfs4_open_done(struct rpc_task *task, void *calldata)
{
struct nfs4_opendata *data = calldata;
data->rpc_status = task->tk_status;
if (!nfs4_sequence_process(task, &data->o_res.seq_res))
return;
if (task->tk_status == 0) {
if (data->o_res.f_attr->valid & NFS_ATTR_FATTR_TYPE) {
switch (data->o_res.f_attr->mode & S_IFMT) {
case S_IFREG:
break;
case S_IFLNK:
data->rpc_status = -ELOOP;
break;
case S_IFDIR:
data->rpc_status = -EISDIR;
break;
default:
data->rpc_status = -ENOTDIR;
}
}
renew_lease(data->o_res.server, data->timestamp);
if (!(data->o_res.rflags & NFS4_OPEN_RESULT_CONFIRM))
nfs_confirm_seqid(&data->owner->so_seqid, 0);
}
data->rpc_done = 1;
}
# cat ./nfs4_do_fsinfo.stp
#!/usr/bin/env stap
/*
module("nfsv4").function("nfs4_do_fsinfo@fs/nfs/nfs4proc.c:4504") $server:struct nfs_server* $fhandle:struct nfs_fh* $fsinfo:struct nfs_fsinfo* $exception:struct nfs4_exception
*/
probe module("nfsv4").function("nfs4_do_fsinfo") {
printf("%s %s proc: %s[%d] uid: %d\n", ctime(gettimeofday_s()), probefunc(), execname(), pid(), uid())
print_backtrace()
exit()
}
# ./nfs4_do_fsinfo.stp WARNING: Missing unwind data for a module, rerun with 'stap -d nfs' Thu Nov 21 16:39:53 2019 nfs4_do_fsinfo proc: ls[18701] uid: 10006 0xffffffffc05de260 : nfs4_do_fsinfo+0x0/0x120 [nfsv4] 0xffffffffc05de3b1 : nfs4_proc_fsinfo+0x31/0x60 [nfsv4] 0xffffffffc05924e0 [nfs] 0xffffffffc0592b43 [nfs] 0xffffffffc059f098 [nfs] 0xffffffff9104d1fe 0xffffffff9106b377 0xffffffff9106b454 0xffffffffc05ab014 [nfs] 0xffffffffc05fb740 : nfs4_submount+0x90/0x830 [nfsv4] 0xffffffffc05ab21a [nfs] 0xffffffff91053a69 0xffffffff910549ad 0xffffffff91058483 0xffffffff9105878b 0xffffffff91058cea 0xffffffff9105954b 0xffffffff9105b497 0xffffffff9105b501 0xffffffff9104e1e3 # ./nfs4_do_fsinfo.stp WARNING: Missing unwind data for a module, rerun with 'stap -d nfs' Thu Nov 21 17:10:05 2019 nfs4_do_fsinfo proc: ls[29463] uid: 10006 0xffffffffc05de260 : nfs4_do_fsinfo+0x0/0x120 [nfsv4] 0xffffffffc05de3b1 : nfs4_proc_fsinfo+0x31/0x60 [nfsv4] 0xffffffffc05924e0 [nfs] 0xffffffffc0592b43 [nfs] 0xffffffffc059f098 [nfs] 0xffffffff9104d1fe 0xffffffff9106b377 0xffffffff9106b454 0xffffffffc05ab014 [nfs] 0xffffffffc05fb740 : nfs4_submount+0x90/0x830 [nfsv4] 0xffffffffc05ab21a [nfs] 0xffffffff91053a69 0xffffffff910549ad 0xffffffff91058483 0xffffffff9105878b 0xffffffff91058cea 0xffffffff9105954b 0xffffffff9105b497 0xffffffff9105b501 0xffffffff9104e1e3
# cat ./nfs4_do_fsinfo.stp
#!/usr/bin/env stap
/*
module("nfsv4").function("nfs4_do_fsinfo@fs/nfs/nfs4proc.c:4504") $server:struct nfs_server* $fhandle:struct nfs_fh* $fsinfo:struct nfs_fsinfo* $exception:struct nfs4_exception
*/
probe module("nfsv4").function("nfs4_do_fsinfo") {
printf("%s %s proc: %s[%d] ppid: %d args: %s cwd: %s uid: %d\n", ctime(gettimeofday_s()), probefunc(), execname(), pid(),
ppid(), cmdline_str(), fullpath_struct_path(task_cwd_path(task_current())), uid())
print_backtrace()
exit()
}
# ./nfs4_do_fsinfo.stp WARNING: Missing unwind data for a module, rerun with 'stap -d nfs' Thu Nov 21 17:40:05 2019 nfs4_do_fsinfo proc: ls[9275] ppid: 9273 args: ls /mnt/.../ cwd: /... uid: 10006 0xffffffffc05de260 : nfs4_do_fsinfo+0x0/0x120 [nfsv4] 0xffffffffc05de3b1 : nfs4_proc_fsinfo+0x31/0x60 [nfsv4] 0xffffffffc05924e0 [nfs] 0xffffffffc0592b43 [nfs] 0xffffffffc059f098 [nfs] 0xffffffff9104d1fe 0xffffffff9106b377 0xffffffff9106b454 0xffffffffc05ab014 [nfs] 0xffffffffc05fb740 : nfs4_submount+0x90/0x830 [nfsv4] 0xffffffffc05ab21a [nfs] 0xffffffff91053a69 0xffffffff910549ad 0xffffffff91058483 0xffffffff9105878b 0xffffffff91058cea 0xffffffff9105954b 0xffffffff9105b497 0xffffffff9105b501 0xffffffff9104e1e3
/*
* FSINFO request
*/
static void nfs4_xdr_enc_fsinfo(struct rpc_rqst *req, struct xdr_stream *xdr,
struct nfs4_fsinfo_arg *args)
{
struct compound_hdr hdr = {
.minorversion = nfs4_xdr_minorversion(&args->seq_args),
};
encode_compound_hdr(xdr, req, &hdr);
encode_sequence(xdr, &args->seq_args, &hdr);
encode_putfh(xdr, args->fh, &hdr);
encode_fsinfo(xdr, args->bitmask, &hdr);
encode_nops(&hdr);
}
static void encode_fsinfo(struct xdr_stream *xdr, const u32* bitmask, struct compound_hdr *hdr)
{
encode_getattr_three(xdr,
bitmask[0] & nfs4_fsinfo_bitmap[0],
bitmask[1] & nfs4_fsinfo_bitmap[1],
bitmask[2] & nfs4_fsinfo_bitmap[2],
hdr);
}
# umount /mnt/... $ ./sh/checkXXXMounts.sh Checking /mnt/... ...
# stap -g -d/bin/ls --ldd --all-modules ./nfs4_do_fsinfo.stp WARNING: missing unwind/symbol data for module 'stap_6d2b184a53eae72ccbc95adec92f2d3c_25253' Fri Nov 22 10:32:32 2019 nfs4_do_fsinfo proc: ls[27494] ppid: 27492 args: ls /mnt/... cwd: /... uid: 10006 0xffffffffc05de260 : nfs4_do_fsinfo+0x0/0x120 [nfsv4] 0xffffffffc05de3b1 : nfs4_proc_fsinfo+0x31/0x60 [nfsv4] 0xffffffffc05924e0 : nfs_probe_fsinfo+0xc0/0x660 [nfs] 0xffffffffc0592b43 : nfs_clone_server+0xc3/0x1f0 [nfs] 0xffffffffc059f098 : nfs_xdev_mount+0x98/0x130 [nfs] 0xffffffff9104d1fe : mount_fs+0x3e/0x1b0 [kernel] 0xffffffff9106b377 : vfs_kern_mount+0x67/0x110 [kernel] 0xffffffff9106b454 : vfs_submount+0x34/0x40 [kernel] 0xffffffffc05ab014 : nfs_do_submount+0xb4/0x160 [nfs] 0xffffffffc05fb740 : nfs4_submount+0x90/0x830 [nfsv4] 0xffffffffc05ab21a : nfs_d_automount+0xca/0x1c0 [nfs] 0xffffffff91053a69 : follow_managed+0x149/0x310 [kernel] 0xffffffff910549ad : lookup_fast+0x12d/0x230 [kernel] 0xffffffff91058483 : link_path_walk+0x1b3/0x8b0 [kernel] 0xffffffff9105878b : link_path_walk+0x4bb/0x8b0 [kernel] 0xffffffff91058cea : path_lookupat+0x7a/0x8b0 [kernel] 0xffffffff9105954b : filename_lookup+0x2b/0xc0 [kernel] 0xffffffff9105b497 : user_path_at_empty+0x67/0xc0 [kernel] 0xffffffff9105b501 : user_path_at+0x11/0x20 [kernel] 0xffffffff9104e1e3 : vfs_fstatat+0x63/0xc0 [kernel] ------- 0x7fd54c693275 : __xstat64+0x15/0x50 [/usr/lib64/libc-2.17.so] 0x408841 : gobble_file.constprop.50+0x8b1/0xeb0 [/usr/bin/ls] 0x402e41 : main+0x531/0x1a14 [/usr/bin/ls] 0x7fd54c5c6505 : __libc_start_main+0xf5/0x1c0 [/usr/lib64/libc-2.17.so] 0x40434d : _start+0x29/0x2c [/usr/bin/ls] -------
# umount /mnt/... # ls -l /mnt/... >/dev/null # stap -g -d/bin/ls --ldd --all-modules ./nfs4_do_fsinfo.stp Fri Nov 22 10:37:17 2019 nfs4_do_fsinfo proc: ls[29995] ppid: 13019 args: ls --color=auto -l /mnt/... cwd: /... uid: 0 0xffffffffc05de260 : nfs4_do_fsinfo+0x0/0x120 [nfsv4] 0xffffffffc05de3b1 : nfs4_proc_fsinfo+0x31/0x60 [nfsv4] 0xffffffffc05924e0 : nfs_probe_fsinfo+0xc0/0x660 [nfs] 0xffffffffc0592b43 : nfs_clone_server+0xc3/0x1f0 [nfs] 0xffffffffc059f098 : nfs_xdev_mount+0x98/0x130 [nfs] 0xffffffff9104d1fe : mount_fs+0x3e/0x1b0 [kernel] 0xffffffff9106b377 : vfs_kern_mount+0x67/0x110 [kernel] 0xffffffff9106b454 : vfs_submount+0x34/0x40 [kernel] 0xffffffffc05ab014 : nfs_do_submount+0xb4/0x160 [nfs] 0xffffffffc05fb740 : nfs4_submount+0x90/0x830 [nfsv4] 0xffffffffc05ab21a : nfs_d_automount+0xca/0x1c0 [nfs] 0xffffffff91053a69 : follow_managed+0x149/0x310 [kernel] 0xffffffff910549ad : lookup_fast+0x12d/0x230 [kernel] 0xffffffff91058ddd : path_lookupat+0x16d/0x8b0 [kernel] 0xffffffff9105954b : filename_lookup+0x2b/0xc0 [kernel] 0xffffffff9105b497 : user_path_at_empty+0x67/0xc0 [kernel] 0xffffffff9105b501 : user_path_at+0x11/0x20 [kernel] 0xffffffff9104e1e3 : vfs_fstatat+0x63/0xc0 [kernel] 0xffffffff9104e601 : SYSC_newlstat+0x31/0x60 [kernel]
# bc 60*60*24*7 604800 # echo 604800 > /proc/sys/fs/nfs/nfs_mountpoint_timeout # date Fri 22 Nov 12:23:27 GMT 2019
# stap -g --ldd --all-modules --suppress-handler-errors -o /var/tmp/nfs4_do_fsinfo.log ./nfs4_do_fsinfo.stp & [6] 10295
# cat ./nfs4_renew_state.stp
#!/bin/stap
/*
* module("nfsv4").function("nfs4_renew_state@fs/nfs/nfs4renewd.c:55") $work:struct work_struct* $clp:struct nfs_client* $renew_flags:unsigned int $__func__:char[] const
*/
probe module("nfsv4").function("nfs4_renew_state") {
printf("%s %s least_time: %d secs_to_renew: %d server: %s\n", ctime(gettimeofday_s()), probefunc(),
$clp->cl_lease_time/HZ(), (jiffies()-$clp->cl_last_renewal)/HZ(), kernel_string($clp->cl_hostname))
}
# ./nfs4_renew_state.stp | grep xxx Fri Nov 22 15:40:41 2019 nfs4_renew_state least_time: 90 secs_to_renew: 60 server: xxx Fri Nov 22 15:41:42 2019 nfs4_renew_state least_time: 90 secs_to_renew: 5 server: xxx Fri Nov 22 15:42:36 2019 nfs4_renew_state least_time: 90 secs_to_renew: 60 server: xxx
# umount /mnt/...; ls -l /mnt/.../ ls: cannot open directory /mnt/.../: Permission denied
$ date;strace -e trace=open cat /mnt/.../xxx >/dev/null
Fri 22 Nov 15:42:13 GMT 2019
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("/mnt/.../xxx", O_RDONLY) = -1 EIO (Input/output error)
...
$ date;strace -e trace=open cat /mnt/.../xxx >/dev/null
Fri 22 Nov 15:42:38 GMT 2019
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("/mnt/.../xxx", O_RDONLY) = 3
+++ exited with 0 +++
# ggrep "Nov 22 .*-done.*10.210.16.12" /var/tmp/nfs4-renew.log | grep "ret: 10011" | tail ... 2019 Nov 22 15:42:36 op-renew-done noi_xid: 2258385869 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds 2019 Nov 22 15:42:36 op-renew-done noi_xid: 2275163085 10.210.16.12 10.210.6.5 60001:60001 <unknown> ret: 10011 took: 0 seconds # grep 15:42:13 /var/tmp/nfs4_all_ops.log | grep 10011 2019 Nov 22 15:42:13 op-open-done 10.210.16.12 10.210.6.5 took: 0 ret: 10011
# cat ./nfs4_renew_state.stp
#!/bin/stap
/*
* module("nfsv4").function("nfs4_renew_state@fs/nfs/nfs4renewd.c:55") $work:struct work_struct* $clp:struct nfs_client* $renew_flags:unsigned int $__func__:char[] const
*/
probe module("nfsv4").function("nfs4_renew_state") {
printf("%s %s least_time: %d secs_to_renew: %d server: %s\n", ctime(gettimeofday_s()), probefunc(),
$clp->cl_lease_time/HZ(), (jiffies()-$clp->cl_last_renewal)/HZ(), kernel_string($clp->cl_hostname))
}
# ./nfs4_renew_state.stp | grep xxx Mon Nov 25 13:49:36 2019 nfs4_renew_state least_time: 90 secs_to_renew: 40 server: xxx Mon Nov 25 13:50:36 2019 nfs4_renew_state least_time: 90 secs_to_renew: 60 server: xxx Mon Nov 25 13:51:36 2019 nfs4_renew_state least_time: 90 secs_to_renew: 3 server: xxx Mon Nov 25 13:52:33 2019 nfs4_renew_state least_time: 90 secs_to_renew: 60 server: xxx Mon Nov 25 13:52:38 2019 nfs4_renew_state least_time: 90 secs_to_renew: 65 server: xxx Mon Nov 25 13:53:38 2019 nfs4_renew_state least_time: 90 secs_to_renew: 60 server: xxx ^C
# date; umount /mnt/...; ls -l /mnt/.../ Mon 25 Nov 13:51:32 GMT 2019 total 7 -rw-r--r-- 1 root root 5 Nov 25 13:48 test_f1
$ date;strace -e trace=open cat /mnt/.../test_f1 >/dev/null
Mon 25 Nov 13:52:08 GMT 2019
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("/mnt/.../test_f1", O_RDONLY) = -1 EIO (Input/output error)
...
$ date;strace -e trace=open cat /mnt/.../test_f1 >/dev/null
Mon 25 Nov 13:53:39 GMT 2019
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("/mnt/.../test_f1", O_RDONLY) = 3
+++ exited with 0 +++
Hi Robert, I have gone through this case and your request for explanation on why ONTAP is not throwing “NFS4ERR_EXPIRED” when the lease is expired while other servers does that. Why the Client did not send the RENEW call is client’s behavior and ONTAP cannot control that. Now on why ONTAP is not throwing “NFS4ERR_EXPIRED” even after lease is expired (30 secs), I believe the RFC has the answer for that. This section explains the behavior of other servers you have tested. https://tools.ietf.org/html/rfc7530#section-9.6.3 9.6.3. Network Partitions and Recovery If this occurs, the server may cancel the lease and free all locks held for the client. The following section explains the ONTAP’s behavior. https://tools.ietf.org/html/rfc7530#section-9.6.3.1 9.6.3.1. Courtesy Locks As a courtesy to the client or as an optimization, the server may continue to hold locks, including delegations, on behalf of a client for which recent communication has extended beyond the lease period, delaying the cancellation of the lease. This section clarifies that both the behaviors can be implemented. https://tools.ietf.org/html/rfc7530#section-9.6.3.2 9.6.3.2. Lease Cancellation As a result of lease expiration, leases may be canceled, either immediately upon expiration or subsequently, depending on the occurrence of a conflicting lock or extension of the period of partition beyond what the server will tolerate. I believe ONTAP is behaving according to the RFC and in more optimized way. In that case, please let me know if you need any further explanation from our side on the same. ...
Hey Robert, We tested in the lab thoroughly to check for how long we keep the courtesy locks. The courtesy locks will be kept for 2*lease time, if there is no conflicting request from any other client for the same file. Please note that it might take some more time (40-50 secs) to clear those states from memory as the thread runs periodically to clean that. Yes, they are tunable. As the time depends on lease-time, you can tweak that for a particular NFS server to control the courtesy lock time. ...
# while [ 1 ] ; do date; umount /mnt/...; ls -l /mnt/.../; sleep 10; done
$ date; strace -e trace=open cat /mnt/.../f1
Fri 6 Dec 10:52:53 GMT 2019
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("/mnt/.../f1", O_RDONLY) = 3
test
+++ exited with 0 +++
34 3.656314120 10.210.16.12 -> 10.210.196.134 NFS 322 V4 Call OPEN DH: 0x6d8f1235/f1 35 3.861180170 10.210.196.134 -> 10.210.16.12 NFS 150 V4 Reply (Call In 34) OPEN Status: NFS4ERR_STALE_CLIENTID 36 3.861352838 10.210.16.12 -> 10.210.196.134 NFS 230 V4 Call RENEW CID: 0x5b6f 37 4.066210651 10.210.196.134 -> 10.210.16.12 NFS 182 V4 Reply (Call In 36) RENEW Status: NFS4ERR_STALE_CLIENTID 38 4.066357237 10.210.16.12 -> 10.210.196.134 NFS 338 V4 Call SETCLIENTID 39 4.271134903 10.210.196.134 -> 10.210.16.12 NFS 198 V4 Reply (Call In 38) SETCLIENTID 40 4.271256568 10.210.16.12 -> 10.210.196.134 NFS 238 V4 Call SETCLIENTID_CONFIRM 41 4.476085500 10.210.196.134 -> 10.210.16.12 NFS 182 V4 Reply (Call In 40) SETCLIENTID_CONFIRM 44 4.476296996 10.210.16.12 -> 10.210.196.134 NFS 322 V4 Call OPEN DH: 0x6d8f1235/f1 49 4.681334155 10.210.196.134 -> 10.210.16.12 NFS 470 V4 Reply (Call In 44) OPEN StateID: 0x900a 50 4.683485544 10.210.16.12 -> 10.210.196.134 NFS 258 V4 Call READ StateID: 0x900a Offset: 0 Len: 5 53 4.888349784 10.210.196.134 -> 10.210.16.12 NFS 166 V4 Reply (Call In 50) READ 54 4.888516437 10.210.16.12 -> 10.210.196.134 NFS 266 V4 Call CLOSE StateID: 0x900a 56 5.093391513 10.210.196.134 -> 10.210.16.12 NFS 230 V4 Reply (Call In 54) CLOSE
# cat ./nfs4_do_handle_exception.stp
#!/bin/stap
// /bin/stap -g -d /bin/ls --ldd --all-modules
/*
* module("nfsv4").function("nfs4_do_handle_exception@fs/nfs/nfs4proc.c:383") $server:struct nfs_server* $errorcode:int $exception:struct nfs4_exception* $__func__:char[] const
*/
probe module("nfsv4").function("nfs4_do_handle_exception") {
printf("%s %s proc: %s[%d] ppid: %d args: %s cwd: %s uid: %d\n", ctime(gettimeofday_s()), probefunc(), execname(),
pid(), ppid(), cmdline_str(), fullpath_struct_path(task_cwd_path(task_current())), uid())
print_backtrace()
printf("-------\n")
//print_ubacktrace()
//printf("-------\n")
}
# ./nfs4_do_handle_exception.stp --all-modules -o /var/tmp/nfs4_do_handle_exception.out &
$ date; strace -e trace=open cat /mnt/.../f1
Fri 6 Dec 11:28:32 GMT 2019
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("/mnt/.../f1", O_RDONLY) = 3
test
+++ exited with 0 +++
24 11:28:32.419449800 10.210.16.12 -> 10.210.196.134 NFS 250 V4 Call ACCESS FH: 0x49b41796, [Check: RD LU MD XT DL] 25 11:28:32.624732095 10.210.196.134 -> 10.210.16.12 NFS 222 V4 Reply (Call In 24) ACCESS, [Access Denied: MD XT DL], [Allowed: RD LU] 27 11:28:32.625066507 10.210.16.12 -> 10.210.196.134 NFS 250 V4 Call ACCESS FH: 0x951a2495, [Check: RD LU MD XT DL] 28 11:28:32.830312285 10.210.196.134 -> 10.210.16.12 NFS 222 V4 Reply (Call In 27) ACCESS, [Allowed: RD LU MD XT DL] 29 11:28:32.830765027 10.210.16.12 -> 10.210.196.134 NFS 250 V4 Call ACCESS FH: 0x6d8f1235, [Check: RD LU MD XT DL] 30 11:28:33.046220915 10.210.196.134 -> 10.210.16.12 NFS 222 V4 Reply (Call In 29) ACCESS, [Access Denied: MD XT DL], [Allowed: RD LU] 31 11:28:33.046554768 10.210.16.12 -> 10.210.196.134 NFS 322 V4 Call OPEN DH: 0x6d8f1235/f1 32 11:28:33.251846761 10.210.196.134 -> 10.210.16.12 NFS 150 V4 Reply (Call In 31) OPEN Status: NFS4ERR_STALE_CLIENTID 33 11:28:33.252467673 10.210.16.12 -> 10.210.196.134 NFS 230 V4 Call RENEW CID: 0x3e59 34 11:28:33.457711654 10.210.196.134 -> 10.210.16.12 NFS 182 V4 Reply (Call In 33) RENEW Status: NFS4ERR_STALE_CLIENTID 35 11:28:33.457873262 10.210.16.12 -> 10.210.196.134 NFS 338 V4 Call SETCLIENTID 36 11:28:33.663124308 10.210.196.134 -> 10.210.16.12 NFS 198 V4 Reply (Call In 35) SETCLIENTID 37 11:28:33.663262733 10.210.16.12 -> 10.210.196.134 NFS 238 V4 Call SETCLIENTID_CONFIRM 38 11:28:33.868584648 10.210.196.134 -> 10.210.16.12 NFS 182 V4 Reply (Call In 37) SETCLIENTID_CONFIRM 41 11:28:33.868821676 10.210.16.12 -> 10.210.196.134 NFS 322 V4 Call OPEN DH: 0x6d8f1235/f1 46 11:28:34.074306487 10.210.196.134 -> 10.210.16.12 NFS 470 V4 Reply (Call In 41) OPEN StateID: 0x985c 47 11:28:34.074508377 10.210.16.12 -> 10.210.196.134 NFS 258 V4 Call READ StateID: 0x985c Offset: 0 Len: 5 50 11:28:34.279740081 10.210.196.134 -> 10.210.16.12 NFS 166 V4 Reply (Call In 47) READ 51 11:28:34.279909666 10.210.16.12 -> 10.210.196.134 NFS 266 V4 Call CLOSE StateID: 0x985c 53 11:28:34.485207915 10.210.196.134 -> 10.210.16.12 NFS 230 V4 Reply (Call In 51) CLOSE
# less /var/tmp/nfs4_do_handle_exception.out ... ------- Fri Dec 6 11:28:32 2019 nfs4_do_handle_exception proc: cat[5231] ppid: 5229 args: cat /mnt/.../f1 cwd: /...w uid: 14100 0xffffffffc05d5a80 : nfs4_do_handle_exception+0x0/0x340 [nfsv4] 0xffffffffc05dcf0c : nfs4_handle_exception+0x1c/0x160 [nfsv4] 0xffffffffc05df973 : nfs4_proc_access+0x83/0xf0 [nfsv4] 0xffffffffc0597adb : nfs_do_access+0x1bb/0x3b0 [nfs] 0xffffffffc0597ebe : nfs_permission+0x1be/0x220 [nfs] 0xffffffff91054471 : __inode_permission+0x71/0xd0 [kernel] 0xffffffff910544e8 : inode_permission+0x18/0x50 [kernel] 0xffffffff9105854e : link_path_walk+0x27e/0x8b0 [kernel] 0xffffffff91059c3e : path_openat+0xae/0x5a0 [kernel] 0xffffffff9105b5cd : do_filp_open+0x4d/0xb0 [kernel] 0xffffffff91047524 : do_sys_open+0x124/0x220 [kernel] 0xffffffff9104763e : sys_open+0x1e/0x20 [kernel] 0xffffffff9158d172 : tracesys+0xa6/0xcc [kernel] 0x7f6b0d228710 ------- Fri Dec 6 11:28:32 2019 nfs4_do_handle_exception proc: cat[5231] ppid: 5229 args: cat /mnt/.../f1 cwd: /... uid: 14100 0xffffffffc05d5a80 : nfs4_do_handle_exception+0x0/0x340 [nfsv4] 0xffffffffc05dcf0c : nfs4_handle_exception+0x1c/0x160 [nfsv4] 0xffffffffc05df973 : nfs4_proc_access+0x83/0xf0 [nfsv4] 0xffffffffc0597adb : nfs_do_access+0x1bb/0x3b0 [nfs] 0xffffffffc0597ebe : nfs_permission+0x1be/0x220 [nfs] 0xffffffff91054471 : __inode_permission+0x71/0xd0 [kernel] 0xffffffff910544e8 : inode_permission+0x18/0x50 [kernel] 0xffffffff9105854e : link_path_walk+0x27e/0x8b0 [kernel] 0xffffffff91059c3e : path_openat+0xae/0x5a0 [kernel] 0xffffffff9105b5cd : do_filp_open+0x4d/0xb0 [kernel] 0xffffffff91047524 : do_sys_open+0x124/0x220 [kernel] 0xffffffff9104763e : sys_open+0x1e/0x20 [kernel] 0xffffffff9158d172 : tracesys+0xa6/0xcc [kernel] 0x7f6b0d228710 ------- Fri Dec 6 11:28:33 2019 nfs4_do_handle_exception proc: cat[5231] ppid: 5229 args: cat /mnt/.../f1 cwd: /... uid: 14100 0xffffffffc05d5a80 : nfs4_do_handle_exception+0x0/0x340 [nfsv4] 0xffffffffc05dcf0c : nfs4_handle_exception+0x1c/0x160 [nfsv4] 0xffffffffc05df973 : nfs4_proc_access+0x83/0xf0 [nfsv4] 0xffffffffc0597adb : nfs_do_access+0x1bb/0x3b0 [nfs] 0xffffffffc0597ebe : nfs_permission+0x1be/0x220 [nfs] 0xffffffff91054471 : __inode_permission+0x71/0xd0 [kernel] 0xffffffff910544e8 : inode_permission+0x18/0x50 [kernel] 0xffffffff9105854e : link_path_walk+0x27e/0x8b0 [kernel] 0xffffffff91059c3e : path_openat+0xae/0x5a0 [kernel] 0xffffffff9105b5cd : do_filp_open+0x4d/0xb0 [kernel] 0xffffffff91047524 : do_sys_open+0x124/0x220 [kernel] 0xffffffff9104763e : sys_open+0x1e/0x20 [kernel] 0xffffffff9158d172 : tracesys+0xa6/0xcc [kernel] 0x7f6b0d228710 ------- Fri Dec 6 11:28:33 2019 nfs4_do_handle_exception proc: cat[5231] ppid: 5229 args: cat /mnt/.../f1 cwd: /... uid: 14100 0xffffffffc05d5a80 : nfs4_do_handle_exception+0x0/0x340 [nfsv4] 0xffffffffc05dcf0c : nfs4_handle_exception+0x1c/0x160 [nfsv4] 0xffffffffc05e0296 : nfs4_do_open+0x2c6/0x630 [nfsv4] 0xffffffffc05e06f7 : nfs4_atomic_open+0xf7/0x110 [nfsv4] 0xffffffffc0597317 : nfs_atomic_open+0x197/0x3a0 [nfs] 0xffffffff91057623 : do_last+0xa53/0x1290 [kernel] 0xffffffff91059c5d : path_openat+0xcd/0x5a0 [kernel] 0xffffffff9105b5cd : do_filp_open+0x4d/0xb0 [kernel] 0xffffffff91047524 : do_sys_open+0x124/0x220 [kernel] 0xffffffff9104763e : sys_open+0x1e/0x20 [kernel] 0xffffffff9158d172 : tracesys+0xa6/0xcc [kernel] 0x7f6b0d228710 -------
static int nfs4_do_handle_exception(struct nfs_server *server,
int errorcode, struct nfs4_exception *exception)
{
...
case -NFS4ERR_EXPIRED:
case -NFS4ERR_BAD_STATEID:
if (inode != NULL && stateid != NULL) {
nfs_inode_find_state_and_recover(inode,
stateid);
goto wait_on_recovery;
}
...
case -NFS4ERR_STALE_STATEID:
case -NFS4ERR_STALE_CLIENTID:
nfs4_schedule_lease_recovery(clp);
goto wait_on_recovery;
...
@@ -5960,6 +5962,8 @@ static void nfs4_release_lockowner_done(struct rpc_task *task, void *calldata)
break;
case -NFS4ERR_STALE_CLIENTID:
case -NFS4ERR_EXPIRED:
+ nfs4_schedule_lease_recovery(server->nfs_client);
+ break;
commit 272289a3df7297fd798a8ddf652c6d29f675de28
Author: Trond Myklebust
Date: Thu Sep 22 13:39:15 2016 -0400
NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid
If we're not yet sure that all state has expired or been revoked, we
should try to do a minimal recovery on just the one stateid.
Signed-off-by: Trond Myklebust
Tested-by: Oleg Drokin
Signed-off-by: Anna Schumaker
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 389089e..409b9fa 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -397,13 +397,23 @@ static int nfs4_do_handle_exception(struct nfs_server *server,
exception->delay = 0;
exception->recovering = 0;
exception->retry = 0;
+
+ if (stateid == NULL && state != NULL)
+ stateid = &state->stateid;
NFSv4: try lease recovery on NFS4ERR_EXPIRED
Currently, if an nfs server returns NFS4ERR_EXPIRED to open(),
we return EIO to applications without even trying to recover.
Fixes: 272289a ("NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid")
Signed-off-by: Robert Milkowski
Reviewed-by: Trond Myklebust
Signed-off-by: Anna Schumaker
NFSv4.0: nfs4_do_fsinfo() should not do implicit lease renewals
Currently, each time nfs4_do_fsinfo() is called it will do an implicit
NFS4 lease renewal, which is not compliant with the NFS4 specification.
This can result in a lease being expired by an NFS server.
Commit 83ca7f5 ("NFS: Avoid PUTROOTFH when managing leases")
introduced implicit client lease renewal in nfs4_do_fsinfo(),
which can result in the NFSv4.0 lease to expire on a server side,
and servers returning NFS4ERR_EXPIRED or NFS4ERR_STALE_CLIENTID.
This can easily be reproduced by frequently unmounting a sub-mount,
then stat'ing it to get it mounted again, which will delay or even
completely prevent client from sending RENEW operations if no other
NFS operations are issued. Eventually nfs server will expire client's
lease and return an error on file access or next RENEW.
This can also happen when a sub-mount is automatically unmounted
due to inactivity (after nfs_mountpoint_expiry_timeout), then it is
mounted again via stat(). This can result in a short window during
which client's lease will expire on a server but not on a client.
This specific case was observed on production systems.
This patch removes the implicit lease renewal from nfs4_do_fsinfo().
Fixes: 83ca7f5 ("NFS: Avoid PUTROOTFH when managing leases")
Signed-off-by: Robert Milkowski
Signed-off-by: Anna Schumaker