chhg
March 24, 2021, 9:28am
1
System: Jetson AGX Xavier, example Ubuntu image from NVIDIA, Kernel 4.9.140, JetPack 4.4 (L4T R32.4.3).
The following minimized code example reproduces the segmentation fault:
#include <iostream>
#include <cstdlib>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <cuda_runtime.h>
using namespace std;

// Minimalized example out of QProcess (Qt 5.14.2)
void forker()
{
    pid_t pid = fork();
    if (pid == 0)
    {
        // Child: replace the process image with /bin/ls.
        cout << "PID child process: " << getpid() << endl;
        char* argv[] = {"/bin/ls", "-l", "/home/ ", NULL};
        execv("/bin/ls", argv);
        exit(0);  // only reached if execv fails
    }
    else if (pid > 0)
    {
        // Parent: wait for the child to finish.
        cout << "PID parent process: " << getpid() << endl;
        wait(NULL);
    }
    else
    {
        cerr << "Error!";
        exit(1);
    }
}

static void initValues(unsigned char result[10])
{
    for (int i = 0; i < 10; i++) {
        result[i] = 3;
    }
}

int main(int, const char * const [])
{
    static unsigned char data1[10];
    initValues(data1);
    cudaFree(0);  // forces CUDA context creation
    unsigned char(*gpu_ptr_1)[10UL];
    cudaMalloc(&gpu_ptr_1, 10UL);
    cudaMemcpy(*gpu_ptr_1, data1, 10UL, cudaMemcpyHostToDevice);
    cout << " after first malloc " << endl;
    sleep(2);
    forker();
    sleep(2);
    cout << " after forker " << endl;
    static unsigned char data2[10];
    initValues(data2);
    unsigned char(*gpu_ptr_2)[10UL];
    cudaMalloc(&gpu_ptr_2, 10UL);
    cudaMemcpy(*gpu_ptr_2, data2, 10UL, cudaMemcpyHostToDevice);
    cout << " after last malloc " << endl;
}
Hi,
Let us check your source first.
We will share more information with you later.
Thanks.
Hi,
Not sure if we are missing something.
We slightly modified your source and can run it on Xavier + JetPack 4.5 without crashing.
test.cu (1.1 KB)
$ nvcc test.cu -o test
$ ./test
after first malloc
PID parent process: 9946
PID child process: 9948
/bin/ls: cannot access '/home/ ': No such file or directory
after forker
after last malloc
Thanks.
chhg
March 25, 2021, 9:54am
5
Hi AastaLLL,
It’s fantastic to hear that you reach 'after last malloc' without hitting a segmentation fault first! Was it fixed in JetPack 4.4.1 or 4.5, or is it maybe a kernel issue? Which kernel version did you test on? I’m working in a Yocto project on Xavier, so swapping JetPack versions is not easy for me. I would be glad to know where the problem was solved.
Thank you very much!
Hi,
We tested it on JetPack 4.5.
You can give it a try.
Thanks.
chhg
April 19, 2021, 2:34pm
8
Hi AastaLLL
I tested it on JetPack 4.5 (not 4.5.1), but it still crashes intermittently. It seems to run better but is still not stable: I ran the code in a loop and it failed after about 50 iterations. After sudo apt upgrade it failed after about 5 iterations or earlier.
Any idea?
Thanks for your help!
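For anyone who wants to repeat the loop test described above, here is a minimal sketch of a harness that reruns a binary until one run is killed by a signal (a host segmentation fault shows up as SIGSEGV, signal 11 on Linux). The names `LoopResult` and `run_until_crash` are hypothetical helpers made up for this illustration, not part of any NVIDIA or Qt API.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <string>
#include <vector>

// Outcome of a repeated-run experiment (hypothetical helper type).
struct LoopResult {
    int completed_runs;  // runs that exited on their own (any exit code)
    int fatal_signal;    // signal that killed the failing run, 0 if none
};

// Runs the command in `args` up to `max_runs` times and stops at the
// first run that is terminated by a signal. args[0] must be a path
// that execv can resolve (for example "./test").
LoopResult run_until_crash(const std::vector<std::string>& args, int max_runs)
{
    LoopResult result{0, 0};
    if (args.empty()) return result;

    // Build the NULL-terminated argv array that execv expects.
    std::vector<char*> argv;
    for (const std::string& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    for (int i = 0; i < max_runs; ++i) {
        pid_t pid = fork();
        if (pid < 0) break;              // fork failed, give up
        if (pid == 0) {
            execv(argv[0], argv.data());
            _exit(127);                  // only reached if execv fails
        }
        int status = 0;
        if (waitpid(pid, &status, 0) < 0) break;
        if (WIFSIGNALED(status)) {       // child was killed by a signal
            result.fatal_signal = WTERMSIG(status);
            break;
        }
        ++result.completed_runs;
    }
    return result;
}
```

Calling `run_until_crash({"./test"}, 100)` and printing `completed_runs` and `fatal_signal` gives a rough failure rate that can be compared before and after a JetPack upgrade or a settings change.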
chhg
April 24, 2021, 9:24pm
9
By the way, the above code example is from MathWorks. I use GPU Coder in MATLAB and got the host segmentation fault in connection with fork / wait in the generated code. MathWorks declined to look into it, saying it’s an NVIDIA issue.
chhg
April 29, 2021, 6:51am
10
Could you investigate the problem, please? I urgently need a solution!
Hi,
We can reproduce this issue internally.
It seems that this issue is related to process timing.
If you maximize the device performance, the failure rate decreases.
$ sudo jetson_clocks
We are still checking the root cause. Will share more information with you later.
Thanks.
chhg
May 5, 2021, 9:17am
15
Hi AastaLLL,
Thanks for examining this issue! Is there any news about it?
I found out that right after a system reboot it works with no segmentation faults. The problem reappears after about 2 minutes.
Changing the clock made no difference. I’m running MAXN.
Many Thanks!
Hi,
We found that this issue is related to GPU rail gating.
For now, please turn it off as a temporary workaround (WAR):
$ echo 0 | sudo tee /sys/devices/gpu.0/railgate_enable
Our internal team is working on the rail gating fix.
Will let you know once we have further progress.
Thanks.
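Because the crash depends on this sysfs setting, it can help to have a test program confirm the current value before it runs. The following is a minimal sketch of reading and writing such a flag file. The helper names `read_flag` and `write_flag` are hypothetical, and it is an assumption here that `/sys/devices/gpu.0/railgate_enable` reads back as `0` or `1`. Writing to it requires root, as in the command above.

```cpp
#include <fstream>
#include <string>

// Returns the first whitespace-delimited token stored in `path`,
// or an empty string if the file cannot be opened or is empty.
std::string read_flag(const std::string& path)
{
    std::ifstream in(path);
    std::string value;
    in >> value;
    return value;
}

// Writes `value` followed by a newline to `path`.
// Returns true only if the file could be opened and the write succeeded.
bool write_flag(const std::string& path, const std::string& value)
{
    std::ofstream out(path);
    if (!out) return false;
    out << value << '\n';
    out.flush();
    return static_cast<bool>(out);
}
```

A test could call `read_flag("/sys/devices/gpu.0/railgate_enable")` at startup and log a warning when the result is not `"0"`, so that crash reports always record whether the workaround was active.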
chhg
May 10, 2021, 2:01pm
17
Hi,
Great, this seems like a stable workaround! Is it possible to configure this railgate_enable so that it remains after a restart?
Thx!
Hi,
You can create a custom systemd boot script.
Or use a cron @reboot rule to apply the setting automatically on every boot.
Thanks.
Hi,
Here is an update for you.
We have fixed this bug internally and the fix will be part of the next release.
Thanks for reporting this bug to us.
chhg
July 26, 2021, 2:41pm
21
Hi AastaLLL,
I just tested R32.5.2, but without your workaround it still crashes. In which release will it be fixed?
Thanks.
kayccc
August 4, 2021, 6:08am
22
Please wait for the next official release. Thanks