Crash in cudaMemcpy after fork and wait for a child process

System: Jetson AGX Xavier, sample Ubuntu image from NVIDIA, kernel 4.9.140, JetPack 4.4 (L4T R32.4.3).
The following minimal code example reproduces it:

#include <cstdlib>         // exit
#include <iostream>
#include <unistd.h>        // fork, execv, getpid, sleep
#include <sys/types.h>
#include <sys/wait.h>      // wait
#include <cuda_runtime.h>  // cudaFree, cudaMalloc, cudaMemcpy

using namespace std;

// Minimalized example out of QProcess (Qt 5.14.2)
void forker()
{
    pid_t pid = fork();
    if (pid == 0)
    {
        // Child: replace the process image with /bin/ls.
        cout << "PID child process: " << getpid() << endl;
        const char* argv[] = {"/bin/ls", "-l", "/home/ ", NULL};
        execv("/bin/ls", const_cast<char* const*>(argv));
        exit(0);  // only reached if execv fails
    }
    else if (pid > 0)
    {
        // Parent: wait for the child to terminate.
        cout << "PID parent process: " << getpid() << endl;
        wait(NULL);
    }
    else
    {
        cerr << "Error!";
        exit(1);
    }
}

static void initValues(unsigned char result[10])
{
    for (int i = 0; i < 10; i++) {
        result[i] = 3;
    }
}

int main(int, const char * const [])
{
    static unsigned char data1[10];
    initValues(data1);

    // Force creation of the CUDA context before the fork.
    cudaFree(0);

    unsigned char(*gpu_ptr_1)[10UL];
    cudaMalloc(&gpu_ptr_1, 10UL);
    cudaMemcpy(*gpu_ptr_1, data1, 10UL, cudaMemcpyHostToDevice);

    cout << " after first malloc " << endl;

    sleep(2);
    forker();
    sleep(2);

    cout << " after forker " << endl;

    static unsigned char data2[10];
    initValues(data2);

    // The crash is observed in this second allocation and copy.
    unsigned char(*gpu_ptr_2)[10UL];
    cudaMalloc(&gpu_ptr_2, 10UL);
    cudaMemcpy(*gpu_ptr_2, data2, 10UL, cudaMemcpyHostToDevice);

    cout << " after last malloc " << endl;
}

Hi,

Let us check your source first.
We will share more information later.

Thanks.

Hi,

We may be missing something.
We slightly modified your source and can run it on Xavier + JetPack 4.5 without a crash.

test.cu (1.1 KB)

$ nvcc test.cu -o test
$ ./test

after first malloc
PID parent process: 9946
PID child process: 9948
/bin/ls: cannot access '/home/ ': No such file or directory
after forker
after last malloc

Thanks.

Hi AastaLLL,

It’s fantastic to hear that you reach 'after last malloc' without a segmentation fault first! Was it fixed in JetPack 4.4.1 or 4.5, or is it maybe a kernel issue? Which kernel version did you test on? I’m working in a Yocto project on Xavier and cannot easily swap JetPack versions, so I would be glad to know where the problem was solved.

Thank you very much!

Hi,

We tested it on JetPack 4.5.
You can give it a try.

Thanks.

Hi AastaLLL

I tested it on JetPack 4.5 (not 4.5.1), but it still crashes intermittently. It seems to run better but is still not stable. I ran the code in a loop: it failed after about 50 iterations, and after a sudo apt upgrade it failed after about 5 iterations or fewer.
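
For anyone who wants to repeat the loop test described above, here is a minimal sketch of a harness that runs a binary several times and counts how many runs were killed by a signal (such as SIGSEGV). The names LoopResult and runRepeatedly are hypothetical helpers made up for this illustration, and this is not the exact loop used in this thread.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <string>
#include <vector>

// Summary of how the repeated runs ended (hypothetical type for this sketch).
struct LoopResult {
    int runs = 0;      // child processes actually started
    int signaled = 0;  // runs terminated by a signal, e.g. SIGSEGV
    int nonzero = 0;   // runs that exited normally with a non-zero status
};

// Runs the command in `args` (args[0] is the program path) `iterations`
// times and classifies how each run ended.
LoopResult runRepeatedly(const std::vector<std::string>& args, int iterations)
{
    LoopResult result;
    if (args.empty()) return result;

    // Build a NULL-terminated argv array for execv.
    std::vector<char*> argv;
    for (const std::string& a : args) {
        argv.push_back(const_cast<char*>(a.c_str()));
    }
    argv.push_back(nullptr);

    for (int i = 0; i < iterations; ++i) {
        pid_t pid = fork();
        if (pid < 0) break;  // fork failed, stop early
        if (pid == 0) {
            execv(argv[0], argv.data());
            _exit(127);      // only reached if execv failed
        }
        ++result.runs;

        int status = 0;
        if (waitpid(pid, &status, 0) < 0) break;
        if (WIFSIGNALED(status)) {
            ++result.signaled;
        } else if (WIFEXITED(status) && WEXITSTATUS(status) != 0) {
            ++result.nonzero;
        }
    }
    return result;
}
```

Calling runRepeatedly({"./test"}, 100) would run the reproducer 100 times; a non-zero signaled count corresponds to the segmentation faults reported here. The harness itself never touches CUDA, so it should not be affected by the fork issue under test.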

Any idea?

Thanks for your help!

By the way, the above code example is from MathWorks. I use GPU Coder in MATLAB and got the host segmentation fault in connection with fork / wait in the generated code. MathWorks declines to handle it, saying it’s an NVIDIA issue.

Could you investigate the problem, please? I urgently need a solution!

Hi,

We can reproduce this issue internally.

It seems that this issue is related to the timing of the processes.
If you maximize the device performance, the failure rate decreases.

$ sudo jetson_clocks

We are still checking the root cause. Will share more information with you later.

Thanks.

Hi AastaLLL,

Thanks for examining this issue! Is there any news about it?

I found out that right after a system reboot it works with no segmentation faults. The problem reappears after about 2 minutes.

Changing the clock made no difference. I’m running MAXN.

Many Thanks!

Hi,

We found that this issue is related to GPU rail gating.
For now, please turn it off as a temporary workaround (WAR):

$ echo 0 | sudo tee /sys/devices/gpu.0/railgate_enable

Our internal team is working on the rail gating fix.
Will let you know once we have further progress.
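
An application that depends on this workaround could check the same sysfs node at startup and warn if rail gating is still enabled. Below is a minimal sketch, assuming the node reads back as the text 0 or 1 (this thread only shows writing 0 to it). RailGate and readRailGate are hypothetical names used only for this illustration.

```cpp
#include <fstream>
#include <string>

// Possible states of the rail gating setting (hypothetical type for this sketch).
enum class RailGate { Disabled, Enabled, Unknown };

// Reads the rail gating setting from `path`.
// Assumption: the file contains "0" (disabled) or "1" (enabled) as text.
// A missing, unreadable, or unexpected file yields RailGate::Unknown.
RailGate readRailGate(const std::string& path = "/sys/devices/gpu.0/railgate_enable")
{
    std::ifstream in(path);
    int value = -1;
    if (!(in >> value)) return RailGate::Unknown;
    if (value == 0) return RailGate::Disabled;
    if (value == 1) return RailGate::Enabled;
    return RailGate::Unknown;
}
```

A caller could, for instance, print a warning before the first CUDA call when readRailGate() does not return RailGate::Disabled. The path is a parameter so the function can be tried against an ordinary file on a machine without the GPU node.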

Thanks.

Hi,

Great, this seems like a stable workaround! Is it possible to configure railgate_enable so that the setting persists across reboots?

Thx!

Hi,

You can create a custom systemd boot script.
Or use a cron @reboot rule to apply the setting automatically on every boot.

Thanks.

Hi,

Here is an update for you.

We have fixed this bug internally and the fix will be part of the next release.
Thanks for reporting this bug to us.

Hi AastaLLL,

I just tested R32.5.2, but without your workaround it still crashes. In which release will it be fixed?

Thanks.

Please wait for the next official release. Thanks
