I have some code where I first allocate device memory, then run a kernel that writes to that memory, and then run another kernel that reads from that memory and writes the result. I use cudaLaunch instead of the normal <<< >>> launch syntax. Something like this:
allocate memory, a
run kernel 1 that writes to a
run kernel 2 that reads from a and writes to b
free memory a
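
A minimal sketch of that pattern with plain runtime-API calls (kernel names k1/k2 are placeholders, and I've used <<< >>> here for brevity even though my real code goes through cudaLaunch):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void k1(float *a)                  { a[threadIdx.x] = (float)threadIdx.x; }
__global__ void k2(const float *a, float *b)  { b[threadIdx.x] = a[threadIdx.x] * 2.0f; }

int main() {
    const int n = 256;
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    k1<<<1, n>>>(a);       // kernel 1: writes to a
    k2<<<1, n>>>(a, b);    // kernel 2: reads a, writes b

    cudaDeviceSynchronize();   // wait for both kernels before freeing
    printf("err: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(a);
    cudaFree(b);
    return 0;
}
```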
If I run kernel 1 or kernel 2 by themselves, everything works fine, but when I run them one after the other I get segmentation faults. Could this be because memory a is freed while kernel 2 is still running (since control returns to the host as soon as kernel 2 has been launched)? If so, how do I avoid that?