I have the following while loop set up with a kernel k and two host buffers:
results1_h - holds the results from the first kernel call
results2_h - holds the results from the second kernel call
while (1) {
    1. Launch k, passing it a device pointer results_d in which to store
       the results.
    2. Do a MemcpyFromDtoH(results1_d, results1_h).
    3. Parse results2_h.
       (The first time we enter the loop there are no results in
       results2_h, so we sail straight through to the next step.)
    4. Call cuCtxSynchronize().
    5. Launch kernel k again, passing it the same device pointer results_d
       that we passed in (1).
    6. Do a MemcpyFromDtoH(results1_d, results2_h).
       (This is where it returns a LAUNCH_FAILED error.)
    7. Parse results2_h.
    8. Call cuCtxSynchronize().
}
At step (6) I get a LAUNCH_FAILED error, and I’m not sure why. I call cuCtxSynchronize() at (4), so the GPU should be idle by then and I should be able to launch the kernel again just as before, shouldn’t I?
Also note that I created the CUDA context with CU_CTX_SCHED_BLOCKING_SYNC, so cuCtxSynchronize() blocks the calling thread, and the two kernel launches and the memcpyDtoH that follows each of them should therefore happen sequentially.
What am I doing wrong that I’m getting a launch failure there?