Is there any other way to stop the process when I want to run another kernel with a different set of GPU flags? The only way I found was using cudaThreadExit().
You don’t need to use cudaThreadSynchronize() here: kernels submitted to the same stream are ordered and implicitly synchronized by default.
I think the original poster wants to abort a kernel in progress, which I do not believe is possible currently.
Edit: To clarify the first statement — the second kernel will not start until the first one has finished and all its writes have been flushed to global memory. The second kernel launch is, of course, asynchronous and returns control to the CPU immediately, just like the first.
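A minimal sketch of that ordering (kernelA, kernelB, and run are hypothetical names, default stream assumed):

```
__global__ void kernelA(float *d) { d[threadIdx.x] *= 2.0f; }
__global__ void kernelB(float *d) { d[threadIdx.x] += 1.0f; }

void run(float *d_data)
{
    kernelA<<<1, 256>>>(d_data);   // launch returns to the CPU immediately
    kernelB<<<1, 256>>>(d_data);   // queued; starts only after kernelA finishes
    cudaThreadSynchronize();       // needed only before reading results on the host
}
```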
Once device flags are set (either implicitly, or explicitly via cudaSetDevice etc.), they cannot be changed for that context… You need to spawn a separate host thread if you want to use different flags…
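A rough sketch of the separate-thread approach (pthreads assumed; the flag value is just an example):

```
void *worker(void *arg)
{
    /* Flags must be set before the first CUDA call creates this
       thread's context; afterwards they cannot be changed. */
    cudaSetDeviceFlags(cudaDeviceBlockingSync);
    cudaSetDevice(0);

    /* ... launch kernels under these flags ... */

    cudaThreadExit();   /* tear down this thread's context */
    return NULL;
}
```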
And btw, cudaThreadExit() tears down the CUDA context maintained by the driver… That happens automatically when the thread exits, so you don’t need to call it… but then, why have such an API at all? – Only God and a few NV guys know…
It has been known to help in cases where the profiler writes out empty CSVs. Explicitly adding a cudaThreadExit() seems to get the profiler buffers flushed. But that is the only time I have seen it used.
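For illustration, the explicit teardown would sit at the very end of the host program (sketch only, assuming the old pre-4.0 runtime API where cudaThreadExit() was not yet deprecated):

```
int main(void)
{
    /* ... allocate device memory, launch kernels, copy results back ... */

    /* Destroys the context explicitly; in practice this has also been
       observed to flush profiler buffers that otherwise come out empty. */
    cudaThreadExit();
    return 0;
}
```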
Then why does my program, as shown, need cudaThreadExit() to continue? Maybe because I need to open some new graphics windows after the calculation, and I am using the same graphics card for both display and computation.
Can I set more than one device flag at the same time?
Even if I remove the graphics-window operations between the kernel calls, I still need to call cudaThreadExit() before the next launch.
The last problem I have is that the device memory does not get freed when I call cudaFree() / free() for one of the kernels. That kernel uses cudaMemcpyToSymbol() and multidimensional allocations.
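One thing worth checking (a sketch under assumed names, not the poster's actual code): cudaFree() only releases memory obtained from cudaMalloc(). A __constant__ symbol written with cudaMemcpyToSymbol() is part of the module and is never freed with cudaFree() — it lives until the context is destroyed. And for "multidimensional" allocations built as arrays of device pointers, each row must be freed individually before the pointer table itself.

```
__constant__ float c_params[16];   /* lives for the lifetime of the context */

void demo(void)
{
    float h_params[16] = {0};
    /* Copies into the constant symbol; nothing here needs cudaFree(). */
    cudaMemcpyToSymbol(c_params, h_params, sizeof(h_params));

    /* Ordinary device buffer: pair every cudaMalloc with one cudaFree. */
    float *d_buf;
    cudaMalloc((void **)&d_buf, 1024 * sizeof(float));
    /* ... use d_buf ... */
    cudaFree(d_buf);
}
```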