Asynchronous execution of kernels

I am trying to launch a kernel on the GPU and still be able to issue cudaMemcpy commands while it is running. I have tried several methods, including creating new threads (which CUDA chokes on), setting up streams, and using the method from the AsyncAPI demo. In each case the memcpy and event calls can be asynchronous, but whenever a kernel is launched, execution of the main program stops until the kernel finishes.

Here is a basic example of code I’ve run (note that GPUFlag is a pointer to a GPU memory location):

cudaStream_t loop, setflag;
int flag = 1, counter = 0;
cudaEvent_t stop;

cudaStreamCreate(&loop);
cudaStreamCreate(&setflag);
CUDA_SAFE_CALL(cudaEventCreate(&stop));
CUDA_SAFE_CALL(cudaThreadSynchronize());

// Launch the kernel in its own stream, then record an event behind it.
runLoopGPU<<<BLOCK_COUNT, THREAD_COUNT, 0, loop>>>();
cudaEventRecord(stop, loop);

// Set the flag from a second stream while the kernel is (supposedly) running.
CUDA_SAFE_CALL(cudaMemcpyAsync(GPUFlag, &flag, sizeof(int), cudaMemcpyHostToDevice, setflag));

printf("Waiting\n");
fflush(stdout);

// Busy-wait on the host until the kernel's event completes.
while (cudaEventQuery(stop) == cudaErrorNotReady)
    counter++;

printf("Job's done with a count of %d\n", counter);
exit(0);

All runLoopGPU does is sit in a while loop checking GPUFlag for a value other than 0 (and yes, it is initialized to 0 before any of this runs). The program blocks until the watchdog timer kills the kernel; only then does ‘Waiting’ appear on the screen, after which it sits in the polling loop forever.
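Stripped down, the kernel amounts to a spin loop roughly along these lines (simplified; the flag lives in a __device__ variable that GPUFlag points at, since the launch takes no arguments):

__device__ int d_flag = 0;   // GPUFlag points at this

__global__ void runLoopGPU(void)
{
    while (d_flag == 0)      // spin until the host sets the flag
        ;
}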

Is there some way to accomplish what I want to do?

Well, does it really keep checking, or does it read the value of GPUFlag once and then sit in an endless loop, e.g. because you forgot to use “volatile” or because of a compiler bug? Hard to know for sure without the actual kernel code and the generated GPU asm.
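To illustrate what I mean: without volatile, the compiler is free to load the flag into a register once and then spin on that stale copy forever. Reading through a volatile pointer forces a fresh load from memory on every iteration, roughly:

__global__ void runLoopGPU(void)
{
    // volatile read: reload GPUFlag's memory location every iteration
    // instead of spinning on a value cached in a register
    volatile int *flag = &d_flag;
    while (*flag == 0)
        ;
}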

There is no question that you are misusing the CUDA API here, though, so even if you get it to work you should expect a lot of problems.
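For host-to-kernel signalling, the pattern I would try instead is mapped, page-locked (“zero-copy”) host memory: the host writes the flag directly, and the kernel polls it through a volatile device pointer, so there is no cudaMemcpyAsync racing against a spinning kernel at all. A rough sketch, assuming your device supports host-mapped memory (the names here are illustrative):

#include <stdio.h>
#include <cuda_runtime.h>

__global__ void waitForFlag(volatile int *flag)
{
    while (*flag == 0)   // volatile: reload from memory each iteration
        ;
}

int main(void)
{
    int *h_flag = NULL, *d_flag = NULL;

    cudaSetDeviceFlags(cudaDeviceMapHost);  // must come before any other CUDA work
    cudaHostAlloc((void **)&h_flag, sizeof(int), cudaHostAllocMapped);
    cudaHostGetDevicePointer((void **)&d_flag, h_flag, 0);
    *h_flag = 0;

    waitForFlag<<<1, 1>>>(d_flag);          // the launch itself returns immediately
    printf("Kernel launched, host is still running\n");

    *h_flag = 1;                            // a plain host write releases the kernel

    cudaThreadSynchronize();                // blocks until the kernel exits
    printf("Job's done\n");

    cudaFreeHost(h_flag);
    return 0;
}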