"cudaMemcpyAsync" does not work in my program

Hello all,
I am trying to use a flag to check whether the kernel execution has finished.
Here is my code:

bool* isFinished ;   // pinned host memory
bool* flagTrue ;     // device pointer filled in by cudaMalloc

cutilSafeCall( cudaMalloc((void**)&flagTrue, sizeof(bool)) ) ;
cutilSafeCall( cudaMemset(flagTrue, 1, sizeof(bool) ) ) ;
cutilSafeCall( cudaHostAlloc((void**)&isFinished, sizeof(bool), cudaHostAllocDefault ) ) ;

isFinished[0] = false ;

cutilSafeCall( cudaMemcpyAsync( isFinished, flagTrue, sizeof(bool), cudaMemcpyDeviceToHost, 0 ) ) ;

while ( !isFinished[0] ) {
    // Waiting loop
    printf("!") ;
}

But isFinished[0] never changes to true, so the loop never ends.
When I use cudaMemcpy instead, the code works fine.

What is wrong with my code? (I am using Windows 7 and a GTX 285.)

I found a way to make the code run: using CUDA events.
Before the code, add

float elapsedTime ;
cudaEvent_t start, stop ;
cudaEventCreate(&start) ; cudaEventCreate(&stop) ;

cudaEventRecord(start, 0) ;

After the code, add

cudaEventRecord(stop,0) ;
cudaEventSynchronize(stop) ;

cudaEventElapsedTime(&elapsedTime, start, stop) ;
printf("Kernel time: %f ms \n", elapsedTime );

cudaEventDestroy(start) ; cudaEventDestroy(stop) ;

Then, the code works.

So what is wrong with the original code?

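This may be the host C compiler (not nvcc) optimizing the apparently constant while loop: nothing tells it that isFinished[0] can change behind its back (here, through a DMA transfer from the GPU), so it may read the value once and then spin forever. Try marking the isFinished pointer as volatile.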

Thanks for your reply.

But I don't know how to make the isFinished pointer volatile.

Could you give me more advice? :)


To make a variable volatile, just put the volatile keyword before its type in its declaration. So here, declare it as volatile bool *isFinished; (strictly speaking, this makes the bool it points to volatile, which is what you want).
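A minimal sketch of the original program with that change, assuming a working CUDA device. The helper name poll_flag_with_volatile is hypothetical, and the error handling is simplified. Note one side effect of the volatile declaration: in C++ a volatile bool* does not convert implicitly to void*, so the cudaMemcpyAsync and cudaFreeHost calls need an explicit cast. This only addresses the compiler side of the problem; it does not make busy-waiting on the buffer a good idea.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper (not from the thread): the question's flag copy with
// isFinished declared as a pointer to volatile bool, so every loop iteration
// re-reads host memory instead of reusing a value cached in a register.
// Returns the value finally observed by the host, or false if a CUDA call fails.
bool poll_flag_with_volatile()
{
    bool* flagTrue = nullptr;            // device memory holding the flag
    volatile bool* isFinished = nullptr; // pinned host memory, read through volatile

    if (cudaMalloc((void**)&flagTrue, sizeof(bool)) != cudaSuccess) return false;
    if (cudaHostAlloc((void**)&isFinished, sizeof(bool), cudaHostAllocDefault) != cudaSuccess) {
        cudaFree(flagTrue);
        return false;
    }
    cudaMemset(flagTrue, 1, sizeof(bool)); // device-side flag = true
    isFinished[0] = false;

    // volatile bool* -> void* is not an implicit conversion in C++, so cast.
    if (cudaMemcpyAsync((void*)isFinished, flagTrue, sizeof(bool),
                        cudaMemcpyDeviceToHost, 0) != cudaSuccess) {
        cudaFreeHost((void*)isFinished);
        cudaFree(flagTrue);
        return false;
    }

    // Busy-wait: because of volatile, each test is a fresh load from memory.
    while (!isFinished[0]) {
    }

    bool result = isFinished[0];
    cudaFreeHost((void*)isFinished);
    cudaFree(flagTrue);
    return result;
}
```

Called from host code, e.g. if (poll_flag_with_volatile()) printf("copy landed\n");, it should return true once the asynchronous copy has written the flag.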

I also have to say that this is a really bad way to make sure a transfer has finished. There are efficient (and clean) ways to do it: events (or streams). You should definitely not poll memory like that.
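A sketch of the event-based alternative, assuming the same setup as the question and a working CUDA device. The helper name copy_flag_with_event is hypothetical. An event recorded right after cudaMemcpyAsync in stream 0 completes only when the copy has finished, so the host asks the runtime instead of watching the buffer. cudaEventQuery returns cudaErrorNotReady while work is pending, which lets the host do other work between checks; cudaEventSynchronize is the blocking variant used in the workaround above. With this approach isFinished does not need to be volatile, because the value is read only after the event reports completion.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper (not from the thread): performs the same device-to-host
// copy as the question, but tracks completion with a CUDA event instead of
// polling the destination buffer.
// Returns the value copied to the host, or false if a CUDA call fails.
bool copy_flag_with_event()
{
    bool* flagTrue = nullptr;   // device memory holding the flag
    bool* isFinished = nullptr; // pinned host memory; no volatile needed here
    cudaEvent_t copyDone;

    if (cudaMalloc((void**)&flagTrue, sizeof(bool)) != cudaSuccess) return false;
    if (cudaHostAlloc((void**)&isFinished, sizeof(bool), cudaHostAllocDefault) != cudaSuccess) {
        cudaFree(flagTrue);
        return false;
    }
    cudaMemset(flagTrue, 1, sizeof(bool)); // device-side flag = true
    isFinished[0] = false;

    cudaEventCreate(&copyDone);
    cudaMemcpyAsync(isFinished, flagTrue, sizeof(bool),
                    cudaMemcpyDeviceToHost, 0);
    cudaEventRecord(copyDone, 0); // completes after all prior work in stream 0

    // Non-blocking check; cudaEventSynchronize(copyDone) would block instead.
    while (cudaEventQuery(copyDone) == cudaErrorNotReady) {
        // useful host-side work could go here
    }

    // The event has completed, so the copy into isFinished is done
    // (if the query returned an error instead, this is still false).
    bool result = isFinished[0];

    cudaEventDestroy(copyDone);
    cudaFreeHost(isFinished);
    cudaFree(flagTrue);
    return result;
}
```

Streams work the same way: cudaStreamQuery(stream) and cudaStreamSynchronize(stream) check or wait for all work queued in that stream rather than for a single recorded point.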