Hi folks,
I have run into some strange behavior with cuStreamSynchronize().
It seems that cuStreamSynchronize() blocks forever when I set up event objects in addition to the async memcpy and kernel execution.
The steps to reproduce are as follows:
1. cuCtxPushCurrent(context); … the context was created by another thread.
2. cuStreamCreate(&stream, 0);
3. cuEventCreate(&events[i], CU_EVENT_DEFAULT); … i = [0…3]
4. cuMemAlloc(&devmem, dma_length);
5. cuEventRecord(events[0], stream);
6. cuMemcpyHtoDAsync(devmem, dma_buffer, dma_length, stream);
7. cuEventRecord(events[1], stream);
8. cuLaunchKernel(…);
9. cuEventRecord(events[2], stream);
10. cuMemcpyDtoHAsync(dma_buffer, devmem, dma_length, stream);
11. cuEventRecord(events[3], stream);
12. cuStreamSynchronize(stream);
I don’t think the code above does anything special. However, it blocks forever at step 12.
When I comment out all the event-related calls (steps 3, 5, 7, 9, and 11), it surprisingly works fine.
To investigate further, I replaced cuStreamSynchronize() with the following loop:
do {
    ret = cuEventQuery(events[0]);
    printf("events[0] = %d\n", ret);
    ret = cuEventQuery(events[1]);
    printf("events[1] = %d\n", ret);
    ret = cuEventQuery(events[2]);
    printf("events[2] = %d\n", ret);
    ret = cuEventQuery(events[3]);
    printf("events[3] = %d\n", ret);
} while (ret != CUDA_SUCCESS);
It continuously prints “600” (= CUDA_ERROR_NOT_READY) for every event. It is quite strange that even the first event is not ready, since nothing should prevent it from completing.
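One caveat about the polling loop: it only tests the result of the last query, so it would also spin forever if cuEventQuery() returned a real error rather than "not ready". A minimal sketch of a stricter check is below. The helper name summarize_event_status() and the RC_* constants are my own, not part of CUDA; the values 0 and 600 are CUDA_SUCCESS and CUDA_ERROR_NOT_READY, written as plain integers so the snippet compiles without <cuda.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Plain-integer stand-ins for CUDA_SUCCESS and CUDA_ERROR_NOT_READY.
 * With <cuda.h> available, the real enumerators should be used instead. */
enum { RC_SUCCESS = 0, RC_NOT_READY = 600 };

/* Hypothetical helper (not a CUDA API): summarize one round of
 * cuEventQuery() return codes.
 *   returns  1  if every event has completed,
 *   returns  0  if at least one event is still pending and none failed,
 *   returns -1  as soon as a code other than success / not-ready is seen. */
int summarize_event_status(const int *status, size_t n)
{
    int pending = 0;

    assert(status != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        printf("events[%zu] = %d\n", i, status[i]);
        if (status[i] == RC_NOT_READY)
            pending = 1;          /* keep polling */
        else if (status[i] != RC_SUCCESS)
            return -1;            /* real error: stop polling */
    }
    return pending ? 0 : 1;
}
```

In the loop, each cuEventQuery(events[i]) return value would be stored into an int array, and polling would continue only while the helper returns 0. This does not explain the hang; it only rules out the case where a genuine error is hidden behind the repeated output.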
In addition, I tried removing all the cuEventRecord() calls while keeping the cuEventCreate() calls. I got the same result.
What kind of scenario could cause such unexpected behavior?