cuStreamSynchronize() blocks forever?

Hi folks,

I am troubled by some strange behavior around cuStreamSynchronize().

It seems that cuStreamSynchronize() blocks forever when I set up event objects in addition to asynchronous memcpy and kernel execution.

The steps to reproduce are as follows:

  1. cuCtxPushCurrent(context); (the context was created by another thread)

  2. cuStreamCreate(&stream, 0);

  3. cuEventCreate(&events[i], CU_EVENT_DEFAULT); for i = 0 to 3

  4. cuMemAlloc(&devmem, dma_length);

  5. cuEventRecord(events[0], stream);

  6. cuMemcpyHtoDAsync(devmem, dma_buffer, dma_length, stream);

  7. cuEventRecord(events[1], stream);

  8. cuLaunchKernel(…);

  9. cuEventRecord(events[2], stream);

  10. cuMemcpyDtoHAsync(dma_buffer, devmem, dma_length, stream);

  11. cuEventRecord(events[3], stream);

  12. cuStreamSynchronize(stream);

I don't think the code above does anything unusual. However, it blocks forever at step 12.

When I comment out all the event-related calls (steps 3, 5, 7, 9 and 11), it surprisingly works fine.

To investigate further, I replaced cuStreamSynchronize() with the following polling loop:

do {
    for (int i = 0; i < 4; i++) {
        ret = cuEventQuery(events[i]);          /* CUresult for events[i] */
        printf("events[%d] = %d\n", i, ret);
    }
} while (ret != CUDA_SUCCESS);                  /* exits once events[3] completes */

It continuously prints "600" (= CUDA_ERROR_NOT_READY). It is quite strange that even the first event is not ready, since nothing should prevent it from completing.

In addition, I tried removing all the cuEventRecord() calls while keeping the cuEventCreate() calls, and got the same result.

What could cause such unexpected behavior?

Investigating further, I found that the problem reproduces only when CU_EVENT_DISABLE_TIMING is NOT passed to cuEventCreate().

Does anyone have information about this behavior?