I have tried many scenarios with both cuEventRecord and cudaEventRecord on a stream. No matter which stream I pass as the second argument, the result is always the same as if I had passed 0 (the whole context). I'm fairly convinced this is a bug in the API. This API is broadly applicable wherever developers use streams. Please suggest how I can report this issue to the CUDA support group. Or is there any way I can get help from the developers working in this area?
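A minimal sketch of this kind of test with the runtime API could look like the following. The `spin` kernel, the buffer size and the iteration count are made up for illustration, and the sketch assumes a toolkit recent enough to provide cudaDeviceSynchronize. It times an idle stream (streams[0]) while a kernel runs on streams[1], and times the kernel itself for comparison.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical busy kernel, used only to keep streams[1] occupied for a while.
__global__ void spin(float *data, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = data[i];
    for (int k = 0; k < iters; ++k)
        v = v * 1.0000001f + 0.5f;
    data[i] = v;
}

// Print the failing call and bail out of main on any runtime API error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            return 1;                                                 \
        }                                                             \
    } while (0)

int main()
{
    const int n = 1 << 20;          // assumed buffer size (multiple of 256)
    float *d_data = NULL;
    CHECK(cudaMalloc((void **)&d_data, n * sizeof(float)));
    CHECK(cudaMemset(d_data, 0, n * sizeof(float)));
    CHECK(cudaDeviceSynchronize()); // make sure setup is finished before timing

    cudaStream_t streams[2];
    for (int i = 0; i < 2; ++i)
        CHECK(cudaStreamCreate(&streams[i]));

    cudaEvent_t idleStart, idleStop, kernStart, kernStop;
    CHECK(cudaEventCreate(&idleStart));
    CHECK(cudaEventCreate(&idleStop));
    CHECK(cudaEventCreate(&kernStart));
    CHECK(cudaEventCreate(&kernStop));

    // streams[0] receives no work at all between its two events.
    CHECK(cudaEventRecord(idleStart, streams[0]));

    // The kernel and its own pair of events all go to streams[1].
    CHECK(cudaEventRecord(kernStart, streams[1]));
    spin<<<n / 256, 256, 0, streams[1]>>>(d_data, 10000);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(kernStop, streams[1]));

    CHECK(cudaEventRecord(idleStop, streams[0]));

    CHECK(cudaDeviceSynchronize()); // wait for both streams to drain

    float idleMs = 0.0f, kernMs = 0.0f;
    CHECK(cudaEventElapsedTime(&idleMs, idleStart, idleStop));
    CHECK(cudaEventElapsedTime(&kernMs, kernStart, kernStop));
    printf("idle stream: %f ms, kernel stream: %f ms\n", idleMs, kernMs);

    CHECK(cudaEventDestroy(idleStart));
    CHECK(cudaEventDestroy(idleStop));
    CHECK(cudaEventDestroy(kernStart));
    CHECK(cudaEventDestroy(kernStop));
    for (int i = 0; i < 2; ++i)
        CHECK(cudaStreamDestroy(streams[i]));
    CHECK(cudaFree(d_data));
    return 0;
}
```

If events are tracked per stream, the idle-stream time would be expected to be much smaller than the kernel time. If the two numbers come out about the same, that would match the behaviour described above, where the event acts as if it had been recorded on stream 0.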
It will likely get noticed here, but otherwise you can send a PM to an NVIDIA employee (e.g. tmurray). This might even be a known bug; I remember reading something like this somewhere before.
Here is a simple example using the low-level driver API. Modify the matrixMulDrv sample by replacing its “runTest” function with the following. Please note the comments marked with ******.
void
runTest(int argc, char** argv)
{
    // initialize CUDA
    CUfunction matrixMul = NULL;
    CU_SAFE_CALL(initCUDA(argc, argv, &matrixMul));
    int nstreams = 2;
    // allocate and initialize an array of stream handles
    CUstream *streams = (CUstream*) malloc(nstreams * sizeof(CUstream));
    for (int i = 0; i < nstreams; i++)
        CU_SAFE_CALL( cuStreamCreate(&(streams[i]), 0) );
    //cuEventRecord(stop_event, streams[0]); // 1) ****** First run: leave this disabled and keep 2) enabled
    cuLaunchGridAsync( matrixMul, WC / BLOCK_SIZE, HC / BLOCK_SIZE , streams[1]);
    cuEventRecord(stop_event, streams[0]); // 2) ****** Second run: disable this and enable 1)
    // ****** Observe the second value printed here. We are supposed to see almost the same value
    // whether we choose 1) or 2), because there is no task on streams[0].