Locking on streams in CUDA?


I am trying to find a way to have the CPU block when GPUs have run out of streams.

I have arrays:
cudaStream_t streams[MAX_GPU_COUNT][STREAMS_MAX];

I have also separated the data and results into other arrays, again dimensioned [MAX_GPU_COUNT][STREAMS_MAX].

I want to use the streams to launch multiple kernels on my Titan cards under Windows 7, locking the appropriate mutex in a matching mutexes array before each launch. I then want to use cudaStreamAddCallback to process the results and unlock the mutex, marking the stream as available again for another kernel launch.

Then I discovered that boost/thread.hpp does not compile using nvcc. I tried the advice to separate device and host code but was not able to do that because I am using Thrust as well.

I wonder if anyone knows of an elegant way to do the locking and unlocking?
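One possible shape for "block the CPU until a stream is free" is a small pool of free stream indices guarded by a condition variable, instead of one mutex per stream. This is only a host-side sketch under assumptions: StreamSlotPool and its methods are hypothetical names (not part of CUDA or Boost), one pool per GPU is assumed, and std::mutex / std::condition_variable from C++11 are used where the Boost equivalents would have the same shape. The index returned by acquire() would be used to pick streams[gpu][slot].

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <vector>

// Hypothetical helper, not part of CUDA: hands out free stream slot indices
// for one GPU and blocks the caller while every slot is in use.
class StreamSlotPool {
public:
    explicit StreamSlotPool(int slotCount) {
        // Push in reverse so that slot 0 is handed out first.
        for (int i = slotCount - 1; i >= 0; --i) free_.push_back(i);
    }

    // Blocks until a slot is free, then returns its index.
    int acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !free_.empty(); });
        int slot = free_.back();
        free_.pop_back();
        return slot;
    }

    // Non-blocking variant: returns -1 when no slot is free.
    int tryAcquire() {
        std::lock_guard<std::mutex> lock(m_);
        if (free_.empty()) return -1;
        int slot = free_.back();
        free_.pop_back();
        return slot;
    }

    // May be called from any thread, e.g. from a stream callback.
    void release(int slot) {
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(slot >= 0);
            free_.push_back(slot);
        }
        cv_.notify_one();  // wake one launcher waiting in acquire()
    }

    // Number of slots currently free.
    int freeCount() {
        std::lock_guard<std::mutex> lock(m_);
        return static_cast<int>(free_.size());
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::vector<int> free_;
};
```

With this layout the internal mutex is only ever held briefly by whichever thread is calling in, so nothing has to be locked in the launching thread and unlocked later from the callback thread.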

Hope that makes sense,

Why can’t you have the callback in nvcc-compiled code call a function in cpp-compiled code that calls into Thrust?

Thanks Tera,

You got me thinking. I have managed to separate into host and device code so I am using boost::mutex now.

My design is still flawed, though. If the callback routine or a method it calls crashes, the mutex in mutexes[MAX_GPU_COUNT][STREAMS_MAX] is never unlocked. I know it is recommended to wrap the mutex in a class and call unlock in its destructor, but how do I do that with CUDA callbacks?
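As a rough sketch of the destructor idea: the guard object can be created at the top of the function the callback forwards to, so the release happens when that function returns, whether normally or because an exception was thrown and caught. Everything named here is hypothetical (SlotFlag, SlotReleaser, processResults, onStreamDone), and an atomic busy flag stands in for one entry of the mutexes array; the function registered with cudaStreamAddCallback would simply call onStreamDone with the flag passed through its user-data pointer.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>

// Hypothetical per-stream "busy" flag standing in for one mutexes[gpu][stream] entry.
struct SlotFlag {
    std::atomic<bool> busy{false};
};

// Clears the flag when it goes out of scope, on normal return or during unwinding.
class SlotReleaser {
public:
    explicit SlotReleaser(SlotFlag& flag) : flag_(flag) {}
    ~SlotReleaser() { flag_.busy.store(false); }
    SlotReleaser(const SlotReleaser&) = delete;
    SlotReleaser& operator=(const SlotReleaser&) = delete;
private:
    SlotFlag& flag_;
};

// Hypothetical result processing; throws to simulate a failure.
inline void processResults(bool fail) {
    if (fail) throw std::runtime_error("result processing failed");
}

// Body a stream callback could forward to. Returns false if processing failed.
// Exceptions are caught here so that none propagate out of the callback.
inline bool onStreamDone(SlotFlag& flag, bool simulateFailure) {
    assert(flag.busy.load());    // the slot should have been marked busy at launch
    SlotReleaser releaser(flag); // released when this function exits
    try {
        processResults(simulateFailure);
        return true;
    } catch (const std::exception&) {
        return false;
    }
}
```

This only covers failures that surface as C++ exceptions. A hard crash such as an access violation terminates the process without running destructors, so no guard object can help there.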

Thanks for any help,

I guess that question boils down to “how to detect that my code has crashed”. I’m not sure why you’d expect your callback routine to crash. I’d be much more inclined to expect something to go wrong on the CUDA side, where you could call the class’s destructor from the error handling code that handles return codes other than cudaSuccess. You could also call cudaStreamQuery() in a few strategic places to see if the stream is still busy or whether there is any error pending.