My program crashes when I run it under the Visual Profiler if I create more than about 24 CUDA streams; it works fine otherwise. The program has several host-side threads. They are necessary to reach the desired frame rate, even though I expect a big boost from the GPU.
The following code fragment is stripped down to illustrate the point: it just creates and destroys streams. Obviously I would like to have useful code between the create and the destroy, but this code alone makes the program crash. Strangely, it crashes in seemingly unrelated parts of the code, and only when I am running the Visual Profiler. The application is 32-bit, and everything is CUDA 5.0. The OS is Windows 7 SP1 with current updates, and the driver is 9.18.13.1106.
```cpp
for( int ii=0; ii < 10; ++ii )
{
    cudaError_t errCode = cudaSuccess;
    errCode = cudaSetDevice( 0 );
    if( errCode != cudaSuccess )
        throw std::runtime_error( "could not cudaSetDevice" );
    cudaStream_t gpuStream=0;
    Trace( "creating cuda stream\n" );
    errCode = cudaStreamCreate( & gpuStream );
    if( errCode != cudaSuccess )
        throw std::runtime_error( "could not cudaStreamCreate" );
    if( gpuStream )
    {
        errCode = cudaStreamSynchronize( gpuStream );
        if( errCode != cudaSuccess )
            throw std::runtime_error( "could not cudaStreamSynchronize" );
        errCode = cudaStreamDestroy( gpuStream );
        if( errCode != cudaSuccess )
            throw std::runtime_error( "could not cudaStreamDestroy" );
        gpuStream = 0;
    }
}
```
The trouble starts some time after about 24 calls to cudaStreamCreate. This happens whether all the calls come from the main thread or I create several streams in each of 4 worker threads. Every call returns cudaSuccess.
What is wrong? I don’t see anything in the documentation that says I shouldn’t be able to do this.