cudaGraphicsMapResources called on background thread returns cudaErrorInvalidGraphicsContext

Dear fellow coders,

I’m experimenting with CUDA to build a framework for a home project. I recently ran into an issue that I’m unable to solve, and I suspect I made a wrong design decision.

My app has two threads:

  • One is the UI thread, which uses wxWidgets and OpenGL to display a simple textured quad. Alongside the texture I have a pixel buffer object (PBO) that serves as intermediate storage for the pixel data and as the gateway between OpenGL and CUDA. This buffer is registered with CUDA via cudaGraphicsGLRegisterBuffer.

  • The second thread is used for CUDA code only. All CUDA code runs on this thread iteratively, and every N iterations I want to update the UI thread’s PBO.

The main idea behind the two-thread approach is that the calculation can take a long time and I do not want to block the UI thread while iterating. Instead I would like a responsive UI (I can pan and zoom the quad, etc…) that updates whenever the background thread asks for it.

I’m planning to call cudaGraphicsMapResources on the secondary thread when it is time to update the PBO and notify the UI.
To do this I want to map the resource, write data into it with a kernel, and call cudaGraphicsUnmapResources to unmap the buffer. Once that is done I can send a message to the UI thread to schedule a texture update (from the new PBO content) and render the quad.

But I get cudaErrorInvalidGraphicsContext (219) when I call cudaGraphicsMapResources on the background thread. If I move the mapping call to the UI thread, it succeeds and I can update the PBO from CUDA…
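
Since the mapping succeeds on the UI thread, one possible arrangement is to keep the worker free of interop calls: it only publishes that a frame is ready, and the thread that owns the GL context does the map, copy, and unmap. Below is a minimal sketch of that handoff in plain C++ with no CUDA or wxWidgets dependency. `FrameMailbox`, `publish`, `tryConsume`, `runWorker`, and `uiUpdate` are hypothetical names, the host vector is only a stand-in for the device-side render buffer, and the comment in `uiUpdate` marks where the real cudaGraphicsMapResources / cudaGraphicsUnmapResources calls would presumably go.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical single-slot mailbox: the worker publishes the latest frame,
// the UI thread picks it up when it gets around to it.
class FrameMailbox {
public:
    // Called on the worker thread. Overwrites any frame not yet consumed.
    void publish(std::vector<float> pixels) {
        std::lock_guard<std::mutex> lock(mMutex);
        mPending = std::move(pixels);
    }

    // Called on the UI thread. Returns the newest frame, or nullopt if none.
    std::optional<std::vector<float>> tryConsume() {
        std::lock_guard<std::mutex> lock(mMutex);
        std::optional<std::vector<float>> out;
        out.swap(mPending);  // leaves mPending empty
        return out;
    }

private:
    std::mutex mMutex;
    std::optional<std::vector<float>> mPending;
};

// Worker loop: iterates and publishes a snapshot every updateEvery iterations.
// The vector is a stand-in for the device-side render buffer (assumption).
inline void runWorker(FrameMailbox& mailbox, std::uint32_t iterationCount,
                      std::uint32_t updateEvery) {
    assert(updateEvery > 0);
    std::vector<float> accum(4, 0.0f);
    for (std::uint32_t i = 1; i <= iterationCount; ++i) {
        for (float& v : accum) v += 1.0f;  // stand-in for the trace kernel
        if (i % updateEvery == 0) mailbox.publish(accum);
    }
}

// UI-side handler. In the real app this is where the thread that owns the
// GL context would map the resource, copy into the PBO, and unmap it.
// Here it only copies into a host vector that plays the role of the PBO.
inline bool uiUpdate(FrameMailbox& mailbox, std::vector<float>& pbo) {
    auto frame = mailbox.tryConsume();
    if (!frame) return false;
    pbo = std::move(*frame);
    return true;
}
```

In a wxWidgets app the worker would more likely post an event to the UI thread than have the UI poll the mailbox; the single-slot design simply means the UI always sees the newest frame and older unconsumed frames are dropped.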

Another possibly important detail: the VS solution contains several projects. The main thread is implemented in an application (.exe) project, and the CUDA code is a static library linked into the exe.

Relevant code snippets (error checks omitted; none of the CUDA calls fail unless I do the mapping on the second thread):
UI Thread:
Init GL and CUDA:

wxGLContextAttrs contextAttrs;
contextAttrs.CoreProfile().OGLVersion( 4, 5 ).Robust().ResetIsolation().EndList();
mContext = std::make_unique<wxGLContext>( this, nullptr, &contextAttrs );
SetCurrent( *mContext );
GLenum err = glewInit();
glGenBuffers( 1, &mPboId ); // create PBO
glBindBuffer( GL_PIXEL_UNPACK_BUFFER, mPboId  );
glBindBuffer( GL_PIXEL_UNPACK_BUFFER, 0 );

cudaGLSetGLDevice( gpuId );
cudaGraphicsResource_t cudaResource = nullptr;
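// the first argument is a pointer to the resource handle; the register call takes a cudaGraphicsRegisterFlags* value
cudaGraphicsGLRegisterBuffer( &cudaResource, mPboId, cudaGraphicsRegisterFlagsWriteDiscard );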

After everything is set up, I can start calculating:

Still on the UI thread:

// get the cuda resource associated to the PBO:
cudaGraphicsResource_t res = cudaResource; // cudaResource is stored elsewhere..
// pass it to the raytracer:
mRayTracer->Trace( res
                   , iterationCount
                   , sampleCount
                   , updateOnIteration );
// the trace method looks like this:
void RayTracerImpl::Trace( cudaGraphicsResource_t pboCudaResource
                           , const uint32_t iterationCount
                           , const uint32_t samplesPerIteration
                           , const uint32_t updatesOnIteration )
{
  // cancel previous job, if any
  if ( mThread.joinable() )
  {
    mCancelled = true;
    // assumed fix: a joinable std::thread must be joined before it is reassigned below
    mThread.join();
  }
  mCancelled = false;

  // Run rendering function async
  mThread = std::thread( std::bind( &RayTracerImpl::TraceFunct
                                    , this
                                    , pboCudaResource
                                    , iterationCount
                                    , samplesPerIteration
                                    , updatesOnIteration ) );
}
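
The cancel-and-restart step above is easy to get wrong, because assigning to a std::thread that is still joinable calls std::terminate. A minimal, self-contained sketch of the pattern follows, assuming the cancel flag is a std::atomic<bool>. `Job`, `start`, `stop`, `wait`, and `completed` are hypothetical names, and the loop body is only a placeholder for one trace iteration.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

class Job {
public:
    ~Job() { stop(); }

    // Cancels and joins any running job, then launches a new one.
    void start(std::uint32_t iterationCount) {
        stop();
        mCancelled = false;
        mCompleted = 0;
        mThread = std::thread([this, iterationCount] {
            for (std::uint32_t i = 0; !mCancelled && i < iterationCount; ++i) {
                ++mCompleted;  // placeholder for one trace iteration
            }
        });
    }

    // Requests cancellation and waits for the worker to exit.
    void stop() {
        if (mThread.joinable()) {
            mCancelled = true;
            mThread.join();
        }
    }

    // Blocks until the current job finishes on its own.
    void wait() {
        if (mThread.joinable()) mThread.join();
    }

    std::uint32_t completed() const { return mCompleted; }

private:
    std::atomic<bool> mCancelled{false};
    std::atomic<std::uint32_t> mCompleted{0};
    std::thread mThread;
};
```

Joining inside `stop` means a restart blocks the caller until the old loop notices the flag, so each iteration should stay short enough that this wait does not stall the UI.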

The following code runs on the background thread, and cudaGraphicsMapResources returns cudaErrorInvalidGraphicsContext (219):

__host__ void RayTracerImpl::TraceFunct( cudaGraphicsResource_t pboCudaResource
                                         , const uint32_t iterationCount
                                         , const uint32_t samplesPerIteration
                                         , const uint32_t updatesOnIteration )
{
    render::ClearRenderBuffer( mPixelBufferSize, channelCount, mRenderBuffer );
    cudaError_t err = cudaSuccess;
    for ( uint32_t i( 0 ); !mCancelled && i < iterationCount; ++i )
    {
      err = RunTraceKernel( mRenderBuffer, mPixelBufferSize, channelCount, *mCamera, samplesPerIteration, mRandomStates );
      if ( updateNeeded )
      {
        err = cudaGraphicsMapResources( 1, &pboCudaResource );
        // err is cudaErrorInvalidGraphicsContext (219) here
      }
    }
}

My questions are:

  • Should this design work at all? Can I call a map operation on a background thread (which has no OpenGL context; the PBO and the resource were created on a different thread; pboCudaResource points to memory allocated on the UI thread, etc…)?
  • Can I just pass the cudaGraphicsResource_t value the way I’m doing it? It’s basically just a pointer to a structure allocated inside the CUDA runtime, right?
  • I learned that since CUDA 4.0 the context is shared between threads, so if I understand correctly, accessing the resource from another thread should not be an issue…
  • If my whole idea is bad, what approach can I use to pass a possibly big chunk of data from GPU memory to OpenGL’s PBO as fast as possible?

Thanks for any info, hints, tips!

OS: Windows 10
GPU: GeForce RTX 2080 Ti
CUDA version: 11.7
CUDA API: Runtime API
CUDA compute capability: 7.5
Visual Studio 2019
Language: C++17

Association of a CUDA compute context with an appropriate GL context is necessary for interop. I haven’t studied your code, but have you studied the relevant CUDA sample codes to see how they associate a GL context with a CUDA context?