Multi-threaded Host App Using One GPU

Is it possible for a multi-threaded host application to call DLL-encapsulated GPU routines from different threads?

Currently, depending on how I schedule and synchronize my threads, I get various runtime errors.

When threads are allowed to execute concurrently:

call to cuLaunchKernel returned error 400: Invalid handle

When threads are allowed to access the GPU one at a time, synchronized using a semaphore:

call to cuEventCreate returned error 201: Invalid context

When threads are executed serially (the error occurs when the second thread is started after the first thread has been destroyed):

call to cuMemcpyHtoDAsync returned error 201: Invalid context

Any help would be great.

Hi Erik,

This isn’t currently supported, but I’ve added your request to TPR#20827, which covers adding OpenACC compute regions to DLLs.
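
In the meantime, one pattern that may be worth trying is to funnel every GPU call through a single long-lived worker thread. Error 201 is CUDA_ERROR_INVALID_CONTEXT, and in the CUDA driver API a context is current per host thread, so keeping all GPU work on one thread avoids calling into the runtime from a thread that has no context. This is only a sketch under assumptions: it has not been confirmed to work with OpenACC regions inside a DLL, and `gpu_routine` and `GpuWorker` are hypothetical names. `gpu_routine` stands in for the DLL entry point and just doubles its input on the CPU so the example runs without a GPU.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical stand-in for a DLL-exported GPU routine. It only doubles
// its input on the CPU so the sketch can run without a GPU.
int gpu_routine(int x) { return 2 * x; }

// Owns one long-lived thread. Every submitted task runs on that thread,
// so any per-thread GPU context is created and used by the same thread.
class GpuWorker {
public:
    GpuWorker() : worker_([this] { run(); }) {}

    ~GpuWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Queue a call from any host thread and get a future for its result.
    std::future<int> submit(std::function<int()> fn) {
        std::packaged_task<int()> task(std::move(fn));
        std::future<int> result = task.get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::packaged_task<int()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // shutdown requested, queue drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // runs on the worker thread only
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::packaged_task<int()>> tasks_;
    bool done_ = false;
    std::thread worker_;  // declared last so the members above exist first
};
```

Host threads would then call something like `worker.submit([]{ return gpu_routine(3); }).get()` instead of calling the DLL routine directly. The worker has to outlive every caller, since the serial case above fails once the first thread has been destroyed.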