Is it possible for a multi-threaded host application to call DLL-encapsulated GPU routines from different threads?
Currently, depending on how I schedule and synchronize my threads, I get different runtime errors:
- Threads are allowed to execute concurrently:
    call to cuLaunchKernel returned error 400: Invalid handle
- Threads are allowed to access the GPU one at a time, synchronized using a semaphore:
    call to cuEventCreate returned error 201: Invalid context
- Threads are executed serially; this error occurs when the second thread is started after the first thread has been destroyed:
    call to cuMemcpyHtoDAsync returned error 201: Invalid context
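The three scheduling patterns above can be sketched roughly as follows. `RunGpuRoutine` is a hypothetical stand-in for the DLL export that wraps the driver API calls named in the errors; here it only counts calls, so the sketch shows the host-side threading, not the GPU work. A `std::mutex` is assumed as the equivalent of the binary semaphore used in the one-at-a-time case.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the DLL export that wraps the CUDA driver calls
// (cuMemcpyHtoDAsync, cuLaunchKernel, cuEventCreate, ...). It only counts calls.
std::atomic<int> g_calls{0};
void RunGpuRoutine() { ++g_calls; }

// Pattern 1: all threads call into the DLL concurrently.
void RunConcurrent(int nThreads) {
    std::vector<std::thread> workers;
    for (int i = 0; i < nThreads; ++i) workers.emplace_back(RunGpuRoutine);
    for (auto& t : workers) t.join();
}

// Pattern 2: threads run concurrently but enter the DLL one at a time.
// The mutex plays the role of the binary semaphore.
std::mutex g_gpuLock;
void RunOneAtATime(int nThreads) {
    std::vector<std::thread> workers;
    for (int i = 0; i < nThreads; ++i) {
        workers.emplace_back([] {
            std::lock_guard<std::mutex> lock(g_gpuLock);
            RunGpuRoutine();
        });
    }
    for (auto& t : workers) t.join();
}

// Pattern 3: each thread starts only after the previous one has exited.
void RunSerial(int nThreads) {
    for (int i = 0; i < nThreads; ++i) {
        std::thread t(RunGpuRoutine);
        t.join();  // the thread finishes before the next one is created
    }
}
```

In all three patterns the driver API is reached from threads other than the application's main thread, which is the common factor across the errors listed above.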
Any help would be great.