Hi all,
As I don’t want to waste computation power, I’m using several CPU cores in addition to the GPU(s). The GPU is (mostly) doing CUBLAS computations while the CPUs are typically running ATLAS kernels. I’m using the driver API.
My problem is that any CPU may need data that resides in the memory of a GPU. Accessing that memory is only possible if the thread that performs the memory transfer holds the proper CUDA context, but I’m having trouble because it’s not clear at all how this works in the case of CUBLAS.
The CUDA context is initialized by the thread which will perform the CUBLAS computations later on (one thread is dedicated to the GPU). I would therefore assume that the normal workflow would be:
cuInit, cuCtxCreate, then cublasInit to initialize the CUBLAS library, then release the context. When a memory transfer is needed, I would just push the context, do the transfer and pop the context. When the CUBLAS thread wants to do some computation, it would also take the context, perform some CUBLAS calls, then release the context.
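To make that intended workflow concrete, here is a minimal sketch of it (this is not the attached code, just an illustration of the scheme that fails for me). The names gpu_thread_init, fetch_from_gpu, gpu_ctx and ctx_lock are made up for the example, and the mutex is my own assumption to make sure only one thread holds the context at a time. It uses the legacy CUBLAS interface (cublas.h) and the three-argument form of cuCtxCreate.

```c
#include <cuda.h>      /* CUDA driver API */
#include <cublas.h>    /* legacy CUBLAS API (cublasInit) */
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state: one context, guarded by a mutex (assumption). */
static CUcontext gpu_ctx;
static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;

/* Called once by the thread dedicated to the GPU.
 * Creates the context, initializes CUBLAS while the context is current,
 * then pops the context so that other threads can push it later. */
static int gpu_thread_init(int dev_id)
{
    CUdevice dev;

    if (cuInit(0) != CUDA_SUCCESS) return -1;
    if (cuDeviceGet(&dev, dev_id) != CUDA_SUCCESS) return -1;
    if (cuCtxCreate(&gpu_ctx, 0, dev) != CUDA_SUCCESS) return -1;

    /* Assumption under question: this binds CUBLAS to gpu_ctx. */
    if (cublasInit() != CUBLAS_STATUS_SUCCESS) return -1;

    /* Release the context: it is no longer current to this thread. */
    return cuCtxPopCurrent(&gpu_ctx) == CUDA_SUCCESS ? 0 : -1;
}

/* Called by any CPU thread that needs data living in GPU memory:
 * push the context, copy device -> host, pop the context. */
static int fetch_from_gpu(void *dst, CUdeviceptr src, size_t nbytes)
{
    CUcontext popped;
    CUresult res;

    pthread_mutex_lock(&ctx_lock);
    res = cuCtxPushCurrent(gpu_ctx);
    if (res == CUDA_SUCCESS) {
        res = cuMemcpyDtoH(dst, src, nbytes);
        cuCtxPopCurrent(&popped);
    }
    pthread_mutex_unlock(&ctx_lock);

    return res == CUDA_SUCCESS ? 0 : -1;
}
```

The GPU thread would wrap its CUBLAS calls in the same lock / push / pop sequence as fetch_from_gpu.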
Unfortunately, this is not working :) The problem is that I don’t know exactly what happens during the cublasInit call; the documentation suggests that CUBLAS gets associated with the current CUDA context. If so, CUBLAS calls should be legal once the CUDA context is restored on the thread that called cublasInit. Of course, I can’t afford to re-initialize CUBLAS before every CUBLAS call (unless it is really cheap, but I strongly doubt it is …).
So, can anybody tell me how CUBLAS should be initialized on multicore systems? I want memory transfers to be possible from any thread, and I want the GPU thread to be able to make CUBLAS calls.
As it is pretty important that CUBLAS can be used in a true multicore environment, I’d be really glad to learn the proper way to deal with this problem. Thanks a lot!
++
Cédric
PS: I enclose a synthetic code example which should work if the cublasInit call actually associated the current CUDA context with a “CUBLAS context”. It fails when the second thread tries to grab the context again…
CUDA_Contexts.tar (10 KB)