CUBLAS and CUDA contexts

Hi all,

As I don't want to waste computation power, I'm using the CPU cores in addition to the GPU(s). The GPU mostly runs CUBLAS computations while the CPU cores typically run ATLAS kernels. I'm using the driver API.

My problem is that any CPU thread may need data that resides in a GPU's memory. Accessing that memory is only possible if the thread performing the transfer holds the proper CUDA context, but I'm running into trouble because it's not at all clear how this works in the case of CUBLAS.

The CUDA context is created by the thread that will later perform the CUBLAS computations (one thread is dedicated to the GPU). I would therefore assume that the normal workflow is:
cuInit, cuCtxCreate, cublasInit to initialize the CUBLAS library, then release the context. When a memory transfer is needed, I would simply push the context, do the transfer, and pop the context. When the CUBLAS thread wants to compute, it would likewise take the context, perform some CUBLAS calls, then release the context (see the sketch below).
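
To make that workflow concrete, here is a minimal sketch of what I have in mind (driver API + CUBLAS v1). Error checking is omitted, and the names (gpu_ctx, gpu_thread_init, cpu_thread_transfer, gpu_thread_compute) are purely illustrative, they are not taken from the attached archive:

#include <cuda.h>
#include <cublas.h>

static CUcontext gpu_ctx;              /* context shared between the host threads */

/* GPU thread: create the context, init CUBLAS, then release the context */
void gpu_thread_init(void)
{
    CUdevice dev;
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&gpu_ctx, 0, dev);
    cublasInit();                      /* supposedly attaches CUBLAS to gpu_ctx */
    cuCtxPopCurrent(&gpu_ctx);         /* release it so other threads can grab it */
}

/* Any CPU thread: grab the context, do a transfer, release it again */
void cpu_thread_transfer(void *host, CUdeviceptr dev_buf, unsigned int bytes)
{
    cuCtxPushCurrent(gpu_ctx);
    cuMemcpyDtoH(host, dev_buf, bytes);
    cuCtxPopCurrent(&gpu_ctx);
}

/* GPU thread: grab the context, do a CUBLAS call, release it again */
void gpu_thread_compute(float *d_x, int n)
{
    cuCtxPushCurrent(gpu_ctx);
    cublasSscal(n, 2.0f, d_x, 1);
    cuCtxPopCurrent(&gpu_ctx);
}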

Unfortunately, this is not working :) The problem is that I don't know what exactly happens during the cublasInit call; the documentation suggests it is associated with the current CUDA context. So, once that CUDA context is restored on the thread that did the cublasInit, CUBLAS calls should be legal. Of course, I can't afford to call cublasInit before every CUBLAS call (unless it is really lightweight, which I strongly doubt …).

So, can anybody tell me how CUBLAS should be initialized on a multicore system? I want memory transfers to be possible from any thread, and I want the GPU thread to be able to make CUBLAS calls.

As it is pretty important that CUBLAS can be used in a true multicore environment, I'd be really glad to learn the proper way to deal with this problem. Thanks a lot!

++
Cédric

PS: I enclose a synthetic code example which should work if the cublasInit call actually associated the current CUDA context with a "CUBLAS context". It fails when the second thread tries to grab the context again…
CUDA_Contexts.tar (10 KB)

That's weird: removing the cublasInit makes the code work (for the first iteration): I get 2*42.0f in my buffer, as expected. But when the context is popped and pushed again, it won't work anymore.
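
In other words, the pattern that breaks is roughly the following (names are again illustrative, not from the attached archive; d_x is a device buffer previously filled with 42.0f and h_x a host buffer):

#include <cuda.h>
#include <cublas.h>

void two_iterations(CUcontext gpu_ctx, float *d_x, float *h_x, int n)
{
    cuCtxPushCurrent(gpu_ctx);
    cublasSscal(n, 2.0f, d_x, 1);                      /* scale the 42.0f entries by 2 */
    cublasGetVector(n, sizeof(float), d_x, 1, h_x, 1); /* copy result back to the host */
    cuCtxPopCurrent(&gpu_ctx);                         /* 1st iteration: h_x[i] == 84.0f */

    cuCtxPushCurrent(gpu_ctx);                         /* 2nd iteration: stops working here */
    cublasSscal(n, 2.0f, d_x, 1);
    cuCtxPopCurrent(&gpu_ctx);
}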

So let's see what the CUBLAS doc says:

cublasStatus cublasInit (void)
initializes the CUBLAS library and must be called before any other
CUBLAS API function is invoked. It allocates hardware resources
necessary for accessing the GPU. It attaches CUBLAS to whatever
GPU is currently bound to the host thread from which it was invoked.

This is not what seems to happen. So, the question really boils down to: what is cublasInit actually doing?
Popping the context after a cublasInit shows that an extra context appears … however, that context cannot really be popped: calling cuCtxPopCurrent over and over always succeeds and always returns the very same context (see the snippet below).
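
For what it's worth, this is roughly how I observed it (illustrative snippet, dump_context_stack is not from the attached code):

#include <stdio.h>
#include <cuda.h>

/* After cublasInit, repeatedly popping the current context keeps
 * succeeding and keeps returning the same handle instead of
 * eventually emptying the context stack. */
void dump_context_stack(void)
{
    CUcontext c;
    int i;
    for (i = 0; i < 4; ++i) {
        CUresult r = cuCtxPopCurrent(&c);
        printf("pop #%d -> status %d, ctx %p\n", i, (int)r, (void *)c);
    }
}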

This sounds buggy to me, or at least the documentation needs some clarification, even though it's clear most people are not playing with the driver API.

Thanks for your attention,
Cédric