CUBLAS and cuda contexts

gonnet · October 8, 2008, 12:21pm

Hi all,

As I don’t want to waste computing power, I’m using several CPU cores in addition to the GPU(s). The GPU mostly runs CUBLAS computations, while the CPUs typically run ATLAS kernels. I’m using the driver API.

My problem is that any CPU may need data that resides in a GPU’s memory. Accessing that memory is only possible if the thread performing the transfer holds the proper CUDA context, but I’m having trouble because it’s not at all clear how this works with CUBLAS.

The CUDA context is initialized by the thread that will later perform the CUBLAS computation (one thread is dedicated to the GPU). I would therefore assume that the normal workflow is:
cuInit, cuCtxCreate, cublasInit to initialize the CUBLAS library, then release the context. When a memory transfer is needed, I would just push the context, do the transfer, and pop the context. When the CUBLAS thread wants to compute, it would also take the context, perform some CUBLAS calls, then release the context.

Unfortunately, this is not working :) The problem is that I don’t know exactly what happens during the cublasInit call; the documentation suggests it is associated with the current CUDA context. So, when the CUDA context is restored on the thread that called cublasInit, CUBLAS calls should be legal. Of course, I can’t afford to initialize CUBLAS before every CUBLAS call (unless it is really light, but I strongly doubt it is…).

So, can anybody tell me how CUBLAS should be initialized on multicore systems? I want memory transfers to be possible from any thread, and I want the GPU thread to be able to make CUBLAS calls.

As it is pretty important that CUBLAS can be used in a true multicore environment, I’d be really glad to learn the proper way to deal with this problem. Thanks a lot!

++
Cédric

PS: I enclose a synthetic code example which should work if the cublasInit call actually associated the current CUDA context with a “cublas context”. It fails when the second thread tries to grab the context again…
CUDA_Contexts.tar (10 KB)

gonnet · October 8, 2008, 1:46pm

That’s weird: removing the cublasInit call makes the code work (for the first iteration): I get 2*42.0f in my buffer, as expected. But once the context is popped and pushed again, it no longer works.

So let us see what the CUBLAS doc says:

cublasStatus cublasInit (void)
initializes the CUBLAS library and must be called before any other
CUBLAS API function is invoked. It allocates hardware resources
necessary for accessing the GPU. It attaches CUBLAS to whatever
GPU is currently bound to the host thread from which it was invoked.

This is not what seems to happen. So the question really boils down to: what is cublasInit doing?
Popping the context after a cublasInit shows that an extra context appears… however, that context cannot be removed: calling cuCtxPopCurrent repeatedly always succeeds and always returns the very same context.

Sounds buggy to me, or at least the documentation needs some clarification even though it’s clear most people are not playing with the driver API.
