Result of a CUBLAS function

Hello !

I’m using some CUBLAS functions that return a value (e.g. cublasDdot). The problem is that I need this value to stay in device memory so I can use it in a kernel that runs just after cublasDdot. Currently the value goes to host memory and I need a cudaMemcpy to get it back onto the device.

Is it possible to store the result of cublasDdot directly in device memory?

Just make the result an argument to the next kernel call? It’s one number, so I don’t see it adding to the kernel launch latency.
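For the record, a minimal sketch of that suggestion. The kernel name, launch shape, and vector sizes are illustrative, not from the original code; cublasDdot here is the legacy CUBLAS API that returns the scalar to the host:

```cuda
#include <cublas.h>

// Hypothetical follow-up kernel: the dot product arrives as a plain
// by-value argument, so no extra device allocation is needed.
__global__ void scaleByDot(double *x, double dot, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] *= dot;   // use the scalar directly
}

void step(double *d_x, double *d_y, int n)
{
    // Legacy CUBLAS: the result is returned on the host
    double dot = cublasDdot(n, d_x, 1, d_y, 1);
    scaleByDot<<<(n + 255) / 256, 256>>>(d_x, dot, n);
}
```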

I had this idea too. The problem is that I use this result in many kernels, including in another CUBLAS function. I think I’ll go for a memcpy. cudaprof gives me 3 µs for the transfer of the number, but as it’s inside a loop it’s 3 µs × 16384… :(

Too bad there’s no way to return this value in device memory.

If you use it with many kernel launches after the calculation, copy it to a constant memory variable. It will be faster than global memory and won’t use any shared memory, unlike passing it as a kernel argument.
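A sketch of that approach, with illustrative names: the host-side result is copied once into a `__constant__` variable with cudaMemcpyToSymbol, and every subsequent kernel reads it from the constant cache:

```cuda
#include <cublas.h>

__constant__ double c_dot;   // lives in constant memory, cached on-chip

// Any number of later kernels can read c_dot without taking it as an
// argument, so no shared memory is spent on parameter passing.
__global__ void useDot(double *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] *= c_dot;
}

void step(double *d_x, double *d_y, int n)
{
    double dot = cublasDdot(n, d_x, 1, d_y, 1);      // result on the host
    cudaMemcpyToSymbol(c_dot, &dot, sizeof(double)); // host -> constant mem
    useDot<<<(n + 255) / 256, 256>>>(d_x, n);
}
```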

The other alternative, of course, is to write your own inner product kernel instead of using BLAS dot. It is a very simple mathematical operation, and there really aren’t many flops in it, so even a naive version probably won’t be much different in performance from the CUBLAS version.
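A naive single-block version of such a kernel might look like this. It leaves its result in device memory, so no host round trip is needed; for large n a multi-block tree reduction would be faster, and the block size is just an illustrative choice:

```cuda
#define THREADS 256

__global__ void dotKernel(const double *x, const double *y,
                          double *result, int n)
{
    __shared__ double cache[THREADS];

    // Each thread accumulates a strided partial sum
    double sum = 0.0;
    for (int i = threadIdx.x; i < n; i += THREADS)
        sum += x[i] * y[i];
    cache[threadIdx.x] = sum;
    __syncthreads();

    // In-block tree reduction
    for (int s = THREADS / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            cache[threadIdx.x] += cache[threadIdx.x + s];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        *result = cache[0];   // result stays in device memory
}

// Launch: dotKernel<<<1, THREADS>>>(d_x, d_y, d_result, n);
```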

Thanks ! Gonna try it now :)

Out of curiosity, what kind of linear algebra are you doing?

I am asking because we are considering a second API routine for xDOT that writes the result to device memory, but we would like to know if it is worth the effort.

This new API would also make more sense with the upcoming CUBLAS stream support.

Now we are talking! Any timeline for streams support? I have several applications that will greatly benefit from exposing streams in CUBLAS…

Coming very soon. It will be there in the 3.1 beta…