Is mixing CUDA and CUBLAS possible? Is the CUDA source code available?

jpeinado · January 5, 2010, 12:38pm

Hi:

I am working on a MATLAB algorithm that is solved on a GPU using CUBLAS. Almost everything works, but I need to implement some matrix computations that are not implemented in CUBLAS. So my question is:

How do I do this? I suppose that I must mix CUDA and CUBLAS, but I don't know how.

I think CUBLAS is implemented using CUDA.

Is there any way to get the CUDA source?

Many thanks in advance

jpeinado

maringanti · January 5, 2010, 1:58pm

You can mix launches of your own kernels and cublas functions in the same program. Write a separate kernel that does the matrix computations you want and use cublas for the other functions. Yes, you can pass data and pointers between your kernel and a cublas function. Just be careful about where the cublas functions return their values. For example, cublasSdot returns a float to the CPU. If you try to write that output to a location on the GPU you will get errors.
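A minimal sketch of that pattern, for illustration only. It assumes the handle-based `cublas_v2.h` API, which is newer than this thread; there `cublasSdot` writes its result through a pointer that, in the default pointer mode, must refer to host memory. The kernel `square_elements` and the function `sum_of_fourth_powers` are hypothetical names made up for the example.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical custom kernel: y[i] = x[i] * x[i].
__global__ void square_elements(const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = x[i] * x[i];
}

// Computes sum_i x[i]^4 as dot(y, y) with y = x^2.
// Error checking is omitted to keep the sketch short.
float sum_of_fourth_powers(const std::vector<float>& host_x) {
    assert(!host_x.empty());
    const int n = static_cast<int>(host_x.size());
    const size_t bytes = n * sizeof(float);

    float* d_x = nullptr;
    float* d_y = nullptr;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_x, host_x.data(), bytes, cudaMemcpyHostToDevice);

    // Step 1: our own kernel writes y on the device.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    square_elements<<<blocks, threads>>>(d_x, d_y, n);

    // Step 2: cuBLAS reads the same device pointer the kernel wrote.
    cublasHandle_t handle;
    cublasCreate(&handle);
    float result = 0.0f;  // lives on the host (default pointer mode)
    cublasSdot(handle, n, d_y, 1, d_y, 1, &result);
    cublasDestroy(handle);

    cudaFree(d_x);
    cudaFree(d_y);
    return result;
}
```

Calling `sum_of_fourth_powers({1.0f, 2.0f, 3.0f})` should give 1 + 16 + 81 = 98. The kernel and the cuBLAS call both run on the default stream here, so the dot product sees the kernel's output without extra synchronization.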

jpeinado · January 5, 2010, 4:26pm

Thank you very much for your help maringanti

Do you know where I can see code that does this? For example, it would be good to have the CUBLAS sources. My idea is to write the kernel in a similar way to how CUBLAS does it, to maintain the good performance of CUBLAS.

jpeinado

maringanti · January 6, 2010, 10:13am

The CUBLAS source code was available for version 1.1; you can search the forums for it. I am not sure whether the latest version's source code is available. IMO you do not need to worry about coding it similarly to CUBLAS. Just write the kernel in such a way that you use the memory bandwidth efficiently and launch enough threads to keep all multiprocessors busy. If you need further optimization you can look at Volkov's papers. I would say it is better to write the code in your own way instead of following the CUBLAS source structure.
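One common way to follow that advice is a grid-stride loop, sketched below. This is a technique swapped in for illustration, not something taken from CUBLAS. Neighbouring threads touch neighbouring elements, so global memory accesses coalesce, and the grid is sized from the number of multiprocessors. The names `axpy_grid_stride` and `axpy_on_gpu` are hypothetical, and the factor of 4 blocks per multiprocessor is an untuned assumption.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// y[i] = a * x[i] + y[i], with each thread striding over the array.
__global__ void axpy_grid_stride(float a, const float* x, float* y, int n) {
    const int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        y[i] = a * x[i] + y[i];
    }
}

// Host wrapper; error checking is omitted to keep the sketch short.
std::vector<float> axpy_on_gpu(float a, const std::vector<float>& x,
                               std::vector<float> y) {
    assert(!x.empty() && x.size() == y.size());
    const int n = static_cast<int>(x.size());
    const size_t bytes = n * sizeof(float);

    float* d_x = nullptr;
    float* d_y = nullptr;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice);

    // Size the grid from the multiprocessor count of device 0.
    int sm_count = 1;
    cudaDeviceGetAttribute(&sm_count, cudaDevAttrMultiProcessorCount, 0);
    const int threads = 256;
    const int blocks = 4 * sm_count;  // assumed heuristic, not a tuned value
    axpy_grid_stride<<<blocks, threads>>>(a, d_x, d_y, n);

    cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
    return y;
}
```

For example, `axpy_on_gpu(2.0f, {1, 2, 3, 4}, {10, 20, 30, 40})` should return `{12, 24, 36, 48}`.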

jpeinado · January 6, 2010, 8:26pm

OK.

Thank you very much.

jpeinado

dmyablonski · January 6, 2010, 10:31pm

Expanding upon this, since your question seems to be answered already…

When using CUBLAS, the vector or matrix is copied to GPU memory and then remains there until you free the device memory and shut down CUBLAS. If my matrix is put on the GPU using cublas, can I access that matrix with a CUDA kernel just through the data pointer that is used for CUBLAS calls? Or is there extra formatting stored with the matrix when using CUBLAS?

For instance, if I wanted to:

allocate a vector and matrix and store them to the GPU memory with CUBLAS
multiply them with a kernel that I write myself
read matrix from GPU memory back to host using CUBLAS

Is that possible and as easy as I would hope?

Thanks.

avidday · January 6, 2010, 10:46pm

You can do that. The CUBLAS memory management functions (alloc, free, set, get) are just wrapper functions for the standard CUDA runtime API equivalents, and “CUBLAS pointers” are just regular GPU global memory pointers. There isn’t anything special about them. Almost without exception, the runtime API and your own kernels can be used interchangeably with CUBLAS functions.
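A minimal sketch of the workflow asked about above, under some assumptions: it uses the `cublas_v2.h` header (newer than this thread), allocates with `cudaMalloc` rather than the legacy `cublasAlloc`, and stores the matrix column-major as cuBLAS expects. The kernel `matvec_colmajor` and the wrapper `matvec_with_custom_kernel` are hypothetical names for illustration.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical kernel: y = A * x for an m-by-n column-major matrix A,
// one thread per output row.
__global__ void matvec_colmajor(const float* A, const float* x, float* y,
                                int m, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < m) {
        float sum = 0.0f;
        for (int j = 0; j < n; ++j) sum += A[i + j * m] * x[j];
        y[i] = sum;
    }
}

// Upload with cuBLAS helpers, multiply with our own kernel, download with
// a cuBLAS helper. Error checking is omitted to keep the sketch short.
std::vector<float> matvec_with_custom_kernel(const std::vector<float>& A,
                                             const std::vector<float>& x,
                                             int m, int n) {
    assert(m > 0 && n > 0);
    assert(A.size() == static_cast<size_t>(m) * n);
    assert(x.size() == static_cast<size_t>(n));

    float* d_A = nullptr;
    float* d_x = nullptr;
    float* d_y = nullptr;
    cudaMalloc(&d_A, sizeof(float) * m * n);
    cudaMalloc(&d_x, sizeof(float) * n);
    cudaMalloc(&d_y, sizeof(float) * m);

    // The cuBLAS copy helpers take ordinary device pointers.
    cublasSetMatrix(m, n, sizeof(float), A.data(), m, d_A, m);
    cublasSetVector(n, sizeof(float), x.data(), 1, d_x, 1);

    const int threads = 128;
    const int blocks = (m + threads - 1) / threads;
    matvec_colmajor<<<blocks, threads>>>(d_A, d_x, d_y, m, n);

    std::vector<float> y(m);
    cublasGetVector(m, sizeof(float), d_y, 1, y.data(), 1);

    cudaFree(d_A);
    cudaFree(d_x);
    cudaFree(d_y);
    return y;
}
```

With A = [[1, 2], [3, 4]] passed column-major as `{1, 3, 2, 4}` and x = `{1, 1}`, the result should be `{3, 7}`. No cuBLAS-specific metadata is involved at any point; the pointers are plain global memory addresses.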

jpeinado · January 7, 2010, 10:00pm

Thank you very much. For this reason I would like to see (if possible) a kernel that implements a CUBLAS call. Is this possible?

Thank you

jpeinado

avidday · January 7, 2010, 10:25pm

The CUBLAS source is restricted to registered NVIDIA developers (and the current source release is rather out of date anyway). The current CUBLAS sgemm() implementation is, as I understand it, a wrapper around this kernel code written by Vasily Volkov from Berkeley. That might be enough to get going with.

jpeinado · January 7, 2010, 10:39pm

Yes, this is what I was looking for…

Thanks

jpeinado

pasoleatis · May 8, 2010, 2:46pm

Hello,

I have a similar problem to the original poster's. My code is supposed to solve an equation iteratively by calling an update function until convergence is achieved. The update function involves the following:

1. an element-wise vector-vector division: dummy[i]=cr[i]/m1[i]
2. a matrix-vector multiplication: ck[i]=sum_along_j(H[i][j]*dummy[j])
3. an element-wise vector-vector multiplication: ck[i]=k[i]*m2[i]
4. a new vector: dk[i]=k[i]*k[i]/(1-k[i])
5. an element-wise vector-vector division: kdummy[i]=dk[i]/21[i]
6. a matrix-vector multiplication: dr[i]=sum_along_j(H[i][j]*kdummy[j])
7. an element-wise vector-vector multiplication: dr[i]=dr[i]*m1[i]
8. newc[i]=some function of dr[i]

Does this imply that I have to do something like this? Define several kernels and call them in combination with the cublas functions?

kernel 1
cublas
kernel 2
kernel 3
kernel 4
cublas
kernel 2
kernel 5

Steps 1-8 have to be executed about 10k-20k times to achieve convergence. Is it possible to have 5 kernels and call them together with the cublas matrix-vector multiplication functions over and over without having to transfer any data to the host?

P

maringanti · May 8, 2010, 5:36pm

@pasoleatis

yes.
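A rough sketch of what such a loop could look like. It is deliberately simplified and is not the full eight-step update: each iteration does one element-wise division, one `cublasSgemv`, and one element-wise multiplication, with the result fed back into `cr`. It assumes the `cublas_v2.h` API and a column-major n-by-n matrix H; the kernel and function names are hypothetical. Data is uploaded once before the loop and downloaded once after it.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// out[i] = a[i] / b[i]
__global__ void divide_elements(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] / b[i];
}

// out[i] = a[i] * b[i]
__global__ void multiply_elements(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] * b[i];
}

// Simplified update repeated `iters` times entirely on the device:
//   dummy = cr / m1;  ck = H * dummy;  cr = ck * m2
// Error checking is omitted to keep the sketch short.
std::vector<float> iterate_on_gpu(const std::vector<float>& H,
                                  std::vector<float> cr,
                                  const std::vector<float>& m1,
                                  const std::vector<float>& m2, int iters) {
    const int n = static_cast<int>(cr.size());
    assert(n > 0 && H.size() == static_cast<size_t>(n) * n);
    assert(m1.size() == cr.size() && m2.size() == cr.size());
    const size_t bytes = n * sizeof(float);

    float *d_H, *d_cr, *d_m1, *d_m2, *d_dummy, *d_ck;
    cudaMalloc(&d_H, bytes * n);
    cudaMalloc(&d_cr, bytes);
    cudaMalloc(&d_m1, bytes);
    cudaMalloc(&d_m2, bytes);
    cudaMalloc(&d_dummy, bytes);
    cudaMalloc(&d_ck, bytes);
    cudaMemcpy(d_H, H.data(), bytes * n, cudaMemcpyHostToDevice);
    cudaMemcpy(d_cr, cr.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_m1, m1.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_m2, m2.data(), bytes, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    for (int it = 0; it < iters; ++it) {  // no host transfers inside the loop
        divide_elements<<<blocks, threads>>>(d_cr, d_m1, d_dummy, n);
        cublasSgemv(handle, CUBLAS_OP_N, n, n, &alpha, d_H, n,
                    d_dummy, 1, &beta, d_ck, 1);
        multiply_elements<<<blocks, threads>>>(d_ck, d_m2, d_cr, n);
    }

    cudaMemcpy(cr.data(), d_cr, bytes, cudaMemcpyDeviceToHost);
    cublasDestroy(handle);
    cudaFree(d_H); cudaFree(d_cr); cudaFree(d_m1);
    cudaFree(d_m2); cudaFree(d_dummy); cudaFree(d_ck);
    return cr;
}
```

With H set to the 2-by-2 identity, cr = `{2, 4}`, m1 = `{2, 2}`, m2 = `{1, 1}` and three iterations, each pass halves cr, so the result should be `{0.25, 0.5}`. A convergence test would need an occasional reduction back to the host (or a device-side check), which this sketch leaves out.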
