CUBLAS Vector Multiply

Does CUBLAS have any function that could be reasonably used to implement element by element multiplication between two vectors?

Good question, man… I have been trying to solve the same problem for the last few days. I was wondering whether not only CUBLAS, but any other implementation of BLAS, provides element-wise vector-vector multiplication, and to be honest I haven't found a definitive answer yet. One of my colleagues suggested looking at the BLAS level 2 routines, which implement various kinds of A*x (matrix-vector) operations. That's because element-wise vector multiplication is nothing more than A*x for a diagonal matrix A. I believe this could help you…
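To make the diagonal-matrix idea concrete, here is a minimal sketch. It assumes the handle-based API from cublas_v2.h rather than the cublasAlloc-era interface discussed in this thread, and it uses the level 2 symmetric banded routine SBMV with zero off-diagonals (k = 0, lda = 1), so the banded storage of A is simply the vector of diagonal entries. The helper name diag_times_vector is made up for illustration, and error checking is left out.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: returns y[i] = a[i] * x[i], computed as diag(a) * x.
// Status codes are ignored for brevity; real code should check every call.
std::vector<float> diag_times_vector(const std::vector<float>& a,
                                     const std::vector<float>& x) {
    const int n = static_cast<int>(a.size());
    const size_t bytes = n * sizeof(float);

    float *d_a, *d_x, *d_y;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // y = alpha * A * x + beta * y, with A a symmetric banded matrix.
    // k = 0 means no off-diagonals, so A is diagonal and is stored
    // as a 1 x n band (lda = 1) holding only the diagonal entries.
    const float alpha = 1.0f, beta = 0.0f;
    cublasSsbmv(handle, CUBLAS_FILL_MODE_LOWER, n, 0,
                &alpha, d_a, 1, d_x, 1, &beta, d_y, 1);

    std::vector<float> y(n);
    cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost);

    cublasDestroy(handle);
    cudaFree(d_a);
    cudaFree(d_x);
    cudaFree(d_y);
    return y;
}
```

Calling diag_times_vector on {1, 2, 3} and {4, 5, 6} should give the element-wise product {4, 10, 18}. Whether this is faster than a hand-written kernel is not something this sketch measures.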

Certainly this is a trivial custom kernel to write. It might be easier to figure out the memory layout for vectors in CUBLAS and use your own kernel.

I have one note and one question related to your comment ;-). Element-wise multiplication could of course be implemented with a very trivial user-defined kernel. But for iterative techniques built on BLAS (as in my case), there is a well-founded demand to use BLAS operations for all successive steps (SAXPY, SGEMV, SDOT, …). This is why I am fairly confident that element-wise vector*vector should be implemented somewhere in BLAS.

My question follows:

I'm using cuMemAlloc() (driver API) instead of cublasAlloc(), and cuMemcpyHtoD() instead of cublasSetVector(), in my application, because I need to run both CUBLAS routines and user-defined kernels on the same data (large vectors, in fact). From what I have observed so far, no problems have occurred and all operations work correctly. Do you think I'm just lucky, and that I should strictly use cublasAlloc and cublasSetVector when using CUBLAS? Thx

No, you can mix cublasAlloc and cublasSetVector/cublasGetVector with the regular CUDA allocation and copy calls (cudaMalloc/cudaMemcpy in the high-level API, cuMemAlloc/cuMemcpyHtoD in the driver API).

The cublas calls are there for convenience (for example, if you are calling CUBLAS from Fortran and don't want to mix C and Fortran).
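As an illustration of mixing the two, here is a minimal sketch in which buffers allocated with plain cudaMalloc are first modified by a user kernel and then passed straight to a CUBLAS routine. It assumes the handle-based cublas_v2.h API; the kernel scale_by and the helper weighted_dot are hypothetical names, and error checking is omitted.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical user kernel: a[i] *= b[i] (in place).
__global__ void scale_by(float* a, const float* b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= b[i];
}

// Hypothetical helper: returns sum_i (a[i] * b[i]) * c[i].
// The same cudaMalloc'ed buffers are used by the kernel and by CUBLAS.
float weighted_dot(const std::vector<float>& a,
                   const std::vector<float>& b,
                   const std::vector<float>& c) {
    const int n = static_cast<int>(a.size());
    const size_t bytes = n * sizeof(float);

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_c, c.data(), bytes, cudaMemcpyHostToDevice);

    // Step 1: user-defined kernel on the raw device pointers.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale_by<<<blocks, threads>>>(d_a, d_b, n);

    // Step 2: CUBLAS dot product on the very same pointers.
    // Both run on the default stream, so they execute in order.
    cublasHandle_t handle;
    cublasCreate(&handle);
    float result = 0.0f;
    cublasSdot(handle, n, d_a, 1, d_c, 1, &result);

    cublasDestroy(handle);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return result;
}
```

The point of the sketch is only that CUBLAS takes ordinary device pointers; nothing about the allocation has to go through a CUBLAS-specific call.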

Thank you very much, your reply raised my confidence in my piece of code :)

This is what I am looking for. It's good to know that you can mix CUDA with CUBLAS, but how does CUBLAS use the memory (shared, texture, etc.)? Does CUBLAS know how many multiprocessors you have, and does it optimize each function according to your system parameters? Without knowing how CUBLAS uses the memory, using CUDA at the same time could cause conflicts.

Every kernel call independently uses all resources on the GPU. There can be no conflicts between two separate kernel calls. But if you are truly curious about CUBLAS's block and grid parameters, just read the source :)

Thanks! Yes, I'd like to. Where is the CUBLAS source? I can't find it anywhere.

The links are in this post:

Thanks a lot for the secret passage leading to the source code ;)

Yes, I found where they decide the number of blocks and threads, and whether to use texture memory. It seems to be decided separately for each function call. Since I have many calls back to back, is there an optimizer that blends the functions and finds a globally optimal assignment? Maybe I have to combine these .h and .cu files myself.

I was wondering if you found an efficient way to compute element-wise vector multiplication and division.

Also, would implementing a custom kernel penalize performance when mixed with CUBLAS routines? (I don't know how to implement a custom kernel yet; I'm about to start reading…)



Answering myself. I coded this kernel:

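[codebox]__global__ void m2(float *A, float *B, int maxN){
    // BLOCK_SIZE is assumed to be #defined to the number of threads per block
    int i=blockIdx.x*BLOCK_SIZE+threadIdx.x;
    // The rest of the body was lost from the post. The lines below are a
    // reconstruction (an assumption) based on the surrounding replies.
    if (i<maxN) A[i]=A[i]*B[i];   // in-place element-wise product
    __syncthreads();
}[/codebox]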

where maxN is the length of vector A (and of B).

There is no need for the syncthreads.
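For reference, here is a minimal sketch of such a kernel without any __syncthreads(). Each thread reads and writes only its own element and no shared memory is involved, so there is nothing for the threads of a block to wait on. The names vec_mul, vec_div and elementwise are hypothetical, the result goes to a separate output vector rather than in place, and the block size of 256 is an arbitrary choice. A division variant is included since it was asked about above. Error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// c[i] = a[i] * b[i]; one thread per element, no synchronization needed.
__global__ void vec_mul(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] * b[i];   // guard the last, partially filled block
}

// c[i] = a[i] / b[i]; same structure as vec_mul.
__global__ void vec_div(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] / b[i];
}

// Hypothetical host wrapper: copies inputs over, launches one kernel,
// and copies the result back.
std::vector<float> elementwise(const std::vector<float>& a,
                               const std::vector<float>& b,
                               bool divide) {
    const int n = static_cast<int>(a.size());
    const size_t bytes = n * sizeof(float);

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;                         // arbitrary block size
    const int blocks = (n + threads - 1) / threads;  // round up to cover n
    if (divide) vec_div<<<blocks, threads>>>(d_a, d_b, d_c, n);
    else        vec_mul<<<blocks, threads>>>(d_a, d_b, d_c, n);

    std::vector<float> c(n);
    cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return c;
}
```

Using blockDim.x instead of a compile-time BLOCK_SIZE keeps the index calculation correct even if the launch configuration changes.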

Is the source for CUBLAS available somewhere else? Why is it removed?

It is now available for registered developers.

OK, I'm a registered developer; how/where can I get it?

Sorry to bump such an old topic, but did anyone ever figure out whether there is a CUBLAS function for element-wise vector-vector multiplication? They have a dot product function, so I'd assume this exists.

Hi all,

I would like to perform element-wise multiplication between two vectors using CUBLAS. Could you please share code for this? The link mentioned here does not contain the code.
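One possible answer, offered as a sketch rather than as the method anyone in this thread settled on: the handle-based CUBLAS API (cublas_v2.h) has a BLAS-like extension called DGMM that multiplies a matrix by a diagonal matrix. If the first vector is treated as an m x 1 matrix and the second as the diagonal, the result is the element-wise product. The helper name dgmm_multiply is hypothetical, and error checking is omitted.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: returns c[i] = x[i] * a[i] using cublasSdgmm.
std::vector<float> dgmm_multiply(const std::vector<float>& a,
                                 const std::vector<float>& x) {
    const int m = static_cast<int>(a.size());
    const size_t bytes = m * sizeof(float);

    float *d_a, *d_x, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // C = diag(x) * A, where A is m x 1 (a column vector, lda = m)
    // and x supplies the m diagonal entries.
    cublasSdgmm(handle, CUBLAS_SIDE_LEFT, m, 1,
                d_a, m, d_x, 1, d_c, m);

    std::vector<float> c(m);
    cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost);

    cublasDestroy(handle);
    cudaFree(d_a);
    cudaFree(d_x);
    cudaFree(d_c);
    return c;
}
```

This stays entirely within CUBLAS calls, which may matter if the rest of an iterative solver is written that way. For division, a small custom kernel still seems to be the simpler route.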