Multiplication of a Symmetric Banded Matrix by a Vector


Can someone suggest a kernel for multiplying a symmetric banded matrix by a vector? I wrote my own kernel, but it isn’t very well optimized.

An optimal algorithm should read the half band of the matrix from global memory only once, and the elements of the vector only once as well (for example by staging them in shared memory).
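To make that idea concrete, here is a minimal sketch of such a kernel, not a tuned implementation. Everything in it is an assumption for illustration: the name `sbmv_half_band`, the host helper `sbmv_half_band_host`, single precision, and in particular the storage layout, where `band[d * n + j]` holds A(j + d, j) so that consecutive threads read consecutive addresses (this is not the BLAS band layout). Each thread owns one column, reads its k + 1 stored entries once, and uses each entry twice: once for its own row and once for the mirrored row, accumulated in shared memory. Note that each block still re-reads k elements of x that overlap with its neighbour.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Assumed layout: band[d * n + j] = A(j + d, j), d = 0..k (lower half band).
// Entries with j + d >= n are ignored. y must be zero before the launch,
// because blocks add their partial results into it atomically.
__global__ void sbmv_half_band(int n, int k, const float* band,
                               const float* x, float* y)
{
    extern __shared__ float smem[];
    const int span = blockDim.x + k;       // rows touched by this block
    float* xs = smem;                      // x[j0 .. j0 + span - 1]
    float* ys = smem + span;               // partial y for the same rows

    const int t  = threadIdx.x;
    const int j0 = blockIdx.x * blockDim.x;
    const int j  = j0 + t;                 // column owned by this thread

    // Stage the needed slice of x and clear the partial sums.
    for (int r = t; r < span; r += blockDim.x) {
        xs[r] = (j0 + r < n) ? x[j0 + r] : 0.0f;
        ys[r] = 0.0f;
    }
    __syncthreads();

    float acc = 0.0f;                      // contribution of column j to y[j]
    for (int d = 0; d <= k; ++d) {
        const bool valid = (j + d < n);
        const float a = valid ? band[(size_t)d * n + j] : 0.0f;  // single read
        acc += a * xs[t + d];              // A(j, j+d) * x[j+d]
        if (d > 0) ys[t + d] += a * xs[t]; // A(j+d, j) * x[j], slot unique per thread
        __syncthreads();                   // the next diagonal reuses ys slots
    }
    ys[t] += acc;
    __syncthreads();

    // Rows j0+blockDim.x .. j0+span-1 overlap with the next block: use atomics.
    for (int r = t; r < span; r += blockDim.x)
        if (j0 + r < n) atomicAdd(&y[j0 + r], ys[r]);
}

// Hypothetical host helper (error checking omitted for brevity).
std::vector<float> sbmv_half_band_host(int n, int k,
                                       const std::vector<float>& band,
                                       const std::vector<float>& x)
{
    float *d_band, *d_x, *d_y;
    cudaMalloc(&d_band, band.size() * sizeof(float));
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMemcpy(d_band, band.data(), band.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_y, 0, n * sizeof(float));

    const int threads = 128;
    const int blocks  = (n + threads - 1) / threads;
    const size_t shmem = 2 * (threads + k) * sizeof(float);
    sbmv_half_band<<<blocks, threads, shmem>>>(n, k, d_band, d_x, d_y);

    std::vector<float> y(n);
    cudaMemcpy(y.data(), d_y, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_band); cudaFree(d_x); cudaFree(d_y);
    return y;
}
```

The shared memory request grows with k, so a very wide band would need the band loop to be tiled. The float `atomicAdd` on global memory also requires compute capability 2.0 or later.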

I don’t want to use any library like cublas…

Thank you.

PS. Is the source code of the cublas function available?

Why don’t you want to use cublas?

I need to use the result of the multiplication in another kernel… I don’t want to waste resources passing the elements twice from one kernel to another.

Just because you use cublas, doesn’t mean you have to copy the results back to the host before calling the next kernel. Just make sure the next kernel knows where the result(s) of the cublas calculation is/are - i.e. pass the address(es) of the result(s).
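As a rough illustration of passing the address along, the sketch below calls `cublasSsbmv` and then hands the same device pointer to a follow-up kernel, with no copy to the host in between. It assumes the handle-based cuBLAS API from `cublas_v2.h` (newer than the API that was current when this thread was written) and the standard BLAS lower band layout with `lda = k + 1`. The kernel `add_one` and the helper `sbmv_then_kernel` are made-up names standing in for whatever the next step really is.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical consumer kernel: works on y where cuBLAS left it (device memory).
__global__ void add_one(int n, float* y)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] += 1.0f;
}

// ab uses the BLAS lower band layout: ab[(i - j) + j * lda] = A(i, j), lda = k + 1.
// Error checking is omitted for brevity.
std::vector<float> sbmv_then_kernel(int n, int k,
                                    const std::vector<float>& ab,
                                    const std::vector<float>& x)
{
    const int lda = k + 1;
    float *d_ab, *d_x, *d_y;
    cudaMalloc(&d_ab, ab.size() * sizeof(float));
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMemcpy(d_ab, ab.data(), ab.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // y = 1 * A * x + 0 * y, computed entirely on the device.
    const float alpha = 1.0f, beta = 0.0f;
    cublasSsbmv(handle, CUBLAS_FILL_MODE_LOWER, n, k,
                &alpha, d_ab, lda, d_x, 1, &beta, d_y, 1);

    // The result never leaves the GPU: the next kernel just receives d_y.
    // Both run in the default stream here, so they execute in order.
    const int threads = 128;
    add_one<<<(n + threads - 1) / threads, threads>>>(n, d_y);

    std::vector<float> y(n);
    cudaMemcpy(y.data(), d_y, n * sizeof(float), cudaMemcpyDeviceToHost);

    cublasDestroy(handle);
    cudaFree(d_ab); cudaFree(d_x); cudaFree(d_y);
    return y;
}
```

The only host copy is the final one, done here just to inspect the result. In a real pipeline `d_y` would simply be passed on to further kernels.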


I just read somewhere else on this forum that the above statement is not totally true. It depends which cublas routine you want to use!

Yes MMB, you are right, but with separate kernels I cannot take advantage of shared memory to read the data from global memory only once.

You are right, but most of the time it is better to have two different kernels, each with its own optimal launch grid. If you try to do everything in one kernel, you will have to compromise on the implementation of both parts, and that cost will outweigh what you save by avoiding an extra read from global memory.

Even in cublas, we sometimes split a BLAS routine into multiple kernels for that reason.

Moreover, on Fermi, you can also take advantage of concurrent kernels using streams.
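For reference, a minimal sketch of what launching kernels into separate streams looks like. The kernel `scale` and the helper `run_two_streams` are invented for this example. Whether the two launches actually overlap depends on the hardware and on how many resources each kernel uses, so treat this as the pattern rather than a guarantee of concurrency.

```cuda
#include <cuda_runtime.h>
#include <utility>
#include <vector>

// Hypothetical independent piece of work: v[i] *= s.
__global__ void scale(int n, float s, float* v)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= s;
}

// Runs two independent kernels in two streams (error checking omitted).
std::pair<std::vector<float>, std::vector<float>> run_two_streams(int n)
{
    std::vector<float> a(n, 1.0f), b(n, 1.0f);
    float *d_a, *d_b;
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMalloc(&d_b, n * sizeof(float));
    cudaMemcpy(d_a, a.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // The fourth launch parameter selects the stream. Kernels in different
    // streams have no ordering between them and may run at the same time.
    const int threads = 128;
    const int blocks  = (n + threads - 1) / threads;
    scale<<<blocks, threads, 0, s0>>>(n, 2.0f, d_a);
    scale<<<blocks, threads, 0, s1>>>(n, 3.0f, d_b);

    cudaStreamSynchronize(s0);
    cudaStreamSynchronize(s1);

    cudaMemcpy(a.data(), d_a, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(b.data(), d_b, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(d_a); cudaFree(d_b);
    return {a, b};
}
```

Streams only help with independent work. A kernel that consumes the result of the matrix-vector product still has to wait for it, so the two would normally go into the same stream.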