Non-square matrix multiplication in CUDA: is it possible?


I’m new to CUDA parallel computing. Is it possible to compute non-square matrix multiplication in CUDA, for example with a 10000×3000 matrix? Is it necessary for the matrix dimensions to be a power of two (2^n)?

Thanks ;)

Yes, with cuBLAS you can multiply matrices of arbitrary (non-square) dimensions.
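For instance, a single-precision product can be done with `cublasSgemm`. Here is a minimal host-side sketch (error checking omitted, and the helper name `gemm_example` is mine); note that cuBLAS assumes column-major storage, so the leading dimensions below equal the row counts:

```cuda
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Sketch: C (m x n) = A (m x k) * B (k x n), all sizes arbitrary,
// e.g. m = 10000, k = 3000. dA, dB, dC are device pointers.
void gemm_example(const float *dA, const float *dB, float *dC,
                  int m, int n, int k)
{
    cublasHandle_t handle;
    cublasCreate(&handle);

    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                m, n, k,
                &alpha,
                dA, m,    // lda = m (column-major)
                dB, k,    // ldb = k
                &beta,
                dC, m);   // ldc = m

    cublasDestroy(handle);
}
```

Nothing here requires the dimensions to be powers of two.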

You can write a multiplication kernel for any size. But for matrices whose dimensions are not an integral multiple of the block size (which itself should be an integral multiple of the 32-thread warp size), some blocks at the edges will suffer warp divergence. For sufficiently large matrices, though, this overhead is negligible.
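A naive kernel sketch that handles arbitrary sizes this way (row-major storage assumed): the bounds check lets threads in edge blocks return early, which is exactly where the divergence occurs, and the grid is rounded up with a ceiling division so the whole matrix is covered.

```cuda
#include <cuda_runtime.h>

// C = A * B with A (m x k), B (k x n), C (m x n), row-major.
__global__ void matmul(const float *A, const float *B, float *C,
                       int m, int n, int k)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= m || col >= n) return;   // only edge blocks diverge here

    float acc = 0.0f;
    for (int i = 0; i < k; ++i)
        acc += A[row * k + i] * B[i * n + col];
    C[row * n + col] = acc;
}

// Launch with a grid rounded up to cover the whole matrix:
//   dim3 block(16, 16);  // 256 threads, an integral multiple of 32
//   dim3 grid((n + block.x - 1) / block.x,
//             (m + block.y - 1) / block.y);
//   matmul<<<grid, block>>>(dA, dB, dC, m, n, k);
```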

What I do is write a perfectly parallel piece of code that assumes the matrices are a convenient size, then extend it to the more general case.

Another option (for matrix products) is to zero-pad the matrices up to the next convenient size. The padding rows and columns contribute zero to every dot product, so the result is unchanged, and it lets you write elegant code that assumes regularly sized matrices.
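A host-side sketch of that padding step in plain C (the helper name `pad_matrix` and the row-major layout are my assumptions): it rounds each dimension up to the next multiple of the tile size and copies the original rows into a zero-initialized buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Zero-pad an m x n row-major matrix up to the next multiple of `tile`
   in each dimension; padded sizes are returned through pm and pn.
   The extra rows/columns are zero, so the matrix product is unchanged. */
static float *pad_matrix(const float *src, int m, int n, int tile,
                         int *pm, int *pn)
{
    *pm = (m + tile - 1) / tile * tile;   /* ceiling to multiple of tile */
    *pn = (n + tile - 1) / tile * tile;
    float *dst = calloc((size_t)(*pm) * (size_t)(*pn), sizeof(float));
    for (int r = 0; r < m; ++r)           /* copy row by row; rest stays 0 */
        memcpy(dst + (size_t)r * (size_t)(*pn),
               src + (size_t)r * (size_t)n,
               (size_t)n * sizeof(float));
    return dst;
}
```

After multiplying the padded matrices you simply read back the top-left m×n block of the result.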

I understand that cuBLAS provides fast matrix multiplication in CUDA too.

Thanks, I will try it in my studies. I’m just new to CUDA.