Dot Product

Hi,

I am very new to CUDA. I am trying to write a dot-product kernel, but I am not sure how to start.

I have a float2 arrayA of size [ROWS * COLS], a float2 arrayB of size [COLS], and an output float2 arrayC of size [ROWS].

For every row of arrayA, I need to compute the dot product with arrayB and store the result in arrayC. So in total I will compute the dot product ROWS times and end up with ROWS answers.

Can anyone guide me on how to achieve this?

Thanks a lot!

Is the matrix A banded or fully populated?

1st (simple) option: check whether an existing CUBLAS function already meets your needs (see the simple CUBLAS example or the CUBLAS function reference).

2nd (most likely faster) option: see if you can rewrite the reduction example to do what you want; a rough sketch of that approach follows below.
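
For the second option, here is a minimal, untested sketch of what such a kernel could look like: one thread block per row, each thread accumulating a strided partial sum over that row, followed by a shared-memory tree reduction. The kernel name rowDotKernel is made up, and it assumes the .x and .y components of your float2 data are simply two independent dot products computed side by side (if they are complex numbers you would change the inner accumulation accordingly).

    // Sketch only, not production code: one block per row of arrayA,
    // shared-memory reduction within the block.
    #include <cuda_runtime.h>

    __global__ void rowDotKernel(const float2 *arrayA, const float2 *arrayB,
                                 float2 *arrayC, int cols)
    {
        extern __shared__ float2 sdata[];   // blockDim.x elements, passed at launch
        int row = blockIdx.x;               // one block per row of arrayA
        int tid = threadIdx.x;

        // Each thread accumulates a strided partial sum over its row.
        float2 sum = make_float2(0.0f, 0.0f);
        for (int c = tid; c < cols; c += blockDim.x) {
            float2 a = arrayA[row * cols + c];
            float2 b = arrayB[c];
            sum.x += a.x * b.x;             // assumption: .x and .y are two
            sum.y += a.y * b.y;             // independent dot products
        }
        sdata[tid] = sum;
        __syncthreads();

        // Tree reduction in shared memory (blockDim.x must be a power of two).
        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (tid < s) {
                sdata[tid].x += sdata[tid + s].x;
                sdata[tid].y += sdata[tid + s].y;
            }
            __syncthreads();
        }
        if (tid == 0)
            arrayC[row] = sdata[0];
    }

    // Example launch: ROWS blocks of 256 threads, dynamic shared memory sized to the block.
    // rowDotKernel<<<ROWS, 256, 256 * sizeof(float2)>>>(d_A, d_B, d_C, COLS);

A block size of 256 (a power of two, as the reduction loop requires) with ROWS blocks should be a reasonable starting point; for very long rows you could go further and split each row across several blocks, as the SDK reduction example does.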

You are aware that this is just a matrix-vector multiplication? The fastest matrix-vector kernel implementation I know of is described in this paper:

Noriyuki Fujimoto, "Faster Matrix-Vector Multiplication on GeForce 8800GTX," in Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS), LSPP-402, pp. 1-8, April 2008.

I think this is just a call to one of the Level-2 BLAS routines (matrix-vector operations); check the CUBLAS documentation for the one you need.
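
If your float2 elements are actually complex numbers, the Level-2 routine to look at would be cgemv. Below is a rough, untested sketch of how the call could look with the cublas_v2 API; the wrapper name rowDots is made up, and the code sidesteps CUBLAS's column-major convention by passing the row-major ROWS x COLS array as a COLS x ROWS matrix and multiplying with the transpose. This only applies if the float2 data really represents complex values (cuComplex has the same layout as float2); use CUBLAS_OP_C instead of CUBLAS_OP_T if you want the conjugated dot product.

    // Sketch only: ROWS complex dot products as one matrix-vector product.
    #include <cublas_v2.h>
    #include <cuComplex.h>

    void rowDots(const cuComplex *d_A,   // device pointer, ROWS*COLS, row-major
                 const cuComplex *d_B,   // device pointer, COLS
                 cuComplex *d_C,         // device pointer, ROWS
                 int rows, int cols)
    {
        cublasHandle_t handle;
        cublasCreate(&handle);

        cuComplex alpha = make_cuComplex(1.0f, 0.0f);
        cuComplex beta  = make_cuComplex(0.0f, 0.0f);

        // Row-major (ROWS x COLS) storage looks like a column-major (COLS x ROWS)
        // matrix to CUBLAS, so multiplying by its transpose gives C = A * B.
        cublasCgemv(handle, CUBLAS_OP_T,
                    cols, rows,           // m, n of the column-major view
                    &alpha,
                    d_A, cols,            // lda = COLS for the column-major view
                    d_B, 1,
                    &beta,
                    d_C, 1);

        cublasDestroy(handle);
    }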