Hi,
I am very new to CUDA. I am trying to write a dot product kernel, but I am not sure how to start.
I have a float2 arrayA of size [ROWS * COLS], a float2 arrayB of size [COLS], and an output float2 arrayC of size [ROWS].
For every row of arrayA, I need to compute the dot product with arrayB and store the result in arrayC. So in total I will compute ROWS dot products and get ROWS answers.
Can anyone guide me on how to achieve this?
Thanks a lot!
AndyL
August 22, 2008, 11:45am
2
Is the matrix A banded or fully populated?
VrahoK
August 25, 2008, 10:49am
3
1st, simple option: check whether a CUBLAS function does what you need (see the simple CUBLAS example or the CUBLAS function reference).
2nd, most likely faster option: see whether you can adapt the reduction example to do what you want.
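A minimal sketch of the second option, adapting the shared-memory reduction pattern so that each thread block reduces one row. Several things here are assumptions and not from the original post: the float2 values are treated as complex numbers (x = real, y = imaginary) multiplied without conjugation, arrayA is stored row-major, and the names rowDotKernel, rowDot, cmul and THREADS are made up for illustration. If the float2 components are meant to be independent, replace cmul with a componentwise multiply.

```cuda
#include <cuda_runtime.h>

#define THREADS 256  // threads per block; must be a power of two for the tree reduction

// Assumption: float2 holds a complex number (x = real part, y = imaginary part).
__device__ float2 cmul(float2 a, float2 b) {
    return make_float2(a.x * b.x - a.y * b.y, a.x * b.y + a.y * b.x);
}

// One thread block per row of A (row-major, rows x cols): C[row] = sum_j A[row][j] * B[j].
__global__ void rowDotKernel(const float2* A, const float2* B, float2* C,
                             int rows, int cols) {
    __shared__ float2 partial[THREADS];
    int row = blockIdx.x;
    int tid = threadIdx.x;
    if (row >= rows) return;  // whole block exits together, so no barrier is skipped

    // Each thread accumulates a strided slice of the row.
    float2 sum = make_float2(0.0f, 0.0f);
    for (int col = tid; col < cols; col += blockDim.x) {
        float2 p = cmul(A[(size_t)row * cols + col], B[col]);
        sum.x += p.x;
        sum.y += p.y;
    }
    partial[tid] = sum;
    __syncthreads();

    // Tree reduction of the per-thread partial sums in shared memory.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) {
            partial[tid].x += partial[tid + s].x;
            partial[tid].y += partial[tid + s].y;
        }
        __syncthreads();
    }
    if (tid == 0) C[row] = partial[0];
}

// Hypothetical host wrapper: copies inputs to the GPU, launches one block per row,
// and copies the rows results back. Returns 0 on success, 1 on any CUDA error.
int rowDot(const float2* hA, const float2* hB, float2* hC, int rows, int cols) {
    float2 *dA = nullptr, *dB = nullptr, *dC = nullptr;
    size_t bytesA = (size_t)rows * cols * sizeof(float2);
    size_t bytesB = (size_t)cols * sizeof(float2);
    size_t bytesC = (size_t)rows * sizeof(float2);

    cudaError_t err = cudaMalloc(&dA, bytesA);
    if (err == cudaSuccess) err = cudaMalloc(&dB, bytesB);
    if (err == cudaSuccess) err = cudaMalloc(&dC, bytesC);
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA, bytesA, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hB, bytesB, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        rowDotKernel<<<rows, THREADS>>>(dA, dB, dC, rows, cols);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(hC, dC, bytesC, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err == cudaSuccess ? 0 : 1;
}
```

One block per row keeps the sketch simple, but it is only a reasonable fit when COLS is large enough to keep the threads of a block busy. For short rows, one thread or one warp per row may be a better starting point.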
Fuchs
August 25, 2008, 7:03pm
4
You are aware that this is just a matrix-vector multiplication? The fastest implementation of a matrix-vector kernel that I know of is described in this paper:
Noriyuki Fujimoto, "Faster Matrix-Vector Multiplication on GeForce 8800GTX," in Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS), LSPP-402, pp. 1-8, April 2008.
jack
September 1, 2008, 3:58pm
5
I think this is just a call to one of the Level-2 BLAS routines (matrix-vector operations). Check the CUBLAS documentation for the one you need.
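As a rough illustration of that route, here is a sketch that uses the Level-2 routine cublasCgemv from the current cuBLAS API (cublas_v2.h), which differs from the handle-less API that existed when this thread was written. It assumes the float2 data represents single-precision complex numbers (cuComplex is a typedef of float2) and that arrayA is stored row-major. The wrapper name rowDotCublas is hypothetical. Because cuBLAS expects column-major storage, the row-major ROWS x COLS array is passed as a COLS x ROWS column-major matrix and transposed with CUBLAS_OP_T (no conjugation).

```cuda
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <cuComplex.h>

// Hypothetical wrapper: computes dC = A * dB for device pointers dA, dB, dC.
// dA holds A in row-major order (rows x cols). cuBLAS reads the same memory as a
// column-major cols x rows matrix with leading dimension cols, so CUBLAS_OP_T
// recovers the original rows x cols matrix.
cublasStatus_t rowDotCublas(cublasHandle_t handle,
                            const cuComplex* dA, const cuComplex* dB,
                            cuComplex* dC, int rows, int cols) {
    const cuComplex alpha = make_cuComplex(1.0f, 0.0f);  // y = alpha * op(A) * x + beta * y
    const cuComplex beta  = make_cuComplex(0.0f, 0.0f);
    return cublasCgemv(handle, CUBLAS_OP_T,
                       cols, rows,   // dimensions of the matrix as cuBLAS sees it
                       &alpha,
                       dA, cols,     // leading dimension = length of one row of A
                       dB, 1,
                       &beta,
                       dC, 1);
}
```

The caller is expected to create the handle with cublasCreate, allocate and fill the device buffers, and link against the cuBLAS library. If the data is real rather than complex, cublasSgemv is the corresponding single-precision routine.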