A quick and easy way to do this is to use cublasSgemm to do a matrix multiply with a ones vector (a vector whose elements are all 1.0f) of the same length as your data. You’ll probably have to write a trivial kernel to initialize the ones vector, but the call to Sgemm is fairly straightforward; just be careful to get the input dimensions correct. My guess is that using CUBLAS would be slower than the reduction example, but it would be interesting to see by how much.
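Here is a rough sketch of what that might look like, assuming the cuBLAS v2 API (`cublas_v2.h`), column-major layout and the default host pointer mode for `alpha`/`beta`. The names `fill_ones` and `sum_with_cublas` are hypothetical and error handling is minimal. The ones vector is treated as a 1 x n matrix and the data as an n x 1 matrix, so the "input dimensions" are m = 1, n = 1, k = n, with lda = 1, ldb = n and ldc = 1.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Trivial kernel (hypothetical name): set every element of v to 1.0f.
__global__ void fill_ones(float *v, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] = 1.0f;
}

// Hypothetical helper: sums n host floats by computing
// (1 x n ones) * (n x 1 data) on the GPU with cublasSgemm.
// Error handling is minimal: failures are only reported to stderr.
float sum_with_cublas(const float *h_data, int n)
{
    assert(n > 0);
    float *d_data = nullptr, *d_ones = nullptr, *d_result = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMalloc(&d_ones, n * sizeof(float));
    cudaMalloc(&d_result, sizeof(float));
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    // Initialize the ones vector on the device.
    const int threads = 256;
    fill_ones<<<(n + threads - 1) / threads, threads>>>(d_ones, n);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // Column-major: C (1x1) = alpha * A (1xn) * B (nx1) + beta * C
    //   m = 1 (rows of A and C), n = 1 (cols of B and C), k = n (inner dim)
    //   A = ones with lda = 1, B = data with ldb = n, C = result with ldc = 1.
    const float alpha = 1.0f, beta = 0.0f;
    cublasStatus_t st = cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                                    1, 1, n,
                                    &alpha, d_ones, 1,
                                    d_data, n,
                                    &beta, d_result, 1);
    if (st != CUBLAS_STATUS_SUCCESS)
        fprintf(stderr, "cublasSgemm failed: %d\n", (int)st);

    // Blocking copy of the 1x1 result back to the host.
    float h_result = 0.0f;
    cudaMemcpy(&h_result, d_result, sizeof(float), cudaMemcpyDeviceToHost);

    cublasDestroy(handle);
    cudaFree(d_data);
    cudaFree(d_ones);
    cudaFree(d_result);
    return h_result;
}
```

Build with something like `nvcc sum.cu -lcublas`. Keep in mind that GEMM accumulates in its own order, so for large or non-integer float data the result can differ slightly from a sequential CPU sum.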