A quick and easy way to do this is to use cublasSgemm to multiply your data by a ones vector (a vector whose elements are all 1.0f) of the same length as your data. Treating the data as a 1×N matrix and the ones vector as an N×1 matrix gives a 1×1 result that holds the sum. You'll probably have to write a trivial kernel to initialize your ones vector, but the call to Sgemm is fairly straightforward; just be careful to get the input dimensions correct. My guess is that using cuBLAS would be slower than the reduction example, but it would be interesting to see by how much.
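For what it's worth, here is a minimal sketch of the idea, not a tuned implementation. It assumes the cuBLAS v2 API (`cublas_v2.h`, handle-based calls) and column-major storage. The names `fill_ones` and `sum_with_sgemm` are made up for illustration, and error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Trivial kernel that sets every element of v to 1.0f.
__global__ void fill_ones(float *v, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        v[i] = 1.0f;
    }
}

// Hypothetical helper: sums n host floats (n > 0) with one Sgemm call.
// Return codes are ignored here; real code should check them.
float sum_with_sgemm(const float *h_data, int n)
{
    float *d_data = NULL, *d_ones = NULL, *d_sum = NULL;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMalloc(&d_ones, n * sizeof(float));
    cudaMalloc(&d_sum, sizeof(float));
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    // Initialize the ones vector on the device.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    fill_ones<<<blocks, threads>>>(d_ones, n);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // C (1x1) = alpha * A (1xn) * B (nx1) + beta * C
    // A is the data viewed as a 1xn matrix, so lda = 1.
    // B is the ones vector viewed as an nx1 matrix, so ldb = n.
    const float alpha = 1.0f;
    const float beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                1, 1, n,
                &alpha,
                d_data, 1,
                d_ones, n,
                &beta,
                d_sum, 1);

    // The blocking copy also waits for the Sgemm call to finish.
    float sum = 0.0f;
    cudaMemcpy(&sum, d_sum, sizeof(float), cudaMemcpyDeviceToHost);

    cublasDestroy(handle);
    cudaFree(d_data);
    cudaFree(d_ones);
    cudaFree(d_sum);
    return sum;
}
```

Calling `sum_with_sgemm(data, n)` on a host array should return the sum of its elements, up to float rounding. Build with something like `nvcc file.cu -lcublas`. To compare timings against the reduction example, you would want to create the handle and the ones vector once outside the timed region.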