Hi guys, can anyone suggest the most efficient way to sum n floating-point numbers in CUDA? Thanks, -asher

Taking sum of n floating point numbers


mattb3 May 6, 2008, 1:31am 12

A quick and easy way to do this is to use cublasSgemm to multiply your data by a ones vector (a vector whose elements are all 1.0f) of the same length as your data. Treating the data as a 1×n matrix and the ones vector as an n×1 matrix gives a 1×1 result, which is the sum. You'll probably have to write a trivial kernel to initialize the ones vector, but the call to Sgemm is fairly straightforward; just be careful to get the input dimensions correct. My guess is that using cuBLAS would be slower than the reduction example, but it would be interesting to see by how much.
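For anyone who wants to try it, here is a rough sketch of the ones-vector idea. It is only an illustration under a few assumptions: it uses the newer handle-based cuBLAS API (cublas_v2.h) rather than the API that was current when this was posted, the names fill_ones and sum_with_sgemm are made up for this example, n is assumed to be at least 1, and error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Trivial kernel: set every element of v to 1.0f.
__global__ void fill_ones(float *v, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] = 1.0f;
}

// Hypothetical helper (not from the thread): sums n host floats on the GPU
// by multiplying a 1 x n matrix (the data) by an n x 1 matrix of ones.
// Assumes n >= 1. Return codes are ignored for brevity.
float sum_with_sgemm(const float *h_data, int n) {
    float *d_data = nullptr, *d_ones = nullptr, *d_sum = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMalloc(&d_ones, n * sizeof(float));
    cudaMalloc(&d_sum, sizeof(float));
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    // Initialize the ones vector on the device.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    fill_ones<<<blocks, threads>>>(d_ones, n);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // C (1 x 1) = alpha * A (1 x n) * B (n x 1) + beta * C, column-major.
    // m = 1, n = 1, k = n; lda = 1 (rows of A), ldb = n (rows of B), ldc = 1.
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                1, 1, n,
                &alpha,
                d_data, 1,
                d_ones, n,
                &beta,
                d_sum, 1);

    // Blocking copy on the default stream; waits for the Sgemm to finish.
    float h_sum = 0.0f;
    cudaMemcpy(&h_sum, d_sum, sizeof(float), cudaMemcpyDeviceToHost);

    cublasDestroy(handle);
    cudaFree(d_data);
    cudaFree(d_ones);
    cudaFree(d_sum);
    return h_sum;
}
```

This should build with nvcc when linked against cuBLAS (-lcublas). Since the product is just a dot product with a ones vector, cublasSdot would likely do the same job with less care needed over dimensions, but that is a swap-in on my part and not what was suggested above.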
