CUDA mean.

LFB · April 7, 2009, 10:41am

Hello.

I’m new to CUDA and I’m still a bit unsure about the best practices for programming with it.

I implemented a simplistic mean but can’t help thinking the last part could be much more efficient.

// Mean kernel: each block writes its share of the mean to R[blockIdx.x]
__global__ void cudamean(float *X, float *R, int jump, int size) {
	int x = threadIdx.x, z = blockIdx.x;
	int i;
	int nt = blockDim.x;
	float u;
	// Dynamic shared memory: one float per thread (size is set at launch)
	extern __shared__ char Dmean[];
	float *D = (float *) Dmean;

	// Each thread sums the values at i, i + jump, i + 2*jump, ...
	for (D[x] = 0.0f, i = z*nt + x; i < size; i += jump) {
		D[x] += X[i];
	}
	__syncthreads();

	// The first thread sums the values of the other threads within the block
	if (x == 0) {
		for (i = 0, u = 0.0f; i < nt; i++)
			u += D[i];
		R[z] = u/size;
	}
}

In short: the vector is loaded into device memory, each thread sums the values it reaches by stepping through the vector in strides of “jump” (jump = blockDim × number of blocks), and the per-thread results within each block are then added together.

Once each block has written its result to the vector R, the CPU sums the results from all the thread blocks.
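For anyone following along, here is a rough sketch of what the host side described above might look like. It is not my exact code: the wrapper name host_mean and the default of 30 blocks are assumptions made up for illustration, the kernel is a condensed copy of the one above, and error checking is left out.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Condensed copy of the kernel from the post: one partial mean per block.
__global__ void cudamean(const float *X, float *R, int jump, int size) {
    extern __shared__ float D[];          // one float per thread
    int x = threadIdx.x, z = blockIdx.x, nt = blockDim.x;
    float acc = 0.0f;
    for (int i = z * nt + x; i < size; i += jump)
        acc += X[i];                      // strided partial sum
    D[x] = acc;
    __syncthreads();
    if (x == 0) {                         // serial sum by thread 0
        float u = 0.0f;
        for (int i = 0; i < nt; i++) u += D[i];
        R[z] = u / size;
    }
}

// Hypothetical host wrapper (name and defaults are assumptions).
float host_mean(const float *h_X, int size, int nblocks = 30, int nthreads = 512) {
    float *d_X = nullptr, *d_R = nullptr;
    cudaMalloc(&d_X, size * sizeof(float));
    cudaMalloc(&d_R, nblocks * sizeof(float));
    cudaMemcpy(d_X, h_X, size * sizeof(float), cudaMemcpyHostToDevice);

    int jump = nthreads * nblocks;        // stride = total number of threads
    // Third launch parameter: bytes of dynamic shared memory per block.
    cudamean<<<nblocks, nthreads, nthreads * sizeof(float)>>>(d_X, d_R, jump, size);

    // Blocking copy: waits for the kernel to finish before reading R.
    std::vector<float> h_R(nblocks);
    cudaMemcpy(h_R.data(), d_R, nblocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_X);
    cudaFree(d_R);

    // The CPU adds the per-block partial means.
    float mean = 0.0f;
    for (float r : h_R) mean += r;
    return mean;
}
```

Calling host_mean(data, n) on a host array of n floats should return their mean, up to single-precision rounding.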

I ran this with 512 threads per block, since that is the maximum my device (GTX 260) permits.

In the last part of the code, only one thread per block does the heavy work.

Would anyone be willing to give me some ideas, please, whether on the algorithm or on that last heavy part?

I’m posting here because I want to understand a lot more about this, and I’m aware that I’m only at the beginning.

Thank you.

(If possible, please move this post to the CUDA Programming and Development section. I didn’t realize I was posting in the wrong section. Thank you.)

jack · April 7, 2009, 2:57pm

Look at the reduction example (and its associated documentation) in the SDK. That’s about the most efficient method for doing anything like this.
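To give an idea of the approach, here is a minimal sketch of a shared-memory tree reduction, the basic technique that kind of example is built on. This is not the SDK code itself: the names block_sum_tree and mean_tree are made up for illustration, the block size is assumed to be a power of two, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each block writes the sum of its strided share of X to R[blockIdx.x].
// Assumes blockDim.x is a power of two.
__global__ void block_sum_tree(const float *X, float *R, int size) {
    extern __shared__ float D[];          // one float per thread
    int x = threadIdx.x;
    float acc = 0.0f;
    for (int i = blockIdx.x * blockDim.x + x; i < size; i += blockDim.x * gridDim.x)
        acc += X[i];                      // strided partial sum, as before
    D[x] = acc;
    __syncthreads();

    // Tree reduction: halve the number of active threads at every step,
    // so the block sum takes log2(blockDim.x) steps instead of blockDim.x.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (x < s) D[x] += D[x + s];
        __syncthreads();                  // reached by all threads in the block
    }
    if (x == 0) R[blockIdx.x] = D[0];
}

// Hypothetical host wrapper (name and defaults are assumptions).
float mean_tree(const float *h_X, int size, int nblocks = 30, int nthreads = 256) {
    float *d_X = nullptr, *d_R = nullptr;
    cudaMalloc(&d_X, size * sizeof(float));
    cudaMalloc(&d_R, nblocks * sizeof(float));
    cudaMemcpy(d_X, h_X, size * sizeof(float), cudaMemcpyHostToDevice);

    block_sum_tree<<<nblocks, nthreads, nthreads * sizeof(float)>>>(d_X, d_R, size);

    // Blocking copy: waits for the kernel before reading the partial sums.
    std::vector<float> h_R(nblocks);
    cudaMemcpy(h_R.data(), d_R, nblocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_X);
    cudaFree(d_R);

    // Final sum of the few per-block results on the CPU, then divide once.
    double total = 0.0;
    for (float r : h_R) total += r;
    return static_cast<float>(total / size);
}
```

With 512 threads per block, the loop in thread 0 of the original kernel does 512 serial additions, while the tree version needs 9 synchronized steps. The SDK example goes further with additional optimizations, so treat this only as a starting point.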
