How to sum all the elements of an array

markitovtr1 · March 31, 2011, 7:17pm

Hi.

I just started working with CUDA and I had a problem doing some tests.

I’m trying to sum all the elements of a small array (5 elements).

What am I doing wrong? The output of the program below is always 0. I tried some other approaches, but the answer was either 1 (0 + the last element) or 0.

PS: Sorry for any grammar errors. I’m from Brazil.

#include <stdio.h>

#define SIZE 10

__device__ volatile float sum = 0;

__global__ void ArraySum(float *array)
{
    int index = threadIdx.x;
    sum = sum + array[index];
    __syncthreads();
}

int main(void)
{
    float array[SIZE], *param, h_sum;
    size_t size = sizeof(float) * SIZE;

    cudaMalloc(&param, size);

    for (int i = 0; i < SIZE; i++)
    {
        array[i] = 1;
    }

    cudaMemcpy(param, array, size, cudaMemcpyHostToDevice);
    ArraySum<<<1, SIZE>>>(param);
    cudaMemcpyFromSymbol(&h_sum, &sum, sizeof(float));
    cudaFree(param);

    printf("%.2f\n", h_sum);
}

LSChien · April 1, 2011, 2:27am

You are passing the wrong parameter to cudaMemcpyFromSymbol: pass the symbol itself (sum), not its address (&sum). Please check page 34 of the CUDA Programming Guide (RC 4.0).

You should also check the error code returned by each API call; then you will find which one fails.

cudaError_t status = cudaMemcpyFromSymbol(&h_sum, sum, sizeof(float));
if (cudaSuccess != status) {
    printf("Error: %s\n", cudaGetErrorString(status));
    exit(1);
}

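Finally, you have a race condition. All 10 threads belong to the same warp, and a warp shares a common program counter, so every thread reads the same old value of sum, adds its own element, and writes the result back. Only one of those writes survives, so only one thread of the warp effectively updates the variable sum.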
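One way to remove the race described above is to make each update atomic. The following is only a minimal sketch, not the original poster's code: the names d_sum, arraySumAtomic, CHECK and sumWithAtomicAdd are made up for illustration, and it assumes a GPU of compute capability 2.0 or later, where atomicAdd on float is available.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative error-checking macro (hypothetical name): abort on any CUDA error.
#define CHECK(call)                                                          \
    do {                                                                     \
        cudaError_t status = (call);                                         \
        if (status != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(status)); \
            exit(1);                                                         \
        }                                                                    \
    } while (0)

// Device-side accumulator. volatile is not needed: every update is atomic.
__device__ float d_sum;

// Each thread atomically adds one element to the accumulator.
__global__ void arraySumAtomic(const float *array, int n)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < n) {
        atomicAdd(&d_sum, array[index]);
    }
}

// Hypothetical host helper: copies n floats to the device and returns their sum.
float sumWithAtomicAdd(const float *h_array, int n)
{
    float *d_array = nullptr;
    float zero = 0.0f, h_sum = 0.0f;
    size_t bytes = sizeof(float) * n;

    CHECK(cudaMalloc(&d_array, bytes));
    CHECK(cudaMemcpy(d_array, h_array, bytes, cudaMemcpyHostToDevice));
    // Reset the accumulator; the symbol itself is passed, not its address.
    CHECK(cudaMemcpyToSymbol(d_sum, &zero, sizeof(float)));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    arraySumAtomic<<<blocks, threads>>>(d_array, n);
    CHECK(cudaGetLastError());

    // This copy waits for the kernel on the default stream to finish.
    CHECK(cudaMemcpyFromSymbol(&h_sum, d_sum, sizeof(float)));
    CHECK(cudaFree(d_array));
    return h_sum;
}
```

Calling sumWithAtomicAdd on an array of ten 1.0f values should give 10. Atomics serialize the updates, so this is simple rather than fast; for large arrays a reduction is usually the better choice.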

ijvaughn · April 5, 2011, 6:07pm

__global__ void ArraySum(float *array)
{
    int index = threadIdx.x;
    sum = sum + array[index];
    __syncthreads();
}

You are attempting a “data-parallel operation.” Mark Harris and others have already done a lot of research on this problem; look up Mark Harris and the CUDPP library.

apostglen46 · April 5, 2011, 8:59pm

You can also do it with minimal CUDA code using the Thrust library.
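For reference, a Thrust version might look roughly like the sketch below. The helper name sumWithThrust is made up for illustration; thrust::device_vector and thrust::reduce are the library pieces doing the work, and the file is assumed to be compiled with nvcc.

```cuda
#include <thrust/device_vector.h>
#include <thrust/reduce.h>

// Hypothetical host helper: sums n floats on the GPU using Thrust.
float sumWithThrust(const float *h_array, int n)
{
    // Constructing the device_vector from a host range copies the data to the GPU.
    thrust::device_vector<float> d_array(h_array, h_array + n);

    // reduce with an initial value of 0.0f uses addition by default.
    return thrust::reduce(d_array.begin(), d_array.end(), 0.0f);
}
```

No kernel, cudaMalloc or cudaMemcpy calls appear in user code here; Thrust handles allocation, transfer and the reduction itself.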

raghu · April 6, 2011, 6:53am

You need to perform a reduction operation to calculate the sum of all array elements.

As LSChien rightly pointed out, your current algorithm has a race condition.
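As a rough illustration of what such a reduction can look like, here is a single-block shared-memory sketch. It is a simplified teaching version under stated assumptions, not a tuned implementation: BLOCK_SIZE, blockSum and sumWithReduction are hypothetical names, the block size must be a power of two, and error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // assumed power of two, required by the halving loop

// One block sums n floats from `in` and writes the total to *out.
__global__ void blockSum(const float *in, float *out, int n)
{
    __shared__ float partial[BLOCK_SIZE];
    int tid = threadIdx.x;

    // Each thread first accumulates its own strided slice of the input,
    // so a single block can handle any n.
    float local = 0.0f;
    for (int i = tid; i < n; i += BLOCK_SIZE) {
        local += in[i];
    }
    partial[tid] = local;
    __syncthreads();

    // Tree reduction: halve the number of active threads each step.
    for (int stride = BLOCK_SIZE / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            partial[tid] += partial[tid + stride];
        }
        __syncthreads();  // all threads reach this, inside or outside the if
    }

    if (tid == 0) {
        *out = partial[0];
    }
}

// Hypothetical host helper (error checks omitted for brevity).
float sumWithReduction(const float *h_array, int n)
{
    float *d_in = nullptr, *d_out = nullptr;
    float h_sum = 0.0f;

    cudaMalloc(&d_in, sizeof(float) * n);
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_array, sizeof(float) * n, cudaMemcpyHostToDevice);

    blockSum<<<1, BLOCK_SIZE>>>(d_in, d_out, n);

    // The blocking copy waits for the kernel to finish.
    cudaMemcpy(&h_sum, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_sum;
}
```

Unlike the original kernel, no two threads ever write the same location in the same step, and __syncthreads() separates each step from the next, so there is no race. For large arrays the same idea is usually extended to many blocks, with a second pass over the per-block partial sums.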
