Problem with summing arrays: the result array contains no values

Hi all!

I am new to CUDA programming.

I wrote a program that sums two arrays into a third array.

For some reason the target array C is always zero, even after the addition … can you tell me what I’m doing wrong?

The source code:

#include <iostream>

#include <cuda_runtime.h>

__global__ void sum(float *A, float *B, float *C)
{
    int n = blockDim.x * blockIdx.x + threadIdx.x;
    C[n] = A[n] + B[n];
}

void StartSum(float *A, float *B, float *C, int N)
{
    sum<<< N/64, 64 >>>(A, B, C);
}


The source code that initializes the arrays and calls the summation:

#include <windows.h>

#include <cuda.h>

#include <cuda_runtime.h>

#include <cuda_runtime_api.h>

#include <iostream>

#define N 5

void StartSum(float *A, float *B, float *C, int n);

int main()
{
    float a[N] = {1,2,3,4,5}, b[N] = {-2,-4,5,7,1}, c[N] = {0,0,0,0,0};

    cudaError_t err;

    float *dev_a, *dev_b, *dev_c;

    cudaMalloc((void**)&dev_a, sizeof(float)*N);
    cudaMalloc((void**)&dev_b, sizeof(float)*N);
    cudaMalloc((void**)&dev_c, sizeof(float)*N);

    err = cudaMemcpy(dev_a, a, sizeof(float)*N, cudaMemcpyHostToDevice);
    err = cudaMemcpy(dev_b, b, sizeof(float)*N, cudaMemcpyHostToDevice);
    err = cudaMemcpy(dev_c, c, sizeof(float)*N, cudaMemcpyHostToDevice);

    StartSum(dev_a, dev_b, dev_c, N);

    err = cudaMemcpy(c, dev_c, sizeof(float)*N, cudaMemcpyDeviceToHost);

    for (int i = 0; i < N; i++)
        std::cout << c[i] << " ";

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);

    return 0;
}


When printing the results I always get zeros from std::cout<<c[i]<<" ";

Thank you in advance for your help.


Your problem (at least your first problem) is that your kernel is never called, since 5/64 = 0 in integer division, so you launch a grid of zero blocks…

If you checked the error status after the launch, you would get an “invalid configuration argument” error because of this. To convince yourself, just try this:

$ cat gridDim.cu

#include <stdio.h>

#include <cuda.h>

__global__ void foo() {

	if (threadIdx.x==0)

		printf("in kernel, gridDim and blockDim are %d %d\n", gridDim.x, blockDim.x);

}

int main() {

	foo<<< 5/64, 64 >>>();   // 5/64 == 0 blocks: invalid configuration

	printf("%s\n", cudaGetErrorString(cudaGetLastError()));

	foo<<< 1, 1 >>>();       // valid configuration

	printf("%s\n", cudaGetErrorString(cudaGetLastError()));

	cudaDeviceSynchronize(); // flush the kernel's printf

	foo<<< 5/64, 64 >>>();   // invalid again

	printf("%s\n", cudaGetErrorString(cudaGetLastError()));

	return 0;

}

$ nvcc -arch=sm_21 -o gridDim gridDim.cu

$ ./gridDim 

invalid configuration argument

no error

in kernel, gridDim and blockDim are 1 1

invalid configuration argument

Then, I guess you’ll also have to add a test somewhere in your kernel to avoid accessing out-of-bounds data (like an “if (n < N)” test).

Thank you very much, dear Gilles_C!
It worked! :biggrin:
I will continue to study this interesting subject.

Sorry for the stupid question … summing two arrays now works.

How to find the sum of the elements of one array?

I tried to do this:

__global__ void Summation(float *A, float *C)
{
	int n = blockDim.x * blockIdx.x + threadIdx.x;
	*C += A[n];
}

but this option does not work …

I solved the problem this way:

for (int i = 0; i < count; i++)

    *C += A[i];

Do you think this approach is correct in terms of CUDA technology?

It does the job - the array is summed - but I doubt whether I approached the problem the right way.

Thank you.

That last approach won’t help you much - the elements of the array are summed by a single thread serially, not in parallel (and your kernel version has every thread updating *C concurrently without atomics, which is a race condition). Take a look at Parallel Reduction.