Result of simple vector addition is not correct

AlvertRee · July 20, 2013, 3:55pm

Hello there,

Thank you for reading this topic!
I wrote some code that simply adds two vectors. :)
However, its output does not match the expected results. The code is below.

What do I have to change to get correct results?
Thank you in advance.

#include <stdio.h>
#include <stdlib.h>

__global__ void add( int *a, int *b, int *c)
{
    int tid = threadIdx.x;

    c[tid] = a[tid] + b[tid];
}



int main()
{
    int N = 1500;
    int *a, *b, *c;
    int *dev_a, *dev_b, *dev_c;

    // allocate host (CPU) memory
    a = (int *)malloc(N*sizeof(int));
    b = (int *)malloc(N*sizeof(int));
    c = (int *)malloc(N*sizeof(int));

    // allocate device (GPU) memory for each array
    cudaMalloc( (void**)&dev_a, N*sizeof(int) );
    cudaMalloc( (void**)&dev_b, N*sizeof(int) );
    cudaMalloc( (void**)&dev_c, N*sizeof(int) );

    // fill the input vectors a and b, and zero the output vector c
    for (int i=0; i<N; ++i)
    {
        a[i] = i;
        b[i] = i*2;
        c[i] = 0;
    }

    // copy the input data from host memory to device (GPU) memory
    cudaMemcpy( dev_a, a, N*sizeof(int), cudaMemcpyHostToDevice );
    cudaMemcpy( dev_b, b, N*sizeof(int), cudaMemcpyHostToDevice );

    // launch the kernel: <<<number of blocks, threads per block>>>
    // here: 1 block of N threads, one thread per element
    add<<<1, N>>>(dev_a, dev_b, dev_c);

    cudaMemcpy( c, dev_c, N*sizeof(int), cudaMemcpyDeviceToHost);

    // The result
    for (int i=0;i<N;++i)
    {
        printf("%d + %d = %d\n",a[i],b[i],c[i]);
    }

    // free device and host memory
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    free(a);
    free(b);
    free(c);

    return 0;
}
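The single-block launch `add<<<1, N>>>` above can only work if N does not exceed the device's per-block thread limit. A minimal sketch that reads that limit for device 0 with `cudaGetDeviceProperties`, assuming a CUDA-capable GPU is present; `fits_in_one_block` is a hypothetical helper name, not part of the CUDA API:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: can n threads be launched as a single block?
static int fits_in_one_block(int n, int max_threads_per_block)
{
    return n > 0 && n <= max_threads_per_block;
}

int main(void)
{
    cudaDeviceProp prop;
    // Assumes device 0 is the GPU that runs the kernel.
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("%s: maxThreadsPerBlock = %d\n", prop.name, prop.maxThreadsPerBlock);

    int N = 1500;  // the value used in the question
    if (!fits_in_one_block(N, prop.maxThreadsPerBlock))
        printf("add<<<1, %d>>> exceeds this limit and will fail to launch\n", N);
    return 0;
}
```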

pasoleatis · July 20, 2013, 7:32pm

The kernel launch add<<<1, N>>> does not run if N is larger than 1024 (512 on older cards with compute capability 1.x), because that is the maximum number of threads per block. If you added some error checking, you would get an error.
Try this:

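// place this directly after the kernel launch, e.g. add<<<1, N>>>(dev_a, dev_b, dev_c);
// make the host block until the device has finished running the kernel
// (cudaDeviceSynchronize() replaces the deprecated cudaThreadSynchronize())
cudaDeviceSynchronize();

// check for errors (an invalid launch configuration, such as too many threads per block, is reported here)
cudaError_t error = cudaGetLastError();
if (error != cudaSuccess)
{
  // print the CUDA error message and exit
  printf("CUDA error: %s\n", cudaGetErrorString(error));
  exit(-1);
}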
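A common way to handle an N larger than the per-block limit is to split the work over several blocks and have each thread compute a global index guarded by a bounds check. The following is a minimal sketch of the question's program rewritten that way, not the original poster's code; `threadsPerBlock = 256` is an arbitrary assumption, and `CUDA_CHECK` and `blocks_for` are hypothetical helper names:

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical helper macro: aborts with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// One thread per element; the guard skips the surplus threads in the last block.
__global__ void add(const int *a, const int *b, int *c, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)
        c[tid] = a[tid] + b[tid];
}

// Hypothetical helper: number of blocks needed to cover n elements (ceiling division).
static int blocks_for(int n, int threads_per_block)
{
    return (n + threads_per_block - 1) / threads_per_block;
}

int main(void)
{
    const int N = 1500;
    const int threadsPerBlock = 256;  // arbitrary choice, well under 1024
    size_t bytes = N * sizeof(int);

    int *a = (int *)malloc(bytes);
    int *b = (int *)malloc(bytes);
    int *c = (int *)malloc(bytes);
    for (int i = 0; i < N; ++i) { a[i] = i; b[i] = 2 * i; c[i] = 0; }

    int *dev_a, *dev_b, *dev_c;
    CUDA_CHECK(cudaMalloc((void **)&dev_a, bytes));
    CUDA_CHECK(cudaMalloc((void **)&dev_b, bytes));
    CUDA_CHECK(cudaMalloc((void **)&dev_c, bytes));
    CUDA_CHECK(cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice));

    // Several blocks instead of one oversized block.
    add<<<blocks_for(N, threadsPerBlock), threadsPerBlock>>>(dev_a, dev_b, dev_c, N);
    CUDA_CHECK(cudaGetLastError());        // launch-configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());   // errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(c, dev_c, bytes, cudaMemcpyDeviceToHost));

    // Compare against the host-side sum.
    int wrong = 0;
    for (int i = 0; i < N; ++i)
        if (c[i] != a[i] + b[i]) ++wrong;
    printf("%d mismatches out of %d\n", wrong, N);

    CUDA_CHECK(cudaFree(dev_a));
    CUDA_CHECK(cudaFree(dev_b));
    CUDA_CHECK(cudaFree(dev_c));
    free(a); free(b); free(c);
    return wrong == 0 ? 0 : 1;
}
```

With N = 1500 this launches 6 blocks of 256 threads (1536 threads in total); the `tid < n` guard keeps the 36 surplus threads from writing past the end of `dev_c`.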

AlvertRee · July 23, 2013, 9:01am

Dear pasoleatis,

Thank you for the comment.
I will check it!

Yes, I just ran the code above after changing N.
Thank you!

Sincerely,
Albert