The kernel always returns values equal to zero

I am a beginner in CUDA. I am trying to add two 10-element vectors, but the result is always zero, and I do not understand why. My complete code is below; it is very simple.

#include "stdafx.h"
#include <stdio.h>
#include <time.h>
#include <conio.h>
#include <iostream>
#include <cuda.h>
#include <cuda_runtime.h>

using namespace std;

#define N 10

__global__ void Suma_vec( int *a, int *b, int *c, int n )
{
    int tid = threadIdx.x; // Thread identifier
    c[tid] = a[tid] + b[tid];
}

int main(void)
{
    int A[N], B[N], C[N];
    int *dA, *dB, *dC;
    srand (time(NULL));

    // Create vector A
    for(int i=0; i<N; i++)
        A[i] = rand() % 101;

    // Create vector B
    for(int i=0; i<N; i++)
        B[i] = rand() % 101;

    // Allocate memory on the GPU
    cudaMalloc( (void**)&dA, N * sizeof(int));
    cudaMalloc( (void**)&dB, N * sizeof(int));
    cudaMalloc( (void**)&dC, N * sizeof(int));

    // Copy vectors A and B to the GPU
    cudaMemcpy( dA, A, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy( dB, B, N * sizeof(int), cudaMemcpyHostToDevice);

    Suma_vec<<<1,N>>>(dA, dB, dC, N);

    // Copy the result from the GPU into vector C on the CPU
    cudaMemcpy( C, dC, N * sizeof(int), cudaMemcpyDeviceToHost);

    for (int j=0; j<N; j++)
    {
        cout<<A[j]<<"\t"<<B[j]<<"\t"<<C[j]<<endl;
    }

    cudaFree( dA);
    cudaFree( dB);
    cudaFree( dC);

    getch();
    return 0;
}

Your code works on my machine (GTX 580, CUDA 6.5). I only removed the headers that I do not have (stdafx.h, conio.h) and the call to getch().

I am using a GeForce 8400 GS, CUDA 6.5 and Visual Studio 10. The results are always zero. The data is copied to GPU memory correctly: if instead of

cudaMemcpy (C, dC, N * sizeof (int), cudaMemcpyDeviceToHost);

I use

cudaMemcpy (C, dA, N * sizeof (int), cudaMemcpyDeviceToHost);

then the first and third columns are equal. The problem is that the sum is not computed. Can you help me?

What you should do in any case is to check for errors after the kernel call with something like

cudaDeviceSynchronize();
cudaError_t error = cudaGetLastError();
if(error!=cudaSuccess)
{
   fprintf(stderr,"ERROR: %s\n", cudaGetErrorString(error) );
   exit(-1);
}
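
The same check can be wrapped in a small helper so that every runtime call, and not only the kernel launch, is covered. The following is a minimal sketch rather than code from the original post: the names `checkCuda` and `CUDA_CHECK` are made up for illustration and are not part of the CUDA API, the input values are fixed instead of random, and the kernel is a bounds-checked variant of `Suma_vec`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 10

// Hypothetical helper (not part of the CUDA API): reports a failed runtime
// call and returns false; returns true when err is cudaSuccess.
static bool checkCuda(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "ERROR in %s: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical convenience macro: stop the program at the first failure.
#define CUDA_CHECK(call) \
    do { if (!checkCuda((call), #call)) exit(EXIT_FAILURE); } while (0)

__global__ void Suma_vec(const int *a, const int *b, int *c, int n)
{
    int tid = threadIdx.x;
    if (tid < n)                      // ignore threads beyond the vector length
        c[tid] = a[tid] + b[tid];
}

int main(void)
{
    int A[N], B[N], C[N];
    int *dA, *dB, *dC;

    for (int i = 0; i < N; i++) { A[i] = i; B[i] = 10 * i; }

    CUDA_CHECK(cudaMalloc((void**)&dA, N * sizeof(int)));
    CUDA_CHECK(cudaMalloc((void**)&dB, N * sizeof(int)));
    CUDA_CHECK(cudaMalloc((void**)&dC, N * sizeof(int)));

    CUDA_CHECK(cudaMemcpy(dA, A, N * sizeof(int), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dB, B, N * sizeof(int), cudaMemcpyHostToDevice));

    Suma_vec<<<1, N>>>(dA, dB, dC, N);
    CUDA_CHECK(cudaGetLastError());       // errors detected at launch time
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel runs

    CUDA_CHECK(cudaMemcpy(C, dC, N * sizeof(int), cudaMemcpyDeviceToHost));

    for (int j = 0; j < N; j++)
        printf("%d\t%d\t%d\n", A[j], B[j], C[j]);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return 0;
}
```

With checks like these, a failed launch stops the program with a message instead of leaving whatever happened to be in `dC` to be copied back.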

ERROR: invalid device function

What does this mean?

Your GeForce 8400 GS is a compute capability 1.1 GPU:

https://developer.nvidia.com/cuda-gpus
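
The compute capability can also be read from the runtime instead of looked up in the table. A minimal sketch follows; `archFlag` is a hypothetical helper that only formats the two numbers into the `sm_XY` spelling used by the compiler options.

```cuda
#include <cstdio>
#include <sstream>
#include <string>
#include <cuda_runtime.h>

// Hypothetical helper: turns (1, 1) into "sm_11", (2, 1) into "sm_21", etc.
static std::string archFlag(int major, int minor)
{
    std::ostringstream os;
    os << "sm_" << major << minor;
    return os.str();
}

int main(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "ERROR: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Print name and compute capability of every CUDA device found.
    for (int dev = 0; dev < count; dev++) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "ERROR: %s\n", cudaGetErrorString(err));
            return 1;
        }
        printf("Device %d: %s, compute capability %d.%d (%s)\n",
               dev, prop.name, prop.major, prop.minor,
               archFlag(prop.major, prop.minor).c_str());
    }
    return 0;
}
```

This program launches no kernel, so it should run even when it was built for the default target.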

If you are using CUDA 6.5 and provide no arch switches, the default compilation target is cc 2.0. Code built for that target won’t run on your GPU, and the kernel launch fails with "invalid device function".

You will need to specifically target a cc1.1 GPU when you compile. When you do so, CUDA will provide some warning messages that cc1.1 is deprecated, but the compiler will still work.

There are many resources on the web which explain how to target a different compute capability in Visual Studio.

In a nutshell, you should be able to go into your project properties, then CUDA C/C++ properties, then Device, and change the target to compute_11,sm_11.
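
For builds outside Visual Studio, the corresponding setting is normally passed to `nvcc` on the command line. The sketch below is an illustration under stated assumptions rather than part of the original answer: the file name `arch_check.cu` and the function `queryCompiledArch` are hypothetical, and the build line in the comment assumes a CUDA 6.5 toolkit, which still accepts `sm_11`. The kernel stores the value of `__CUDA_ARCH__` it was compiled with, so a mismatched build shows up as a reported error instead of a silent zero.

```cuda
// Assumed build line (CUDA 6.5, hypothetical file name):
//   nvcc -gencode arch=compute_11,code=sm_11 arch_check.cu -o arch_check
#include <cstdio>
#include <cuda_runtime.h>

// Stores the architecture value the device code was compiled for,
// e.g. 110 for compute_11.
__global__ void storeArch(int *out)
{
#ifdef __CUDA_ARCH__   // only defined while compiling device code
    *out = __CUDA_ARCH__;
#endif
}

// Hypothetical helper: launches the kernel and copies the value back.
// Returns the first CUDA error encountered, or cudaSuccess.
static cudaError_t queryCompiledArch(int *arch)
{
    int *dArch = NULL;
    cudaError_t err = cudaMalloc((void**)&dArch, sizeof(int));
    if (err != cudaSuccess)
        return err;

    storeArch<<<1, 1>>>(dArch);
    err = cudaGetLastError();             // launch failures are reported here
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();    // failures during execution
    if (err == cudaSuccess)
        err = cudaMemcpy(arch, dArch, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dArch);
    return err;
}

int main(void)
{
    int arch = 0;
    cudaError_t err = queryCompiledArch(&arch);
    if (err != cudaSuccess) {
        fprintf(stderr, "ERROR: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Kernel ran; compiled for __CUDA_ARCH__ = %d\n", arch);
    return 0;
}
```

Building the same file without any arch option should reproduce the failure described above on a cc 1.1 device, which makes it a quick way to confirm that the option is actually reaching the compiler.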

Now it works. Thank you very much, and congratulations on your excellent forum.

Hi, we tried the above steps and it works fine when compiled with Visual Studio. But when we compile the same code from the command window, the result is back to 0. Can you please suggest what should be done to get the right answer, and which compiler should be used? We are currently using the cl.exe application found in C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin
Awaiting your response. Thanks.

Hi, I have a similar problem.
My machine has a GeForce 610M and CUDA 9.
I don't know which sm and compute settings are best.
Please help me.

The compute capability of my GPU is 2.1.

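CUDA 9 won’t work with that GPU: CUDA 9 dropped support for compute capability 2.x devices, so CUDA 8 is the newest toolkit that can still target it.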