The kernel always returns values equal to zero

barbaro2014 · November 13, 2014, 12:58pm

I am a beginner in CUDA. I am trying to add two 10-element vectors, but the result is always zero. I do not understand why this happens. Here is my complete code; it is very simple.

#include "stdafx.h"
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <conio.h>
#include <iostream>
#include <cuda.h>
#include <cuda_runtime.h>

using namespace std;

#define N 10

__global__ void Suma_vec( int *a, int *b, int *c, int n )
{
    int tid = threadIdx.x; // Thread identifier
    c[tid] = a[tid] + b[tid];
}

int main(void)
{
    int A[N], B[N], C[N];
    int *dA, *dB, *dC;
    srand( (unsigned int)time(NULL) );

    // Fill vector A
    for(int i=0; i<N; i++)
        A[i] = rand() % 101;

    // Fill vector B
    for(int i=0; i<N; i++)
        B[i] = rand() % 101;

    // Allocate memory on the GPU
    cudaMalloc( (void**)&dA, N * sizeof(int));
    cudaMalloc( (void**)&dB, N * sizeof(int));
    cudaMalloc( (void**)&dC, N * sizeof(int));

    // Copy vectors A and B to the GPU
    cudaMemcpy( dA, A, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy( dB, B, N * sizeof(int), cudaMemcpyHostToDevice);

    Suma_vec<<<1,N>>>(dA, dB, dC, N);

    // Copy the result computed on the GPU into vector C on the CPU
    cudaMemcpy( C, dC, N * sizeof(int), cudaMemcpyDeviceToHost);

    for (int j=0; j<N; j++)
    {
        cout<<A[j]<<"\t"<<B[j]<<"\t"<<C[j]<<endl;
    }

    cudaFree( dA);
    cudaFree( dB);
    cudaFree( dC);

    getch();
    return 0;
}

hadschi118 · November 13, 2014, 1:04pm

Your code works on my machine (GTX580, CUDA6.5). I only removed headers that I do not have (stdafx.h, conio.h) and the call to getch().

barbaro2014 · November 13, 2014, 1:14pm

I use a GeForce 8400 GS, CUDA 6.5 and Visual Studio 10. The results are always zero. The data is copied correctly to GPU memory: if instead of

cudaMemcpy (C, dC, N * sizeof (int), cudaMemcpyDeviceToHost);

I use

cudaMemcpy (C, dA, N * sizeof (int), cudaMemcpyDeviceToHost);

then the first and third columns are equal. The problem is that the sum is not computed. Can you help me?

hadschi118 · November 13, 2014, 1:58pm

What you should do in any case is to check for errors after the kernel call with something like

cudaDeviceSynchronize();
cudaError_t error = cudaGetLastError();
if(error!=cudaSuccess)
{
   fprintf(stderr,"ERROR: %s\n", cudaGetErrorString(error) );
   exit(-1);
}
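The same check can be applied to every runtime call, not only to the kernel launch. The following is a minimal sketch of that idea, not code from this thread: `CUDA_CHECK`, `addOne` and `runAddOne` are hypothetical names introduced for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// CUDA_CHECK is a hypothetical helper macro (an assumption, not part of CUDA).
// It aborts with a readable message when a runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "ERROR: %s (%s:%d)\n",                        \
                    cudaGetErrorString(err_), __FILE__, __LINE__);        \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Trivial kernel, used only so there is something to launch.
__global__ void addOne(int *v, int n)
{
    int tid = threadIdx.x;
    if (tid < n)
        v[tid] += 1;
}

// Runs addOne on a zero-filled array and returns the first element.
int runAddOne()
{
    const int n = 10;
    int h[n] = {0};
    int *d = NULL;

    CUDA_CHECK(cudaMalloc((void**)&d, n * sizeof(int)));
    CUDA_CHECK(cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyHostToDevice));

    addOne<<<1, n>>>(d, n);
    CUDA_CHECK(cudaGetLastError());       // errors reported at launch time
    CUDA_CHECK(cudaDeviceSynchronize());  // errors reported while the kernel runs

    CUDA_CHECK(cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d));
    return h[0];
}

int main()
{
    printf("h[0] = %d\n", runAddOne());
    return 0;
}
```

Checking `cudaGetLastError()` right after the launch and `cudaDeviceSynchronize()` separately makes it easier to tell a failed launch from a failure during execution.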

barbaro2014 · November 13, 2014, 2:13pm

ERROR: invalid device function

What does this mean?

Robert_Crovella · November 13, 2014, 3:14pm

Your GeForce 8400 GS is a compute capability 1.1 GPU:

https://developer.nvidia.com/cuda-gpus

If you are using CUDA 6.5, and provide no arch switches, the default compilation target is cc2.0, which won’t run on your GPU (invalid device function).

You will need to specifically target a cc1.1 GPU when you compile. When you do so, CUDA will provide some warning messages that cc1.1 is deprecated, but the compiler will still work.

There are many resources on the web which explain how to target a different compute capability in visual studio.

In a nutshell, you should be able to go into your project properties…CUDA C/C++ properties…Device, and change the target to compute_11,sm_11
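If it is unclear which compute capability a GPU has, the runtime can report it. The sketch below is an illustration under assumptions, not part of the original answer: `smName` is a hypothetical helper that only formats the two numbers into the `sm_XY` spelling used in the project setting above.

```cuda
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

// Hypothetical helper: formats a compute capability as "sm_XY",
// e.g. major 1, minor 1 becomes "sm_11".
std::string smName(int major, int minor)
{
    char buf[32];
    snprintf(buf, sizeof(buf), "sm_%d%d", major, minor);
    return std::string(buf);
}

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "ERROR: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Print name and compute capability for every device the runtime reports.
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "ERROR: %s\n", cudaGetErrorString(err));
            return 1;
        }
        printf("Device %d: %s, compute capability %d.%d (%s)\n",
               dev, prop.name, prop.major, prop.minor,
               smName(prop.major, prop.minor).c_str());
    }
    return 0;
}
```

This program contains no kernel, so it should not be affected by the compilation target. Whether the installed toolkit can still generate code for the reported capability is a separate question.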

barbaro2014 · November 13, 2014, 3:28pm

Now it works. Thank you very much, and congratulations on your excellent forum.

SnehaShankar · January 31, 2015, 12:53pm

Hi, we tried the above steps and it works fine when compiling with Visual Studio. But when we compile the same code from the command window, the result is back to 0. Can you please suggest what should be done to get the right answer? Which compiler should be used? We are currently using the cl.exe application found in C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin
Awaiting your response. Thanks
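A likely cause, going by the answer earlier in this thread, is that the command-line build does not pass an architecture switch, so the default target is used again. The sketch below shows one possible way to build from the command line. The file name `suma_vec.cu` and the function `sumaVecMatchesCpu` are assumptions for illustration; `sm_11` matches the GeForce 8400 GS discussed above and would need to be changed for a different GPU.

```cuda
// Assumed file name: suma_vec.cu
//
// .cu files are compiled with nvcc, the CUDA compiler driver, which calls
// cl.exe itself for the host code on Windows. Run it from a Visual Studio
// command prompt so that cl.exe is on the PATH:
//
//     nvcc -arch=sm_11 -o suma_vec suma_vec.cu
//
// CUDA 6.5 is expected to warn that sm_11 is deprecated.

#include <cstdio>
#include <cuda_runtime.h>

#define N 10

__global__ void Suma_vec(const int *a, const int *b, int *c, int n)
{
    int tid = threadIdx.x;
    if (tid < n)
        c[tid] = a[tid] + b[tid];
}

// Hypothetical self-check: returns true when the GPU sums match the CPU sums.
bool sumaVecMatchesCpu()
{
    int A[N], B[N], C[N];
    for (int i = 0; i < N; i++) { A[i] = i; B[i] = 10 * i; C[i] = -1; }

    int *dA = NULL, *dB = NULL, *dC = NULL;
    cudaMalloc((void**)&dA, N * sizeof(int));
    cudaMalloc((void**)&dB, N * sizeof(int));
    cudaMalloc((void**)&dC, N * sizeof(int));
    cudaMemcpy(dA, A, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B, N * sizeof(int), cudaMemcpyHostToDevice);

    Suma_vec<<<1, N>>>(dA, dB, dC, N);
    cudaError_t err = cudaGetLastError();   // catches a failed launch
    if (err == cudaSuccess)
        err = cudaMemcpy(C, dC, N * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);

    if (err != cudaSuccess) {
        fprintf(stderr, "ERROR: %s\n", cudaGetErrorString(err));
        return false;
    }
    for (int i = 0; i < N; i++)
        if (C[i] != A[i] + B[i])
            return false;
    return true;
}

int main()
{
    printf("%s\n", sumaVecMatchesCpu() ? "sums match" : "sums do NOT match");
    return 0;
}
```

If the binary was built for a target the GPU cannot run, the launch check should report the error instead of silently leaving C unchanged.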

mahtab_fooladi · January 25, 2018, 10:42am

Hi, I have a similar problem.
My machine has a GeForce 610M and CUDA 9.
I don't know which sm and compute values to use.
Please help me.

mahtab_fooladi · January 25, 2018, 2:12pm

The compute capability is 2.1 for my GPU.

Robert_Crovella · February 2, 2018, 6:26am

CUDA 9 won’t work with that GPU
