Problem with a program running on CUDA

I am a beginner in CUDA programming and I have a problem. I am on Windows and installed VS2013 and CUDA 6.5.
I created two sample programs to run, but the results are always zero (0). The program works when executed on the CPU, but after adjusting it to use the GPU the results are always zero (0).
What’s the problem?

Code:

#include <stdio.h>

#define SIZE 1024

__global__ void VectorAdd(int *a, int *b, int *c, int n)
{
int i = threadIdx.x;

if (i < n)
	c[i] = a[i] + b[i];

}

int main()
{
int *a,*b,*c;
int *d_a, *d_b, *d_c;

a = (int *)malloc(SIZE*sizeof(int));
b = (int *)malloc(SIZE*sizeof(int));
c = (int *)malloc(SIZE*sizeof(int));

cudaMalloc( &d_a, SIZE*sizeof(int));
cudaMalloc( &d_b, SIZE*sizeof(int));
cudaMalloc( &d_c, SIZE*sizeof(int));


for (int i = 0; i < SIZE; ++i)
{
	a[i] = i;
	b[i] = i;
	c[i] = 0;
}

cudaMemcpy(d_a, a, SIZE*sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy(d_b, b, SIZE*sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy(d_c, c, SIZE*sizeof(int), cudaMemcpyHostToDevice);

VectorAdd<<<1,SIZE>>>(d_a, d_b, d_c, SIZE);

cudaMemcpy( c, d_c, SIZE*sizeof(int), cudaMemcpyDeviceToHost);

for (int i = 0; i < 10; ++i)
{
	printf("c[%d] - %d\n", i, c[i]);
}

free(a);
free(b);
free(c);

cudaFree(d_a);
cudaFree(d_b);
cudaFree(d_c);

return 0;

}

RESULT:
c[0] - 0
c[1] - 0
c[2] - 0
c[3] - 0
c[4] - 0
c[5] - 0
c[6] - 0
c[7] - 0
c[8] - 0
c[9] - 0

THE EXPECTED RESULT:
RESULT:
c[0] - 0
c[1] - 2
c[2] - 4
c[3] - 6
c[4] - 8
c[5] - 10
c[6] - 12
c[7] - 14
c[8] - 16
c[9] - 18

Your code works correctly for me. You may have a machine setup issue, or a problem with how you are compiling.

Add proper cuda error checking to your code, then recompile and re-run it. The error output will be useful for discovering what the problem is.

If you don’t know what “proper cuda error checking” is, then google “proper cuda error checking” and take the first hit. Then apply that to your code.
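For reference, the commonly used pattern looks something like the sketch below (the names `gpuErrchk`/`gpuAssert` are the conventional ones from the top search hit, not anything specific to your project):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Wrap every CUDA runtime call with this macro; it prints the failing
// call's file and line and aborts on any error.
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort = true)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}

// Usage:
//   gpuErrchk(cudaMalloc(&d_a, SIZE * sizeof(int)));
//   gpuErrchk(cudaMemcpy(d_a, a, SIZE * sizeof(int), cudaMemcpyHostToDevice));
```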

I added the cudasafe method and it does not report any error during execution. I also set the device properties in the project, but it is still not working. What else could I try?

#include <stdio.h>
#include <cuda_runtime.h>
#include <cuda.h>

#define SIZE 1024

void cudasafe(cudaError_t error, const char* message)
{
if (error != cudaSuccess) { fprintf(stderr, "ERROR: %s : %i\n", message, error); exit(-1); }
}

__global__ void VectorAdd(int *a, int *b, int *c, int n)
{
int i = threadIdx.x;

if (i < n)
	c[i] = a[i] + b[i];

}

int main()
{
int *a, *b, *c;
int *d_a, *d_b, *d_c;

a = (int *)malloc(SIZE*sizeof(int));
b = (int *)malloc(SIZE*sizeof(int));
c = (int *)malloc(SIZE*sizeof(int));

cudasafe(cudaMalloc(&d_a, SIZE*sizeof(int)), "cudaMalloc");
cudasafe(cudaMalloc(&d_b, SIZE*sizeof(int)), "cudaMalloc");
cudasafe(cudaMalloc(&d_c, SIZE*sizeof(int)), "cudaMalloc");


for (int i = 0; i < SIZE; ++i)
{
	a[i] = i;
	b[i] = i;
	c[i] = 0;
}

cudasafe(cudaMemcpy(d_a, a, SIZE*sizeof(int), cudaMemcpyHostToDevice), "cudaMemcpy");
cudasafe(cudaMemcpy(d_b, b, SIZE*sizeof(int), cudaMemcpyHostToDevice), "cudaMemcpy");
cudasafe(cudaMemcpy(d_c, c, SIZE*sizeof(int), cudaMemcpyHostToDevice), "cudaMemcpy");

VectorAdd<<<1, SIZE>>>(d_a, d_b, d_c, SIZE);

cudasafe(cudaMemcpy(c, d_c, SIZE*sizeof(int), cudaMemcpyDeviceToHost), "cudaMemcpy");

for (int i = 0; i < 10; ++i)
{
	printf("c[%d] - %d\n", i, c[i]);
}

free(a);
free(b);
free(c);

cudasafe(cudaFree(d_a), "cudaFree");
cudasafe(cudaFree(d_b), "cudaFree");
cudasafe(cudaFree(d_c), "cudaFree");

return 0;

}

I used GPUassert and presented the following error: GPUassert: invalid device function

Yes, you have a kernel launch error and did not implement the kernel error checking correctly (in what you have now posted).

The invalid device function error is a kernel launch error that can only be discovered if you implement the error checking correctly. This involves calling cudaGetLastError() or cudaPeekAtLastError() after the kernel launch and inspecting its return value, in addition to the other checks you already have.
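A minimal sketch of what that post-launch checking could look like in your code (plain inline checks rather than a macro, for clarity):

```cuda
VectorAdd<<<1, SIZE>>>(d_a, d_b, d_c, SIZE);

// Catch launch errors such as "invalid device function";
// these are reported by the launch itself, not by later API calls.
cudaError_t err = cudaGetLastError();
if (err != cudaSuccess)
    fprintf(stderr, "Launch error: %s\n", cudaGetErrorString(err));

// Catch errors that occur while the kernel is actually running.
err = cudaDeviceSynchronize();
if (err != cudaSuccess)
    fprintf(stderr, "Execution error: %s\n", cudaGetErrorString(err));
```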

Anyway, invalid device function suggests that you are compiling for an incorrect GPU architecture.

How are you compiling this code, and what GPU do you have?

If you have created a new CUDA project in Visual Studio, check the “Device” settings under CUDA properties in the project. There should be strings like arch=compute_20,code=sm_20. These determine what GPU type you are compiling for.

It’s likely that you have a compute capability 1.x GPU but you did not properly specify the compute settings in Visual Studio, as cuda 6.5 without any switches will default to compiling for cc2.0 devices.
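For reference, the equivalent on the nvcc command line would be passing a -gencode flag matching your GPU; the cc1.1 target below is just an example, substitute your card’s actual compute capability:

```shell
# Compile for a compute capability 1.1 device (adjust to match your GPU).
# Without any -gencode flag, CUDA 6.5 defaults to compiling for cc2.0.
nvcc -gencode arch=compute_11,code=sm_11 vectoradd.cu -o vectoradd
```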

It’s likely that you have a problem similar to what is described here:

http://stackoverflow.com/questions/27320527/cuda-compilation-of-examples

In project properties… CUDA C/C++… Device, change it to “compute_11,sm_11” or the more recent “compute_12,sm_12” (that works on my 8400GS).

https://devtalk.nvidia.com/default/topic/788747/the-kernel-always-returns-values-equal-to-zero/