if(condition) problem in CUDA

arer90 · March 13, 2019, 2:55am

Hello, I have a question about my CUDA program.
To speed up my program, I ported a simple piece of code to CUDA.
However, the result it returns is always zero. I checked several times, and the same logic works correctly on the host, so I looked for the mistake in the CUDA source but could not find it.

To use an AoS (array of structures) in CUDA, I wrote the following code.

#include <iostream>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <ctime>

#include <cuda_runtime.h>
#include <device_launch_parameters.h>

#define maxNum 10
#define THREADS 4

typedef struct test {
	float x, y, z;
}points;

__device__ int dmax = -0.05;
__device__ int dmin = 0.066;
__device__ double dradius = 0.2;

__device__
double sqrts(double a) {
	return sqrt(a);
}
__device__
double dv_pow(double a, double b) {
	return (double)pow(a, b);
}

__global__ void function(points *devin, int *dev_count, int size) {
	int idx = threadIdx.x + blockDim.x*blockIdx.x;
	if (idx < size) {
		float xa = devin[idx].x;
		float ya = devin[idx].y;
		float za = devin[idx].z;
		double radius = sqrts((dv_pow((double)xa, 2.0) + dv_pow((double)ya, 2.0)));
		__syncthreads();
		int start = 0;
		printf("phase[%d] = xyz(%f,%f,%f), radius(%lf)\n", idx, devin[idx].x, devin[idx].y, devin[idx].z, radius);
		if ((xa != 0 && ya != 0 && za != 0) && za < dmin || za > dmax || radius >= dradius) {
			start++;
		}
	}
}

int main() {
	int limit = maxNum;
	srand((unsigned)time(NULL));
	points *alpha;
	alpha = new points[limit];
	for (int i = 0; i < limit; i++) {
		/*
		0.006819	0.008099	-0.034720
		0.005938	0.007701	-0.035073
		0.005802	0.007906	-0.035066
		0.005667	0.008112	-0.035059
		0.005532	0.008317	-0.035052
		0.005396	0.008523	-0.035045
		0.005243	0.008714	-0.035046
		0.005096	0.008910	-0.035045
		0.004949	0.009106	-0.035044
		0.004801	0.009302	-0.035042
		*/
		alpha[i].x = 0.006819;
		alpha[i].y = 0.008099;
		alpha[i].z = -0.034720;
	}

	points *dev_alpha;
	int counts=0, *dev_count;
	cudaError(cudaMalloc((void**)&dev_alpha, sizeof(points)*limit));
	cudaError(cudaMalloc((void**)&dev_count, sizeof(int)));

	cudaError(cudaMemcpy(dev_alpha, alpha, sizeof(points)*limit, cudaMemcpyHostToDevice));
	cudaError(cudaMemcpy(dev_count, &counts, sizeof(int), cudaMemcpyHostToDevice));

	dim3 threads(THREADS);
	dim3 blocks((limit + THREADS - 1) / THREADS);

	function<<<blocks, threads>>>(dev_alpha, dev_count, limit);

	cudaError(cudaMemcpy(&counts, dev_count, sizeof(int), cudaMemcpyDeviceToHost));

	printf("result = %d\n", counts);

	cudaError(cudaFree(dev_alpha));
	cudaError(cudaFree(dev_count));
	delete[] alpha;

	return 0;
}

This is my main function.

Please help me find the problem.
Thank you!

P.S. I will attach images of the results.

Robert_Crovella · March 13, 2019, 3:25am

Please provide complete code, not bits and pieces.

What are maxNum and THREADS?

Robert_Crovella · March 13, 2019, 3:30am

Your output variable is count:

__global__ void function(points *devin, int *count, int size) 
                                        ^^^^^^^^^^

Your kernel code doesn’t write to count anywhere, so the result printout is always going to be zero:

	int counts=0, *dev_count;
	...
	cudaError(cudaMemcpy(dev_count, &counts, sizeof(int), cudaMemcpyHostToDevice));
	...
	function<<<blocks, threads>>>(dev_alpha, dev_count, limit);

	cudaError(cudaMemcpy(&counts, dev_count, sizeof(int), cudaMemcpyDeviceToHost));

	printf("result = %d\n", counts);

If you add this line at the end of your kernel:

		if ((xa != 0 && ya != 0 && za != 0) && za < dmin || za > dmax || radius >= dradius) {
			start++;
		}
		atomicAdd(count, start); // add this line
	}
}

You may get some non-zero result.
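
For reference, here is a minimal self-contained sketch of the program with that change applied. It is not the original poster's exact code, and several parts are assumptions: the thresholds are declared as float (the posted code declares dmax and dmin as int, so the initializers -0.05 and 0.066 are truncated to 0), the sqrts/dv_pow wrappers are replaced by sqrtf and a plain multiplication, the local start variable is folded into a direct atomicAdd(dev_count, 1), and error checking is left out for brevity. The condition keeps the grouping the original expression already has, since && binds more tightly than ||.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

struct points {
	float x, y, z;
};

// Assumption: the thresholds are meant to be fractional, so they are float here.
// Declared as int (as in the posted code), -0.05 and 0.066 both truncate to 0.
__device__ float dmax = -0.05f;
__device__ float dmin = 0.066f;
__device__ float dradius = 0.2f;

__global__ void function(const points *devin, int *dev_count, int size) {
	int idx = threadIdx.x + blockDim.x * blockIdx.x;
	if (idx < size) {
		float xa = devin[idx].x;
		float ya = devin[idx].y;
		float za = devin[idx].z;
		float radius = sqrtf(xa * xa + ya * ya);

		// Same grouping as the original expression: A && B || C || D
		// is evaluated as (A && B) || C || D.
		bool hit = ((xa != 0 && ya != 0 && za != 0) && za < dmin)
		           || za > dmax
		           || radius >= dradius;

		// Each matching thread adds 1 to the shared counter in global memory.
		// atomicAdd prevents concurrent threads from losing updates.
		if (hit) {
			atomicAdd(dev_count, 1);
		}
	}
}

int main() {
	const int limit = 10;           // maxNum in the posted code
	const int threadsPerBlock = 4;  // THREADS in the posted code

	// Every element holds the same test point as in the posted code.
	points alpha[limit];
	for (int i = 0; i < limit; i++) {
		alpha[i].x = 0.006819f;
		alpha[i].y = 0.008099f;
		alpha[i].z = -0.034720f;
	}

	points *dev_alpha = nullptr;
	int counts = 0;
	int *dev_count = nullptr;
	cudaMalloc((void**)&dev_alpha, sizeof(points) * limit);
	cudaMalloc((void**)&dev_count, sizeof(int));

	cudaMemcpy(dev_alpha, alpha, sizeof(points) * limit, cudaMemcpyHostToDevice);
	// The counter must start at zero on the device before the kernel runs.
	cudaMemcpy(dev_count, &counts, sizeof(int), cudaMemcpyHostToDevice);

	int blocks = (limit + threadsPerBlock - 1) / threadsPerBlock;
	function<<<blocks, threadsPerBlock>>>(dev_alpha, dev_count, limit);

	// This copy waits for the kernel to finish before reading the counter.
	cudaMemcpy(&counts, dev_count, sizeof(int), cudaMemcpyDeviceToHost);
	printf("result = %d\n", counts);

	cudaFree(dev_alpha);
	cudaFree(dev_count);
	return 0;
}
```

With the constant test point used in the question, every one of the 10 elements satisfies the first clause of the condition (all coordinates are non-zero and z is below dmin), so on a working device this sketch should report a count of 10 rather than 0.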

arer90 · March 13, 2019, 3:33am

I am sorry for not providing the complete code.
I have revised my post to include the complete code.

Robert_Crovella · March 13, 2019, 3:36am

You’ve changed the code. Anyway my previous answer shows what the problem is, although you’ve changed variable names. (change count in my answer to dev_count to match your new code)

arer90 · March 13, 2019, 3:40am

Oops, I posted a different version. But thank you for the advice!
I solved the problem with your help!
Thank you! ^^
