Computing with CUDA: some problems with kernel configurations

Hello everybody…

I am still a newbie at CUDA programming, which I am using for my master's thesis. I have gained some experience with it by now, and sometimes I love it and sometimes I hate it. Recently I discovered some strange behavior: I ran my computation with different kernel configurations, and there are results I cannot explain. For example, with a configuration of two blocks of 512 threads each, the computation fails. With two blocks of 300 threads it works fine, and 1024 blocks with one thread each works fine, too. So maybe I misunderstood something…

Here is the code:

__device__
float func(const float x,
           const float x0, const float x1,
           const float b0, const float b1, const float a)
{
    float h = x1 - x0;
    float tmp = 1.f / (2.f*h);
    return tmp*b1*pow(x-x0, 2.f) - tmp*b0*pow(x1-x, 2.f) + a;
}

__global__
void kernelSpline(float2 *ret, const int resN, float *cfA, float *cfB,
                  const float2 *iPts, const int baseN)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    int threads = blockDim.x * gridDim.x;   // total number of threads in the grid
    float2 pmin, pmax;
    float2 p0, p1, pt;

    pmin = iPts[0];
    pmax = iPts[baseN-1];
    float t = (pmax.x - pmin.x) / resN;
    // grid-stride loop: each thread computes every `threads`-th sample
    for(int i = id; i < resN; i += threads) {
        pt.x = pmin.x + i*t;
        int index = findInterval(p0, p1, pt.x, iPts, baseN);
        pt.y = func(pt.x, p0.x, p1.x, cfB[index], cfB[index+1], cfA[index+1]);
        ret[i] = pt;
    }
}

I call the kernel from host code with the given configuration, and the kernel calls the device function. My idea is that every thread computes one element. Is there perhaps a mistake in the compilation? Please help me understand the error…
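For completeness, here is roughly how I launch the kernel from the host (a simplified sketch; the real allocation and copy code is longer and the buffer names differ). I added cudaGetLastError and cudaDeviceSynchronize checks to see whether the launch itself fails, e.g. because 512 threads per block exceed a per-block resource limit (registers or shared memory) of my device:

```cpp
// Simplified host-side launch sketch; buffer setup is abbreviated.
float2 *dRet, *dPts;
float *dA, *dB;
cudaMalloc(&dRet, resN * sizeof(float2));
cudaMalloc(&dPts, baseN * sizeof(float2));
cudaMalloc(&dA, baseN * sizeof(float));
cudaMalloc(&dB, baseN * sizeof(float));
// ... copy the input points and coefficients with cudaMemcpy ...

dim3 grid(2), block(512);   // the failing configuration
kernelSpline<<<grid, block>>>(dRet, resN, dA, dB, dPts, baseN);

cudaError_t err = cudaGetLastError();   // catches launch-configuration errors
if (err == cudaSuccess)
    err = cudaDeviceSynchronize();      // catches errors during execution
if (err != cudaSuccess)
    printf("kernel failed: %s\n", cudaGetErrorString(err));
```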

Thanks in advance.