Multiple kernel calls from one host function: strange behaviour when calling kernel

huki · April 21, 2011, 6:33pm

Hi everyone,

Recently I tried to optimize my working code. Instead of calling

```cuda
void func(pType *p) {
    ...
    pType *p1;
    cudaMalloc((void**)&p1, MN * sizeof(pType));

    for (int i = 0; i < JACOBI_ITERATIONS; i++)
    {
        jacobi_shared3<<<grid, block>>>(p1, p, rhs, dx, dy, alpha, beta, pitch);   // stores results to first argument
        cudaThreadSynchronize();
        cudaMemcpy(p, p1, sizeof(pType) * dx * dy, cudaMemcpyDeviceToDevice);
    }

    cudaFree(p1);
}
```

I tried something like this:

```cuda
void func(pType *p) {
    ...
    pType *p1;
    cudaMalloc((void**)&p1, MN * sizeof(pType));

    for (int i = 0; i < JACOBI_ITERATIONS / 2; i++)
    {
        jacobi_shared3<<<grid, block>>>(p1, p, rhs, dx, dy, alpha, beta, pitch);   // stores results to first argument
        cudaThreadSynchronize();

        jacobi_shared3<<<grid, block>>>(p, p1, rhs, dx, dy, alpha, beta, pitch);
        cudaThreadSynchronize();
    }

    cudaFree(p1);
}
```

The problem is that I get cudaError 30 (checked with cudaGetLastError()) when calling the second cudaThreadSynchronize().

Why is that so? What am I doing wrong?
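To narrow down which launch fails, it can help to check both the launch and its completion after every kernel call. Because kernel launches are asynchronous, an execution error only shows up at a later synchronizing call, so checking right after each launch attributes the error to the right kernel; printing `cudaGetErrorString()` is also easier to interpret than the bare number. A minimal sketch follows; the macro names `CUDA_CHECK` and `CUDA_CHECK_LAUNCH`, the toy kernel `fill` and the helper `fill_twice_checked` are hypothetical. It uses `cudaDeviceSynchronize()`, the CUDA 4.0+ replacement for `cudaThreadSynchronize()`; on CUDA 3.2 the older name would be used instead.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with file, line and the runtime's description if a CUDA call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s:%d: %s failed: %s (code %d)\n",      \
                         __FILE__, __LINE__, #call,                       \
                         cudaGetErrorString(err_), (int)err_);            \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// After a launch: cudaGetLastError() reports launch/configuration errors,
// cudaDeviceSynchronize() reports faults raised while the kernel ran.
#define CUDA_CHECK_LAUNCH()                   \
    do {                                      \
        CUDA_CHECK(cudaGetLastError());       \
        CUDA_CHECK(cudaDeviceSynchronize());  \
    } while (0)

// Toy kernel (hypothetical) used only to exercise the checks.
__global__ void fill(float *a, int n, float v)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = v;   // guard: the last block may extend past n
}

// Launch `fill` twice, checking each launch separately, and return a[n-1].
float fill_twice_checked(int n)
{
    assert(n > 0);
    float *a = nullptr;
    CUDA_CHECK(cudaMalloc(&a, n * sizeof(float)));
    int threads = 256, blocks = (n + threads - 1) / threads;

    fill<<<blocks, threads>>>(a, n, 1.0f);
    CUDA_CHECK_LAUNCH();   // a failure reported here belongs to the first launch
    fill<<<blocks, threads>>>(a, n, 2.0f);
    CUDA_CHECK_LAUNCH();   // ...and one reported here to the second

    float last = 0.0f;
    CUDA_CHECK(cudaMemcpy(&last, a + (n - 1), sizeof(float), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(a));
    return last;
}
```

In the loop from the question, each bare `cudaThreadSynchronize();` after a `jacobi_shared3` launch could be replaced by `CUDA_CHECK_LAUNCH();`.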

And will this change affect the performance of the calculation?

I have a GTX275 with CUDA 3.2 installed.

Any help will be appreciated.
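On the performance question: the two-launch version already drops the device-to-device copy from every iteration, because each launch reads what the previous one wrote. Swapping the two pointers after each launch gets the same effect without unrolling, so an odd `JACOBI_ITERATIONS` also works, and the synchronize between launches is not needed for correctness because launches in the same stream execute in order. The sketch below assumes `float` data, tightly packed rows (pitch = nx) and an update of the form `(left + right + down + up + alpha * rhs) / beta`, which is only a guess at what `jacobi_shared3` computes; `jacobi_step` and `run_jacobi` are hypothetical stand-ins. Since this kernel writes only interior cells, both buffers start with the same boundary values; in the second version from the question, `p1` comes from a bare `cudaMalloc`, so its boundary cells would hold leftover memory unless the kernel writes them.

```cuda
#include <cassert>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for jacobi_shared3: one sweep over the interior cells of
// an nx-by-ny grid stored row by row with a stride of `pitch` elements.
__global__ void jacobi_step(float *out, const float *in, const float *rhs,
                            int nx, int ny, float alpha, float beta, int pitch)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    // Skip threads outside the grid and the fixed boundary cells.
    if (x < 1 || y < 1 || x >= nx - 1 || y >= ny - 1) return;
    int i = y * pitch + x;
    out[i] = (in[i - 1] + in[i + 1] + in[i - pitch] + in[i + pitch]
              + alpha * rhs[i]) / beta;
}

// Run `iters` sweeps with ping-pong buffers and return the final field.
std::vector<float> run_jacobi(const std::vector<float> &init,
                              const std::vector<float> &rhs_h,
                              int nx, int ny, int iters, float alpha, float beta)
{
    assert(init.size() == (size_t)nx * ny && rhs_h.size() == init.size());
    const int pitch = nx;                        // assumption: packed rows
    const size_t bytes = sizeof(float) * (size_t)pitch * ny;
    float *a, *b, *rhs;
    cudaMalloc(&a, bytes);
    cudaMalloc(&b, bytes);
    cudaMalloc(&rhs, bytes);
    cudaMemcpy(a, init.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(b, a, bytes, cudaMemcpyDeviceToDevice);   // same boundaries in both
    cudaMemcpy(rhs, rhs_h.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);
    float *src = a, *dst = b;
    for (int it = 0; it < iters; ++it) {
        jacobi_step<<<grid, block>>>(dst, src, rhs, nx, ny, alpha, beta, pitch);
        std::swap(src, dst);   // no copy: the next sweep reads what this one wrote
    }
    // After the last swap `src` holds the newest iterate; this blocking copy
    // runs after all launches queued in the default stream.
    std::vector<float> out((size_t)pitch * ny);
    cudaMemcpy(out.data(), src, bytes, cudaMemcpyDeviceToHost);
    cudaFree(a);
    cudaFree(b);
    cudaFree(rhs);
    return out;
}
```

Applied to the original function, this would mean one `cudaMemcpy(p1, p, ...)` before the loop and, when the iteration count is odd, a final copy back into `p`, so the caller still finds the newest iterate in `p`.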

tera · April 21, 2011, 6:48pm

Where do you allocate memory for `*p` in the second version?

huki · April 21, 2011, 6:56pm

Oh, sorry, it's allocated somewhere else by cudaMalloc and passed as a parameter, just like in the first version. I just posted it incorrectly, but it's fixed now.

tera · April 21, 2011, 7:09pm

Then I don't know, as long as you allocate enough memory for `*p`.

It looks a bit like you have an out-of-bounds access somewhere in the kernel.
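One way to test that hypothesis is to route the stencil's reads and writes through a checked access that records any index outside the allocation. The sketch below is a debugging aid under stated assumptions: `float` data, a row stride `pitch` in elements, and buffers of `n` elements (`MN` in the thread); `d_oob_flag`, `checked_load`, `stencil_checked` and `sweep_goes_out_of_bounds` are hypothetical names. It illustrates, for example, that if the kernel indexes rows with a `pitch` wider than `dx`, the buffers need up to `pitch * dy` elements rather than `dx * dy`.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Debug flag, set to 1 by any checked access that falls outside the buffer.
__device__ int d_oob_flag;

// Read a[i] only if 0 <= i < n; otherwise record the violation and return 0.
__device__ float checked_load(const float *a, long i, long n)
{
    if (i < 0 || i >= n) {
        atomicExch(&d_oob_flag, 1);
        return 0.0f;
    }
    return a[i];
}

// 5-point stencil over the interior of an nx-by-ny grid with row stride `pitch`.
__global__ void stencil_checked(float *out, const float *in,
                                int nx, int ny, int pitch, long n)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 1 || y < 1 || x >= nx - 1 || y >= ny - 1) return;
    long i = (long)y * pitch + x;
    float s = checked_load(in, i - 1, n) + checked_load(in, i + 1, n)
            + checked_load(in, i - pitch, n) + checked_load(in, i + pitch, n);
    if (i < n) out[i] = 0.25f * s;       // the write is checked as well
    else atomicExch(&d_oob_flag, 1);
}

// Run one sweep over n-element buffers and report whether any access was out of range.
bool sweep_goes_out_of_bounds(int nx, int ny, int pitch, long n)
{
    assert(n > 0 && pitch >= nx);
    float *in = nullptr, *out = nullptr;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    int flag = 0;
    cudaMemcpyToSymbol(d_oob_flag, &flag, sizeof(int));    // reset the flag
    dim3 block(16, 16);
    dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);
    stencil_checked<<<grid, block>>>(out, in, nx, ny, pitch, n);
    cudaMemcpyFromSymbol(&flag, d_oob_flag, sizeof(int));  // waits for the kernel
    cudaFree(in);
    cudaFree(out);
    return flag != 0;
}
```

The check costs extra branches and atomics, so it is meant for a debug build, not for the timed solver.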

huki · April 21, 2011, 7:13pm

It's declared at global scope as:

```cuda
pType *pField = NULL; // pressure
```

and allocated with:

```cuda
cudaMalloc((void**)&pField, sizeof(pType) * MN);
```
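Since the kernel takes a `pitch` argument while the buffer is a flat `MN`-element `cudaMalloc`, one option is to let the runtime choose the row stride with `cudaMallocPitch` and copy pitched data with `cudaMemcpy2D`, so the allocation size, the stride and the copies stay consistent. A minimal sketch, assuming `float` data and a kernel that expects `pitch` in elements (the thread does not show whether `jacobi_shared3` takes it in bytes or elements); `Field`, `alloc_field`, `copy_field` and `roundtrip` are hypothetical names.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// A dx-by-dy float field with a runtime-chosen row stride.
struct Field {
    float *data;
    size_t pitchBytes;   // row stride in bytes, as returned by cudaMallocPitch
    int dx, dy;
    int pitchElems() const { return (int)(pitchBytes / sizeof(float)); }
};

Field alloc_field(int dx, int dy)
{
    Field f{nullptr, 0, dx, dy};
    // Each row gets dx * sizeof(float) usable bytes, padded up to pitchBytes.
    cudaMallocPitch(&f.data, &f.pitchBytes, dx * sizeof(float), dy);
    return f;
}

// Copy only the dx * dy payload; the padding at the end of each row is skipped.
void copy_field(Field &dst, const Field &src)
{
    cudaMemcpy2D(dst.data, dst.pitchBytes, src.data, src.pitchBytes,
                 src.dx * sizeof(float), src.dy, cudaMemcpyDeviceToDevice);
}

// Upload a packed host array, copy it device-to-device, and download it again.
std::vector<float> roundtrip(const std::vector<float> &host, int dx, int dy)
{
    assert(host.size() == (size_t)dx * dy);
    Field a = alloc_field(dx, dy), b = alloc_field(dx, dy);
    cudaMemcpy2D(a.data, a.pitchBytes, host.data(), dx * sizeof(float),
                 dx * sizeof(float), dy, cudaMemcpyHostToDevice);
    copy_field(b, a);
    std::vector<float> out(host.size());
    cudaMemcpy2D(out.data(), dx * sizeof(float), b.data, b.pitchBytes,
                 dx * sizeof(float), dy, cudaMemcpyDeviceToHost);
    cudaFree(a.data);
    cudaFree(b.data);
    return out;
}
```

With this layout a kernel would index cell (x, y) as `y * f.pitchElems() + x`, and the per-iteration `cudaMemcpy(p, p1, sizeof(pType)*dx*dy, ...)` from the first version would become a `copy_field` call that respects the padding.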
