Kernel calls inside a loop: is it OK?

Genesis_39 · December 3, 2010, 9:24pm

Is it OK to write a loop and launch a kernel inside it, like this?

for (int i = 0; i < n; i++)
{
    DoSomeWork<<<dim3,dim3>>>(someargs, i);
}

I’m asking for two reasons:

1. Performance.
2. I’m doing the same thing in my program right now and it fails in a very strange way.

So the situation is: I have a kernel that accepts 7 arguments. All of them are of type int except one, the result array, which is of type int*.

I allocate a lot of memory for that array and then start a loop in which I launch my kernel (using the loop variable as a kernel argument).

And what is REALLY STRANGE: the first kernel launch succeeds, but all the others fail with “invalid argument”, even though I never reallocate or otherwise change the pointer to the result array. Besides, I can launch an empty kernel and the result is the same. By the way, I made a similar test app and it works =|

Can anyone help?

This is the code that doesn’t work (all libraries are included, and so on):

//I've changed some defined values and variables for exact numbers
__global__ void Test(int* points)
{
    //it can be either completely empty - anyhow doesn't work
    points[blockIdx.x + threadIdx.x] = 1;
}

main( )
{
    ...
    int* d_points;
    cudaMalloc( (void**)&d_points , 1024*1024*10);
    ...
    for (int i = 0 ; i < 1024; i++)
    {
        Test<<<1,1>>>(d_points);
        CUDA_CHECK_ERROR(cudaThreadSynchronize());

        //was made to check results after the first iteration
        cudaMemcpy(points, d_points, 1024*1024*10, cudaMemcpyDeviceToHost);
        //because memcpy seems to be async operation
        CUDA_CHECK_ERROR(cudaThreadSynchronize());
        cudaError_t err = cudaGetLastError();
        if( cudaSuccess != err)
        {
            fprintf(stderr, "Cuda error: %s: %s.\n", msg, cudaGetErrorString( err) );
        }
    }
    ...
}

This is a simple test that works:

__global__ void DoCuda(int* points,int offset)
{
    points[threadIdx.x + offset*5] = threadIdx.x;
}

int main(int argc, char* argv[])
{
    int* h_points = (int*)malloc(500);
    int* d_points = NULL;
    cudaMalloc( (void**)&d_points,500 );
    for(int i = 0 ; i < 25 ; i++)
    {
        DoCuda<<<1,5>>>(d_points,i);
        cudaError_t err = cudaGetLastError();
        if (err != cudaSuccess)
        {
            printf("%s",cudaGetErrorString(err));
        }
    }
    cudaMemcpy( h_points, d_points, 500, cudaMemcpyDeviceToHost );
    cudaError_t err = cudaGetLastError();
    for (int i = 0 ; i < 125 ; i++)
    {
        h_points[i]++;
    }
    return 0;
}

tera · December 3, 2010, 10:19pm

Yes, that would be ok.

After replacing 500 with 125*sizeof(int) and printf("%s",err); with printf("%s",cudaGetErrorString(err)); and adding #include <stdio.h> at the top, your program ran fine for me.
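For reference, here is a minimal sketch of launching a kernel repeatedly from a host loop with an error check after every launch. The names CHECK, fill and run_loop are hypothetical and do not come from the posts above. It uses cudaDeviceSynchronize, the later replacement for cudaThreadSynchronize, and assumes a working CUDA device is present.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: print the CUDA error string and abort if a call failed.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t e_ = (call);                                  \
        if (e_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                    cudaGetErrorString(e_));                      \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Hypothetical kernel: each launch fills its own 5-element slice.
__global__ void fill(int* points, int offset)
{
    points[threadIdx.x + offset * 5] = threadIdx.x;
}

// Launches the kernel 25 times in a loop and returns the sum of all results.
int run_loop()
{
    const int n = 125;
    int h_points[n];
    int* d_points = NULL;
    CHECK(cudaMalloc((void**)&d_points, n * sizeof(int)));

    for (int i = 0; i < 25; i++) {
        fill<<<1, 5>>>(d_points, i);
        CHECK(cudaGetLastError());       // errors detected at launch time
        CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran
    }

    CHECK(cudaMemcpy(h_points, d_points, n * sizeof(int),
                     cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_points));

    int sum = 0;
    for (int i = 0; i < n; i++) sum += h_points[i];
    return sum;
}
```

Checking after every launch makes the first failing iteration visible immediately. The synchronize call inside the loop is there for debugging only and would normally be removed once the loop works, since it serializes host and device.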

Genesis_39 · December 3, 2010, 10:30pm

sizeof(int) is 4 bytes, so 125*sizeof(int) is the same as 500. (Yes, I could use sizeof(int) to make it clearer, but 500 is shorter and good enough for a simple test app.)

As for the second point: I only watch for the first error in the debugger, so it doesn’t really matter. I’ll change it so as not to confuse people.

All libraries are included, and it compiles and runs; the issue is the problem described above.

Genesis_39 · December 4, 2010, 4:36pm

OK,
I’ve found the error. Interestingly, it was related to a texture I was trying to allocate.
I tried to allocate a texture as big as half of my GPU’s memory (256 MB).
The maximum texture size is given here: http://forums.nvidia.com/index.php?showtopic=30934 ; it’s 2^27 * sizeof(texturedata).
The most interesting part is that there is no error when allocating, binding, and so on, and even the first kernel run is OK; the error appears only on the second kernel call.
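Since nothing reported the oversized texture at bind time, a host-side size check before binding could catch it early. Below is a minimal sketch, assuming the 2^27-element limit quoted above. The names kMaxLinearTexels and fits_linear_texture are hypothetical, and the actual limit may differ on other devices.

```cuda
#include <cstddef>

// Element limit quoted in the post above for a texture bound to linear memory.
// Assumption: this is the figure from the linked thread, not a queried value.
const size_t kMaxLinearTexels = (size_t)1 << 27;

// Hypothetical helper: true if a buffer of `bytes` bytes, holding elements of
// `elem_size` bytes each, stays within the element limit.
bool fits_linear_texture(size_t bytes, size_t elem_size)
{
    return bytes / elem_size <= kMaxLinearTexels;
}
```

For example, a 256 MB buffer of 1-byte elements has 2^28 elements and fails the check, while the same buffer viewed as 4-byte ints has 2^26 elements and passes. The post does not state the element type, so these figures are illustrative only.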
