Async kernel launch: CPU does not seem to get control back after the launch

Hi all,

As far as I know, the CPU should get control back right after the kernel launch (I understand that we cannot know when the kernel itself completes).

My kernel takes almost 1.2 s to complete on the GPU (I assume the launch itself does not take 1.2 s).

Yet the kernel call seems to return only after execution has completed.

    cutStartTimer(uiKernelTimer);

    dim3 dimBlocksPerGrid(512, 16);
    dim3 dimThreadsPerBlock(512);

    // RenderFrame
    RenderFrame<<<dimBlocksPerGrid, dimThreadsPerBlock>>>(fpOPFrameGpu, nSlice, nMinrow, nMaxrow - nMinrow);
    checkCUDAError("Kernel start");
    //cudaThreadSynchronize();

    cutStopTimer(uiKernelTimer);
    printf(" Kernel time %f \n", cutGetTimerValue(uiKernelTimer));
    cutResetTimer(uiKernelTimer);

Here the measured time is the same whether or not cudaThreadSynchronize() is present.

I'm using CUDA 1.1.

Any help?

Thanks in advance.

Is checkCUDAError a derived version of CUT_CHECK_ERROR?

Because that macro has a cudaThreadSynchronize in it…

    void checkCUDAError(const char *msg)
    {
        cudaError_t err = cudaGetLastError();
        if (cudaSuccess != err)
        {
            fprintf(stderr, "Cuda error: %s: %s.\n", msg, cudaGetErrorString(err));
            exit(-1);
        }
    }

Commenting it out doesn't change anything.
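For reference, a minimal sketch of the difference between a non-blocking and a blocking error check. cudaGetLastError() only reports errors the runtime already knows about and does not wait for the kernel, so a checker built on it alone should not stall the CPU. The names dummyKernel, checkLaunch and checkLaunchSync are made up for this illustration, and cudaDeviceSynchronize() is the current name for what CUDA 1.1 calls cudaThreadSynchronize().

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel; any kernel launch would do here.
__global__ void dummyKernel(int *out)
{
    out[threadIdx.x] = threadIdx.x;
}

// Non-blocking check: returns immediately with the last error recorded by
// the runtime (for example an invalid launch configuration).
cudaError_t checkLaunch(const char *msg)
{
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        fprintf(stderr, "Cuda error: %s: %s.\n", msg, cudaGetErrorString(err));
    return err;
}

// Blocking check: waits for all queued device work, so it also reports
// errors that only show up while the kernel is running.
cudaError_t checkLaunchSync(const char *msg)
{
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "Cuda error: %s: %s.\n", msg, cudaGetErrorString(err));
    return err;
}

int main()
{
    int *dOut = 0;
    if (cudaMalloc(&dOut, 64 * sizeof(int)) != cudaSuccess)
        return 1;

    dummyKernel<<<1, 64>>>(dOut);
    cudaError_t launchErr = checkLaunch("Kernel start");      // does not wait
    cudaError_t finishErr = checkLaunchSync("Kernel finish"); // waits

    cudaFree(dOut);
    return (launchErr == cudaSuccess && finishErr == cudaSuccess) ? 0 : 1;
}
```

If the timer should only cover the launch, the non-blocking variant is the one to keep between the timer calls.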

thanks

no reply… ?? :shock:

Have you enabled the profiling or sync-after-every-kernel-launch environment variables? Either one will implicitly sync after every kernel launch.

Is RenderFrame the first call you make to any CUDA function? If so, there is an implicit driver/GPU initialization, which takes a significant amount of time.

Are you calling this in a loop? Only ~100 async launches can be queued up in recent drivers (16 in older CUDA 1.1 drivers). After that you will get implicit syncs.
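One way to separate these effects is to time the launch call and the synchronization separately, after forcing initialization up front. The following is only a sketch under assumptions: busyKernel is a hypothetical stand-in for a long-running kernel such as RenderFrame, timeLaunch is a made-up helper, the host timer is std::chrono rather than the cutil timer, and cudaDeviceSynchronize() is the current name for what CUDA 1.1 calls cudaThreadSynchronize().

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for a long-running kernel such as RenderFrame.
__global__ void busyKernel(float *out, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (float)i;
    for (int k = 0; k < iters; ++k)
        v = v * 1.000001f + 0.5f;
    out[i] = v;
}

static double msSince(std::chrono::steady_clock::time_point t0)
{
    return std::chrono::duration<double, std::milli>(
        std::chrono::steady_clock::now() - t0).count();
}

// Measures the time until the launch call returns and the time until the
// kernel has actually finished. Returns false on any CUDA error.
bool timeLaunch(double *launchMs, double *totalMs)
{
    const int blocks = 1024, threads = 256;
    float *dOut = 0;

    // Force driver/GPU initialization here so it is not counted below.
    if (cudaFree(0) != cudaSuccess)
        return false;
    if (cudaMalloc(&dOut, blocks * threads * sizeof(float)) != cudaSuccess)
        return false;

    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    busyKernel<<<blocks, threads>>>(dOut, 200000);
    *launchMs = msSince(t0);                     // launch call has returned

    cudaError_t err = cudaDeviceSynchronize();   // wait for the kernel
    *totalMs = msSince(t0);                      // kernel has finished

    cudaFree(dOut);
    return err == cudaSuccess;
}

int main()
{
    double launchMs = 0.0, totalMs = 0.0;
    if (!timeLaunch(&launchMs, &totalMs)) {
        fprintf(stderr, "CUDA error during timing\n");
        return 1;
    }
    printf("launch returned after %.3f ms, kernel finished after %.3f ms\n",
           launchMs, totalMs);
    return 0;
}
```

If the two numbers come out nearly equal for a long kernel, something is forcing a sync at launch, which points back to the environment variables or a full launch queue.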

Oh, yes… it seems I had inadvertently enabled profiling. I have now set it to '0',

but it still seems to be blocking :( .

The kernel is launched after the cudatime, memcopy, and bindtexture calls, so it is not the first CUDA call.

No, it is not called in a loop.

thanks

Sometimes with profiling enabled, it “sticks” on even after you set the variable to 0. Try running the app after a clean boot.
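To rule out a stale setting, it may help to print what the process itself actually sees at startup. This is a host-only sketch; the variable names CUDA_PROFILE and CUDA_LAUNCH_BLOCKING are my assumption of the variables meant in this thread, envEnabled is a made-up helper, and treating anything other than empty or "0" as enabled is also an assumption about how the driver reads them.

```cuda
#include <cstdio>
#include <cstdlib>

// True when the variable is set to something other than empty or "0".
// (Assumed interpretation; the driver may read the value differently.)
bool envEnabled(const char *name)
{
    const char *v = std::getenv(name);
    return v != NULL && v[0] != '\0' && !(v[0] == '0' && v[1] == '\0');
}

int main()
{
    // Assumed names of the profiling and sync-after-launch variables.
    const char *names[] = { "CUDA_PROFILE", "CUDA_LAUNCH_BLOCKING" };
    for (int i = 0; i < 2; ++i)
        printf("%s is %s\n", names[i],
               envEnabled(names[i]) ? "enabled" : "not enabled");
    return 0;
}
```

Calling this check at the top of the application, before any CUDA call, shows the environment of that exact process rather than of the shell it was configured in.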

I assumed that… I have done a clean boot, but it still blocks there. :S