Async kernel launch: CPU does not seem to get control back after kernel launch

Akg · November 25, 2008, 9:21am

Hi all,

As far as I know, the CPU should get control back right after the kernel launch (I understand that we cannot know when the kernel completes).

My kernel takes almost 1.2 sec to complete processing on the GPU. (I assume it won't take 1.2 sec just to launch the kernel.)

But it seems that the kernel call only returns after the kernel has finished executing.

cutStartTimer(uiKernelTimer);
dim3 dimBlocksPerGrid(512,16);
dim3 dimThreadsPerBlock(512);
//RenderFrame
RenderFrame<<< dimBlocksPerGrid, dimThreadsPerBlock >>>(fpOPFrameGpu, nSlice, nMinrow, nMaxrow - nMinrow  );
checkCUDAError("Kernel start");
//cudaThreadSynchronize();
cutStopTimer( uiKernelTimer );
printf(" Kernel time %f \n", cutGetTimerValue( uiKernelTimer ));
cutResetTimer(uiKernelTimer);

Here the timing result is the same with or without the call to cudaThreadSynchronize().

I'm using CUDA 1.1.

Any help?

Thanks in advance.

E.D_Riedijk · November 25, 2008, 10:53am

Is checkCUDAError a derived version of CUT_CHECK_ERROR?

Because that macro has a cudaThreadSynchronize in it…

Akg · November 25, 2008, 11:32am

void checkCUDAError(const char *msg)
{
	cudaError_t err = cudaGetLastError();
	if( cudaSuccess != err)
	{
		fprintf(stderr, "Cuda error: %s: %s.\n", msg, cudaGetErrorString( err) );
		exit(-1);
	}
}

Commenting it out doesn't make any difference.

thanks

Akg · November 27, 2008, 3:27am

no reply… ??

MisterAnderson42 · November 27, 2008, 1:32pm

Have you set the environment variables that enable profiling or that force a sync after every kernel launch? Either one will implicitly sync after every kernel launch.

Is RenderFrame the first call you make to any CUDA function? If so, then there is an implicit driver/GPU initialization which takes a significant amount of time.

Are you calling this in a loop? Only ~100 async launches can be queued up in recent drivers (16 in older CUDA 1.1 drivers). After that you will get implicit syncs.
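
One way to separate these effects is to time the launch call and the kernel's completion independently, after a warm-up call has absorbed the initialization cost. The following is only a minimal sketch under stated assumptions: busyKernel and timeLaunch are hypothetical stand-ins (not the RenderFrame code from this thread), and it uses cudaDeviceSynchronize(), the call that later toolkits provide in place of the now-deprecated cudaThreadSynchronize().

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for RenderFrame: a dependent arithmetic loop that
// keeps the GPU busy long enough for the two timings to differ visibly.
__global__ void busyKernel(float *out, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (float)i;
    for (int k = 0; k < iters; ++k)
        v = v * 1.0000001f + 0.5f;
    out[i] = v;
}

// Milliseconds elapsed on the host clock since t0.
static double msSince(std::chrono::steady_clock::time_point t0)
{
    using namespace std::chrono;
    return duration<double, std::milli>(steady_clock::now() - t0).count();
}

// Reports how long the launch call took to return (launchMs) and how long
// it took until the kernel had actually finished (totalMs).
cudaError_t timeLaunch(double *launchMs, double *totalMs)
{
    const int blocks = 256, threads = 256;
    float *dOut = nullptr;

    // Warm-up: the first CUDA call creates the context, so do that (and one
    // throwaway launch) before the timed region.
    cudaError_t err = cudaMalloc((void **)&dOut, blocks * threads * sizeof(float));
    if (err != cudaSuccess) return err;
    busyKernel<<<blocks, threads>>>(dOut, 1);
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) { cudaFree(dOut); return err; }

    auto t0 = std::chrono::steady_clock::now();
    busyKernel<<<blocks, threads>>>(dOut, 1 << 18);
    *launchMs = msSince(t0);           // control is back on the CPU here
    err = cudaDeviceSynchronize();     // block until the kernel is done
    *totalMs = msSince(t0);

    cudaFree(dOut);
    return err;
}

int main()
{
    double launchMs = 0.0, totalMs = 0.0;
    cudaError_t err = timeLaunch(&launchMs, &totalMs);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("launch returned after %.3f ms, kernel finished after %.3f ms\n",
           launchMs, totalMs);
    return 0;
}
```

If the launch is truly asynchronous, the first number should be much smaller than the second. If the two are nearly equal, something (profiling, a blocking-launch setting, or a full launch queue) is probably forcing a sync.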

Akg · November 28, 2008, 12:00pm

Oh, yes… it seems I had inadvertently enabled profiling. I have set it to '0',

but it still seems to be blocking :( .

The kernel is launched after calls to the cudatime, memcopy, and bindtexture functions.

No, it is not called in a loop.

thanks

MisterAnderson42 · November 28, 2008, 2:43pm

Sometimes with profiling enabled, it “sticks” on even after you set the variable to 0. Try running the app after a clean boot.
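
Before rebooting, it may also be worth confirming what the process itself sees in its environment. This is a small host-only sketch, assuming the variables in question are CUDA_PROFILE and CUDA_LAUNCH_BLOCKING. The helper names isEnabled and reportVar are hypothetical, and the rule "anything other than empty or 0 counts as on" is an assumption about how the values are read, not documented driver behavior.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Treats a variable as enabled when it is set to anything other than "" or "0".
// (Assumption: the driver may interpret values differently.)
bool isEnabled(const char *value)
{
    return value != nullptr && value[0] != '\0' && std::strcmp(value, "0") != 0;
}

// Prints the raw value the process sees, plus the on/off interpretation above.
void reportVar(const char *name)
{
    const char *value = std::getenv(name);
    std::printf("%s = %s (%s)\n", name, value ? value : "<unset>",
                isEnabled(value) ? "on" : "off");
}

int main()
{
    reportVar("CUDA_PROFILE");          // assumed name of the profiling switch
    reportVar("CUDA_LAUNCH_BLOCKING");  // assumed name of the sync-per-launch switch
    return 0;
}
```

Running this from the same shell or IDE that launches the application shows whether a stale setting is still being inherited.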

Akg · December 3, 2008, 7:22am

I assumed that… I have done a clean boot, but it still blocks there. :S
