cudaThreadSynchronize vs. cudaDeviceSynchronize what is the difference?

What is the difference between cudaThreadSynchronize and cudaDeviceSynchronize? It seems like a lot of example programs use cudaThreadSynchronize. But recent NVIDIA documentation says

So, I used cudaDeviceSynchronize in my program. After seeing that it crashes, I switched to cudaThreadSynchronize and now it doesn’t crash.

Is anybody here less befuddled than I?

post some source?

Don’t know if this much of a snippet will shed much light, but here goes:

float maxVal = 0;
size_t index = cublasIsamax( arrayLen*nMagArrays, amps, 1 ) - 1;
GpuToHost( amps+index, &maxVal, 1 );
cudaDeviceSynchronize(); // <- the questionable line

GpuToHost is a function I wrote, which goes like this:

inline void GpuToHost( const float * devPtr, float * hostPtr, size_t len )
{
	cudaError_t errCode = cudaMemcpy( hostPtr, devPtr, len*sizeof(float), cudaMemcpyDeviceToHost );
	if( errCode == cudaErrorInvalidDevicePointer )
		throw GpuException( "cudaErrorInvalidDevicePointer error in GpuToHost" );
	else if( errCode != cudaSuccess )
		throw GpuException( "error in GpuToHost" );
}
Anyway, with no explicit synchronization line, I get occasional bad results. With cudaThreadSynchronize, I get reliable results. With cudaDeviceSynchronize, it crashes later (eventually I get an error from GpuToHost()).
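One thing worth checking when the failure moves around like this: kernel launches are asynchronous, so an error from an earlier kernel often only surfaces at the next synchronization point, which can make the sync call (or a later cudaMemcpy) look like the culprit. A minimal sketch of checking the error code returned by the sync call itself; the helper name syncAndCheck is made up for illustration:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: synchronize and report any error that a preceding
// asynchronous operation (e.g. a kernel launch) deferred until now.
inline void syncAndCheck( const char * where )
{
    cudaError_t errCode = cudaDeviceSynchronize();
    if( errCode != cudaSuccess )
    {
        fprintf( stderr, "CUDA error at %s: %s\n",
                 where, cudaGetErrorString( errCode ) );
        // ... throw or abort here, as appropriate ...
    }
}
```

Dropping a call like syncAndCheck("after kernel X") between steps can narrow down which operation actually produced the error, rather than which call happened to report it.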

This is all with toolkit 4.0 and a recent driver installed, using a GTX 560 Ti. On other machines, with older cards, toolkits, and drivers, I get along fine without any explicit sync.

I guess this was a tougher question than I thought. Setting aside the issue of why my program crashed, does anybody know what the intended difference between these two CUDA calls is? The documentation is not clear on this point.

Also, I have seen pages that say that cudaMemcpy from device to host is always synchronous. Yet, I think I’ve seen other pages that say it may be asynchronous below a certain transfer size. Which one is true?

Doesn’t the programming guide say that cudaMemcpy of sizes below 64K are asynchronous by default?

DtoH cudaMemcpys are always synchronous. HtoD cudaMemcpys will return once the source buffer can be modified without impacting the copy.
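To illustrate the flip side: if you actually want a device-to-host copy that does not block the host, the usual pattern is cudaMemcpyAsync with a page-locked (pinned) host buffer and a stream. A sketch under those assumptions (function name and omission of error checks are for brevity):

```cuda
#include <cuda_runtime.h>

// Hypothetical example: overlap a DtoH copy with host work.
void asyncCopyExample( const float * devPtr, size_t len )
{
    float * pinned = 0;
    cudaStream_t stream;
    cudaStreamCreate( &stream );
    // cudaMemcpyAsync is only truly asynchronous with pinned host memory
    cudaMallocHost( (void**)&pinned, len * sizeof(float) );
    cudaMemcpyAsync( pinned, devPtr, len * sizeof(float),
                     cudaMemcpyDeviceToHost, stream ); // returns immediately
    // ... do unrelated host work here while the copy proceeds ...
    cudaStreamSynchronize( stream ); // now the data in 'pinned' is valid
    cudaFreeHost( pinned );
    cudaStreamDestroy( stream );
}
```

With a pageable host buffer, plain cudaMemcpy in the DtoH direction blocks until the copy completes, consistent with the behavior described above.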

Could NVIDIA make this clearer in the next revision?

The documentation has a “host <-> device” line, and in many common contexts the “<->” sign means “both directions”.

If NVIDIA can’t use words instead of signs, could it at least use the “->” sign?