cudaStreamSynchronize vs cudaDeviceSynchronize across threads

rmkeene · November 12, 2021, 7:41pm

I have a C++ program with several threads that use CUDA.
On one thread I am doing custom point cloud stitching.
After a single stitch cycle, a correspondence vector whose x component is NaN marks an invalid correspondence, which is OK.
I then do a cudaStreamSynchronize(nullptr) and dump some debug data to a file.
BUT…
I check if (isnan(CorrespBuffer[i].x))
and it is not NaN, and then the code…

	if (isnan(CorrespBuffer[i].x)) file_out << "Pre is Nan" << std::endl;
	file_out << i << " : [" << (i % CLOUDLETTE_W) << "," << (i / CLOUDLETTE_W) << "] " << CorrespBuffer[i].x << ", " << CorrespBuffer[i].y << ", " << CorrespBuffer[i].z << std::endl;
	if (isnan(CorrespBuffer[i].x)) file_out << "Post is Nan" << std::endl;

I never get Pre is Nan, but occasionally get Post is Nan.
This indicates that CorrespBuffer[i].x changed from a valid number to NaN during the file output?
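
A minimal host-side sketch of one way to make such a debug dump self-consistent: copy the element into a local once, then test and print the copy, so the NaN check and the printed values cannot disagree. `Vec3` and `formatEntry` are hypothetical names standing in for the real element type and dump code; this is plain C++ that should also compile under nvcc. It only makes the log coherent. If the buffer is still being written while it is read, the underlying race remains.

```cpp
#include <cmath>
#include <sstream>
#include <string>

// Hypothetical stand-in for the element type of CorrespBuffer (e.g. float3).
struct Vec3 { float x, y, z; };

// Formats one entry from a single snapshot of buf[i].
// 'width' plays the role of CLOUDLETTE_W in the original dump line.
std::string formatEntry(const Vec3* buf, int i, int width) {
    const Vec3 c = buf[i];  // one read; everything below uses the copy
    std::ostringstream out;
    if (std::isnan(c.x)) out << "invalid ";
    out << i << " : [" << (i % width) << "," << (i / width) << "] "
        << c.x << ", " << c.y << ", " << c.z;
    return out.str();
}
```

If the snapshot shows a valid value while a later read of the same element shows NaN, that points at a writer that is still running, not at the file output.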

If I instead do a cudaDeviceSynchronize the strangeness goes away.

I have carefully checked my code for other threads that change CorrespBuffer and there are none.

E.g…
(no Pre is Nan)

27065 : [185,84] 0.139842, 0.0362788, 0.00396447
Post is Nan

Additional thoughts: the memory allocation (cudaHostAlloc) is done on the main thread, and the processing that I am referring to is done much later on another thread (my StitchWorker thread).
Could this cause a synchronization issue with cudaStreamSynchronize ?

rmkeene · November 12, 2021, 8:27pm

More Info: Interesting, I changed the code to ensure that the same C++ thread (on Windows 10, Visual C++) that uses the memory in CUDA also allocates the memory, and the bug goes away. CORRECTION: Does not go away.

Is this a bug in CUDA? Is it supposed to work this way? Can the same CUDA block of memory be shared between CUDA streams???

Further Update: Using DeviceSynchronize at least makes the correspondence summing work. It is very intermittent.

rmkeene · November 18, 2021, 2:03am

So the slides at

Explain concurrency and also indicate the problem here. Error between keyboard and chair.
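
The thread does not say what the actual mistake was. One common cause of these symptoms, offered here only as an assumption, is that the kernel runs on a different stream from the one being synchronized. cudaStreamSynchronize(nullptr) waits on the default stream, which does not implicitly synchronize with streams created with cudaStreamNonBlocking. When the code is built with nvcc --default-stream per-thread, nullptr means each host thread's own default stream. The sketch below waits on the stream that actually received the kernel before reading the pinned buffer. The kernel, `stitchStream`, and `runDemo` are hypothetical names, and resources are not released on the early error returns.

```cuda
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical stand-in for the stitching kernel: marks every 7th
// correspondence invalid by writing NaN into x.
__global__ void fillCorrespondences(float3* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = (i % 7 == 0) ? nanf("") : 1.0f;
        out[i] = make_float3(x, 0.0f, 0.0f);
    }
}

// Returns the number of entries whose NaN state is unexpected,
// or -1 if any CUDA call fails.
int runDemo() {
    const int n = 1 << 16;
    float3* hostBuf = nullptr;  // pinned host memory, as with cudaHostAlloc in the post
    float3* devView = nullptr;  // device-side alias of the same memory
    cudaStream_t stitchStream = nullptr;

    if (cudaHostAlloc((void**)&hostBuf, n * sizeof(float3),
                      cudaHostAllocMapped) != cudaSuccess) return -1;
    if (cudaHostGetDevicePointer((void**)&devView, hostBuf, 0) != cudaSuccess) return -1;
    // Assumption: the worker uses its own non-blocking stream.
    if (cudaStreamCreateWithFlags(&stitchStream, cudaStreamNonBlocking) != cudaSuccess) return -1;

    fillCorrespondences<<<(n + 255) / 256, 256, 0, stitchStream>>>(devView, n);

    // Wait on the stream the kernel was launched into. Synchronizing nullptr
    // here would not be guaranteed to wait for a non-blocking stream.
    cudaError_t err = cudaStreamSynchronize(stitchStream);

    int unexpected = 0;
    if (err == cudaSuccess) {
        for (int i = 0; i < n; ++i) {
            bool shouldBeNan = (i % 7 == 0);
            if (std::isnan(hostBuf[i].x) != shouldBeNan) ++unexpected;
        }
    }

    cudaStreamDestroy(stitchStream);
    cudaFreeHost(hostBuf);
    return err == cudaSuccess ? unexpected : -1;
}
```

cudaDeviceSynchronize waits for all outstanding work on the device regardless of stream, which would be consistent with it hiding the problem in the original program, at the cost of also stalling on other threads' work.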
