I have a C++ program with several threads that use CUDA.
On one thread I am doing custom point cloud stitching.
After a single stitch cycle, any correspondence vector whose x component is NaN marks an invalid correspondence, which is expected.
I then do a cudaStreamSynchronize(nullptr) and dump some debug data to a file.
BUT…
For each entry I check isnan(CorrespBuffer[i].x) immediately before and immediately after writing it to the file, using this code:
if (isnan(CorrespBuffer[i].x)) file_out << "Pre is Nan" << std::endl;
file_out << i << " : [" << (i % CLOUDLETTE_W) << "," << (i / CLOUDLETTE_W) << "] " << CorrespBuffer[i].x << ", " << CorrespBuffer[i].y << ", " << CorrespBuffer[i].z << std::endl;
if (isnan(CorrespBuffer[i].x)) file_out << "Post is Nan" << std::endl;
I never get Pre is Nan, but occasionally get Post is Nan.
This suggests that CorrespBuffer[i].x changed from a valid number to NaN while the line was being written to the file.
If I call cudaDeviceSynchronize() instead of cudaStreamSynchronize(nullptr), the strangeness goes away.
I have carefully checked my code for other threads that change CorrespBuffer and there are none.
For example, this is the output for one entry (note that there is no "Pre is Nan" line):
27065 : [185,84] 0.139842, 0.0362788, 0.00396447
Post is Nan
Additional thoughts: the memory allocation (cudaHostAlloc) is done on the main thread, and the processing I am referring to is done much later on another thread (my StitchWorker thread).
Could this cause a synchronization issue with cudaStreamSynchronize?