For the jetson utilities and inference libraries, when is it recommended to call cudaDeviceSynchronize()? Is it always assumed that the user will make this call directly after calling into the libraries, or is it recommended in some circumstances and not others?
@catch22 the use of cudaDeviceSynchronize() in the jetson-inference examples has for the most part been eliminated, but it is still used in cases where you need to access the results of GPU computation on the CPU and have to wait for the GPU to finish processing, since the kernels are launched asynchronously.
Gathering profiling data is another example (again, that is for the most part handled internally).
If you are stringing together multiple CUDA kernels in a row, you don’t need it directly after each call (kernels launched on the same stream run in order), but you would typically want it at the end of the pipeline when you go to use the results on the CPU.
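To illustrate the point about chained kernels, here is a minimal standalone CUDA sketch. It does not use the jetson-utils or jetson-inference APIs; the `scale` and `offset` kernels are hypothetical, made up for this example. It assumes both kernels are launched on the default stream and that the buffer is managed memory readable from the CPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernels, for illustration only.
__global__ void scale(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

__global__ void offset(float* data, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += value;
}

int main()
{
    const int n = 1024;
    float* data = nullptr;

    // Managed memory can be accessed from both the CPU and the GPU.
    if (cudaMallocManaged(&data, n * sizeof(float)) != cudaSuccess)
        return 1;

    for (int i = 0; i < n; i++)
        data[i] = 1.0f;

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Both launches return immediately. Kernels on the same stream
    // execute in launch order, so no sync is needed between them.
    scale<<<blocks, threads>>>(data, n, 2.0f);
    offset<<<blocks, threads>>>(data, n, 1.0f);

    // One synchronization at the end of the pipeline, before the CPU
    // reads the results.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(data);
        return 1;
    }

    printf("data[0] = %f\n", data[0]);  // 1.0 * 2.0 + 1.0 = 3.0

    cudaFree(data);
    return 0;
}
```

Reading `data[0]` before the cudaDeviceSynchronize() call would race with the GPU, which is the situation the sync is there to prevent.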
*It’s better practice to use cudaStreamSynchronize() or cudaEventSynchronize() where possible, since they wait only on a specific stream or event rather than on the whole device (although I don’t personally have Python wrappers for these, they exist in other libraries like PyCUDA, etc.)
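As a rough sketch of the stream and event variants, the standalone CUDA example below times a kernel with events and then waits on a single stream before reading the result on the CPU. It is not taken from jetson-inference; the `fill` kernel, the `CHECK` macro, and the buffer names are assumptions made for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper macro: bail out of main() on any CUDA error.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t e = (call);                                   \
        if (e != cudaSuccess) {                                   \
            printf("CUDA error: %s\n", cudaGetErrorString(e));    \
            return 1;                                             \
        }                                                         \
    } while (0)

// Hypothetical kernel, for illustration only.
__global__ void fill(float* out, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = value;
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float* d_out = nullptr;  // device buffer
    float* h_out = nullptr;  // pinned host buffer for the async copy
    CHECK(cudaMalloc(&d_out, bytes));
    CHECK(cudaMallocHost(&h_out, bytes));

    cudaStream_t stream;
    cudaEvent_t start, stop;
    CHECK(cudaStreamCreate(&stream));
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Queue everything on one stream; none of these calls block the CPU.
    CHECK(cudaEventRecord(start, stream));
    fill<<<blocks, threads, 0, stream>>>(d_out, n, 42.0f);
    CHECK(cudaEventRecord(stop, stream));
    CHECK(cudaMemcpyAsync(h_out, d_out, bytes, cudaMemcpyDeviceToHost, stream));

    // Wait only until the 'stop' event is reached (kernel finished),
    // which is enough to read the kernel's elapsed time.
    CHECK(cudaEventSynchronize(stop));
    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));

    // Wait for the remaining work on this stream (the copy) before
    // touching h_out on the CPU. Other streams are not affected.
    CHECK(cudaStreamSynchronize(stream));

    printf("kernel took %.3f ms, h_out[0] = %f\n", ms, h_out[0]);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFreeHost(h_out));
    CHECK(cudaFree(d_out));
    return 0;
}
```

The difference from cudaDeviceSynchronize() only matters when other streams have work in flight: here the CPU waits on one stream or one event and leaves the rest of the device running.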