cuStreamSynchronize() pulls CPU usage to 100%

I’m using NVDEC to decode H.264 and I noticed that the call to cuStreamSynchronize() pulls my CPU usage to 100%. I used NvDecoder.cpp as a reference, which also calls cuStreamSynchronize(). When I remove the call, CPU usage drops to 2-7% for a 1280x720 video, which is what I would expect from decoding via a hardware pipeline.

Do I need to call cuStreamSynchronize() after mapping/copying/unmapping decoded data into a GL texture? If I do need it, why does it pull my CPU usage to 100%?

My goal is to decode using NVDEC/cuvid and copy the decoded NV12 frames into OpenGL textures. I make sure my video uses NV12 and copy each decoded frame into two GL textures, which I create during initialization in my pfnSequenceCallback. Since NvDecoder.cpp might not use a full GPU pipeline (e.g. data is copied from GPU > CPU), might it be OK to skip the call to cuStreamSynchronize() in an implementation that copies decoded frames into GL textures?
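
To make the two-texture setup concrete, here is a minimal sketch of how one NV12 frame splits into a luma plane and an interleaved chroma plane. `describeNv12`, `PlaneLayout` and `Nv12Layout` are hypothetical names used only for illustration; they are not part of the NVDEC SDK. The sketch assumes the chroma plane starts at pitch times surface height, which should be checked against what the decoder actually reports.

```cpp
#include <cassert>
#include <cstddef>

// Size and location of one plane inside a pitched NV12 surface.
struct PlaneLayout {
    std::size_t offsetBytes;    // byte offset of the plane from the start of the surface
    std::size_t widthTexels;    // texture width in texels
    std::size_t heightTexels;   // texture height in texels
    std::size_t bytesPerTexel;  // 1 for luma (Y), 2 for interleaved chroma (UV)
};

struct Nv12Layout {
    PlaneLayout luma;    // would typically go into a single-channel texture (e.g. GL_R8)
    PlaneLayout chroma;  // would typically go into a two-channel texture (e.g. GL_RG8)
};

// Hypothetical helper, not an SDK function.
// width, height : visible frame size in pixels
// pitchBytes    : row stride of the mapped surface in bytes
// surfaceHeight : number of luma rows allocated (assumption: the chroma plane
//                 starts right after these rows; it may exceed the visible height)
Nv12Layout describeNv12(std::size_t width, std::size_t height,
                        std::size_t pitchBytes, std::size_t surfaceHeight) {
    assert(pitchBytes >= width && surfaceHeight >= height);

    Nv12Layout layout;
    layout.luma = {0, width, height, 1};
    // Chroma is subsampled by 2 in both directions, and each texel holds one
    // U and one V byte. Odd sizes are rounded up.
    layout.chroma = {pitchBytes * surfaceHeight, (width + 1) / 2, (height + 1) / 2, 2};
    return layout;
}
```

For the 1280x720 video mentioned above, with an assumed pitch of 1280 bytes and a surface height of 720, `describeNv12(1280, 720, 1280, 720)` describes a 1280x720 luma plane at offset 0 and a 640x360 chroma plane at offset 921600.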