I’m using NVDEC to decode H.264 and noticed that the call to cuStreamSynchronize() pushes my CPU usage to 100%. I used NvDecoder.cpp from the SDK samples as a reference, and it calls cuStreamSynchronize() as well. When I remove the call, CPU usage drops to 2–7% for a 1280x720 video, which is what I would expect from decoding via a hardware pipeline.
Do I need to call cuStreamSynchronize() after mapping/copying/unmapping decoded data into a GL texture? And if the call is required, why does it drive CPU usage to 100%?
My goal is to decode using NVDEC/CUVID and copy the decoded NV12 frames into OpenGL textures. I verify that the video is NV12 and map the decoded frames into two GL textures that I create during the initialization phase, in my pfnSequenceCallback. Since NvDecoder.cpp might not use a full GPU pipeline (e.g. data is copied from GPU to CPU), could it be safe to skip the call to cuStreamSynchronize() in an implementation that copies decoded frames directly into GL textures?
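For context, the per-frame path I’m describing is roughly the following sketch (simplified, not my verbatim code: names like lumaRes, dispInfo, width and height are placeholders, error checking is omitted, and only the luma plane copy is shown; lumaRes is assumed to have been registered earlier with cuGraphicsGLRegisterImage):

```cpp
#include <cuda.h>
#include <cudaGL.h>   // CUDA/OpenGL interop (cuGraphicsGLRegisterImage etc.)
#include <nvcuvid.h>  // NVDEC/CUVID API

// Hypothetical helper: copy the luma plane of one decoded NV12 frame
// into a GL texture previously registered as a CUgraphicsResource.
void CopyFrameToGLTexture(CUvideodecoder decoder, CUVIDPARSERDISPINFO *dispInfo,
                          CUgraphicsResource lumaRes, CUstream stream,
                          int width, int height)
{
    // Map the decoded surface into CUDA address space.
    CUdeviceptr srcFrame = 0;
    unsigned int srcPitch = 0;
    CUVIDPROCPARAMS vpp = {};
    vpp.progressive_frame = dispInfo->progressive_frame;
    vpp.output_stream = stream;
    cuvidMapVideoFrame(decoder, dispInfo->picture_index,
                       (unsigned long long *)&srcFrame, &srcPitch, &vpp);

    // Map the GL texture and get its backing CUDA array.
    cuGraphicsMapResources(1, &lumaRes, stream);
    CUarray dstArray;
    cuGraphicsSubResourceGetArray(&dstArray, lumaRes, 0, 0);

    // Async device-to-array copy of the luma plane.
    CUDA_MEMCPY2D m = {};
    m.srcMemoryType = CU_MEMORYTYPE_DEVICE;
    m.srcDevice     = srcFrame;
    m.srcPitch      = srcPitch;
    m.dstMemoryType = CU_MEMORYTYPE_ARRAY;
    m.dstArray      = dstArray;
    m.WidthInBytes  = width;   // luma: one byte per pixel
    m.Height        = height;
    cuMemcpy2DAsync(&m, stream);
    // ...a second cuMemcpy2DAsync for the interleaved UV plane goes here...

    cuGraphicsUnmapResources(1, &lumaRes, stream);

    // NvDecoder.cpp synchronizes before unmapping the decoded surface;
    // this is the call that pegs one CPU core at 100% for me.
    cuStreamSynchronize(stream);
    cuvidUnmapVideoFrame(decoder, srcFrame);
}
```

The question is whether the cuStreamSynchronize() near the end is actually required before cuvidUnmapVideoFrame() when everything stays on the GPU.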