I have a series of frames being output from an instance of NVDEC. Every single frame receives its own CUstream. Each of those streams ends up with several operations enqueued on it:
1. cuMemAllocFromPoolAsync – Allocate my own memory for the frame
2. cuvidMapVideoFrame64 – Map the frame
3. cuMemcpyAsync – Copy out of the frame into owned memory
4. cuEventRecord – Mark that the copy has completed, and the memory may be used
5. cuvidUnmapVideoFrame – Unmap the frame
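The five steps above can be sketched roughly as follows. This is a minimal illustration and not the original code: the helper name `copy_out_frame`, its parameters, and `frame_bytes` are hypothetical, and the sketch assumes `frame_bytes` does not exceed the size of the mapped surface (a real pipeline would probably use a pitched 2D copy). It also waits for the stream before unmapping, which is a conservative choice that the pipeline described in this post does not necessarily make.

```cuda
#include <cuda.h>
#include <nvcuvid.h>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the original post): run the five steps for
// one decoded picture on its own stream. Returns the first error encountered.
CUresult copy_out_frame(CUvideodecoder decoder, int pic_idx,
                        CUmemoryPool pool, CUstream stream,
                        size_t frame_bytes, CUevent copy_done,
                        CUdeviceptr *owned_out)
{
    *owned_out = 0;

    // 1. Stream-ordered allocation. The pointer value is returned to the
    //    host right away, but it may only be used in stream order.
    CUdeviceptr owned = 0;
    CUresult rc = cuMemAllocFromPoolAsync(&owned, frame_bytes, pool, stream);
    if (rc != CUDA_SUCCESS) return rc;

    // 2. Map the decoder output surface. Real code would also fill the
    //    field/progressive settings from the parser's display info.
    CUVIDPROCPARAMS vpp;
    memset(&vpp, 0, sizeof(vpp));
    vpp.output_stream = stream;
    unsigned long long mapped = 0;
    unsigned int pitch = 0;
    rc = cuvidMapVideoFrame64(decoder, pic_idx, &mapped, &pitch, &vpp);
    if (rc != CUDA_SUCCESS) {
        cuMemFreeAsync(owned, stream);
        return rc;
    }

    // 3. Copy out of the mapped surface into owned memory.
    //    Assumption: frame_bytes fits inside the mapped surface.
    rc = cuMemcpyAsync(owned, (CUdeviceptr)mapped, frame_bytes, stream);

    // 4. Record the event other streams can wait on before using `owned`.
    if (rc == CUDA_SUCCESS) rc = cuEventRecord(copy_done, stream);

    // Conservative assumption: wait for the enqueued work before unmapping,
    // so the copy can never read a surface that was already handed back.
    CUresult wait_rc = cuStreamSynchronize(stream);

    // 5. Unmap the frame (a host-side call on the decoder, not the stream).
    CUresult unmap_rc = cuvidUnmapVideoFrame64(decoder, mapped);

    if (rc == CUDA_SUCCESS) rc = wait_rc;
    if (rc == CUDA_SUCCESS) rc = unmap_rc;
    if (rc != CUDA_SUCCESS) {
        cuMemFreeAsync(owned, stream);
        return rc;
    }
    *owned_out = owned;
    return CUDA_SUCCESS;
}
```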
My question has two parts:
Part one:
In some cases, I realize partway through processing a frame that I will not use it. Ideally, I’d be able to avoid wasting decoder time, and call cuStreamDestroy on the stream to prevent it from executing. I have two fears with this approach, though:
Fear 1: cuvidUnmapVideoFrame will never be called. Despite the fact that the docs clearly state:
In case the device is still doing work in the stream hStream when cuStreamDestroy() is called, the function will return immediately and the resources associated with hStream will be released automatically once the device has completed all work in hStream.
I am not confident that an NVDEC output surface will be correctly tracked by the stream, as it’s from a discrete API, and the lifetime management is clearly the user’s responsibility. Even if I measure that it does get freed, I’d like to be confident that this is guaranteed behavior, and it will not change in future CUDA or NVDEC releases.
Fear 2: How can I manage the lifetime of the memory created by cuMemAllocFromPoolAsync? Even if I added an event between steps 1 and 2 to verify that the pointer has been allocated, I can never free the memory because the stream may still be processing. If I call cuStreamDestroy, and then call cuMemFreeAsync, it’s possible that cuMemFreeAsync completes before the cuStreamDestroy takes effect, and then the still-running stream performs an illegal memory access.
Are these fears well-founded? Are there any synchronization methods I could employ to mitigate them?
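For reference, one ordering that sidesteps the race described in Fear 2 is to enqueue the free on the frame's own stream before destroying that stream, so the free is ordered after the work already enqueued there. The sketch below is only an illustration under that assumption: `abandon_frame` is a hypothetical helper, and this approach lets the already-enqueued work run to completion rather than cancelling it, so it does not save any decoder or copy time.

```cuda
#include <cuda.h>

// Hypothetical helper (not from the original post): give up on a frame whose
// work has already been enqueued on `stream`.
CUresult abandon_frame(CUdeviceptr owned, CUstream stream)
{
    // The free is inserted into the same stream as the copy, so it is
    // ordered after every operation enqueued on that stream before it.
    CUresult rc = cuMemFreeAsync(owned, stream);
    if (rc != CUDA_SUCCESS) return rc;

    // Returns immediately. Per the documentation quoted above, the stream's
    // resources are released once the device has completed its pending work.
    return cuStreamDestroy(stream);
}
```

Because cuMemAllocFromPoolAsync hands the pointer value back to the host immediately, the caller already has `owned` in hand when it decides to abandon the frame, without needing an extra event.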
Part two: Priority Inversion?
If a high-priority stream is waiting (via cuStreamWaitEvent) on an event from a low-priority stream, is the priority of the producing stream automatically elevated? Or must I manually set the attribute on the producing stream? Does the stream scheduler even work on small enough time scales to make priority tweaking worthwhile for my use case?
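To make the setup in this part concrete, here is a small standalone sketch of a low-priority producer stream and a high-priority consumer stream linked by an event. The variable names are illustrative, and the program only reads back the priorities the driver reports; it does not show whether the scheduler boosts the producer while the consumer is waiting.

```cuda
#include <cuda.h>
#include <cstdio>

// Bail out of main() with a message if a driver API call fails.
#define CHECK(call)                                              \
    do {                                                         \
        CUresult rc_ = (call);                                   \
        if (rc_ != CUDA_SUCCESS) {                               \
            const char *name_ = "unknown";                       \
            cuGetErrorName(rc_, &name_);                         \
            fprintf(stderr, "%s failed: %s\n", #call, name_);    \
            return 1;                                            \
        }                                                        \
    } while (0)

int main()
{
    CHECK(cuInit(0));
    CUdevice dev;
    CHECK(cuDeviceGet(&dev, 0));
    CUcontext ctx;
    CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK(cuCtxSetCurrent(ctx));

    // Numerically lower values mean higher priority.
    int least = 0, greatest = 0;
    CHECK(cuCtxGetStreamPriorityRange(&least, &greatest));

    CUstream producer, consumer;
    CHECK(cuStreamCreateWithPriority(&producer, CU_STREAM_NON_BLOCKING, least));
    CHECK(cuStreamCreateWithPriority(&consumer, CU_STREAM_NON_BLOCKING, greatest));

    // The high-priority consumer waits on an event recorded by the
    // low-priority producer.
    CUevent ready;
    CHECK(cuEventCreate(&ready, CU_EVENT_DISABLE_TIMING));
    CHECK(cuEventRecord(ready, producer));
    CHECK(cuStreamWaitEvent(consumer, ready, 0));
    CHECK(cuStreamSynchronize(consumer));

    // Read back the priority each stream reports.
    int producer_prio = 0, consumer_prio = 0;
    CHECK(cuStreamGetPriority(producer, &producer_prio));
    CHECK(cuStreamGetPriority(consumer, &consumer_prio));
    printf("range: least=%d greatest=%d\n", least, greatest);
    printf("producer=%d consumer=%d\n", producer_prio, consumer_prio);

    CHECK(cuEventDestroy(ready));
    CHECK(cuStreamDestroy(consumer));
    CHECK(cuStreamDestroy(producer));
    CHECK(cuDevicePrimaryCtxRelease(dev));
    return 0;
}
```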