Hi,
To investigate further, we need a test sample that reproduces the issue, preferably based on 02_video_dec_cuda. Please share it so that we can check.
Another solution is to use the cairo APIs to draw the text and then call NvBufferMemSyncForDevice(). For reference, please see:
Tx2-4g r32.3.1 nvivafilter performance - #16 by DaneLLL
You can get a CPU pointer to the NvBuffer by calling NvBufferMemMap().
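
A rough sketch of that sequence follows, in case it helps: map the buffer, draw with cairo through the CPU pointer, then sync back for the device. It is not taken from the referenced topic. The helper name overlay_text, the font, the text position and the use of plane 0 are all assumptions for illustration. It also assumes the NvBuffer is already a pitch-linear 32-bit RGBA buffer (the decoder output in 02_video_dec_cuda is YUV, so it would need to be converted first), and that it is built on Jetson with nvbuf_utils.h and the cairo development package available.

```cpp
// Sketch only: builds on Jetson with the jetson_multimedia_api headers
// (nvbuf_utils.h) and libcairo. Link with -lnvbuf_utils -lcairo.
#include <cairo.h>
#include "nvbuf_utils.h"

// Hypothetical helper (not part of any sample): draws `text` onto plane 0
// of a pitch-linear 32-bit RGBA NvBuffer identified by dmabuf_fd.
// Returns 0 on success, -1 on failure.
int overlay_text(int dmabuf_fd, const char *text)
{
    NvBufferParams params;
    if (NvBufferGetParams(dmabuf_fd, &params) != 0)
        return -1;

    // Get a CPU pointer to plane 0 of the NvBuffer.
    void *cpu_ptr = nullptr;
    if (NvBufferMemMap(dmabuf_fd, 0, NvBufferMem_Read_Write, &cpu_ptr) != 0)
        return -1;

    // Make earlier hardware writes visible to the CPU before touching pixels.
    NvBufferMemSyncForCpu(dmabuf_fd, 0, &cpu_ptr);

    // Wrap the mapped memory in a cairo surface, using the buffer pitch as stride.
    cairo_surface_t *surface = cairo_image_surface_create_for_data(
        static_cast<unsigned char *>(cpu_ptr), CAIRO_FORMAT_ARGB32,
        static_cast<int>(params.width[0]),
        static_cast<int>(params.height[0]),
        static_cast<int>(params.pitch[0]));

    int rc = -1;
    if (cairo_surface_status(surface) == CAIRO_STATUS_SUCCESS) {
        cairo_t *cr = cairo_create(surface);
        cairo_select_font_face(cr, "Sans", CAIRO_FONT_SLANT_NORMAL,
                               CAIRO_FONT_WEIGHT_BOLD);
        cairo_set_font_size(cr, 32.0);
        // Opaque white looks the same regardless of R/B channel order.
        cairo_set_source_rgba(cr, 1.0, 1.0, 1.0, 1.0);
        cairo_move_to(cr, 40.0, 60.0);   // assumed text position
        cairo_show_text(cr, text);
        cairo_destroy(cr);
        cairo_surface_flush(surface);    // finish drawing before the sync below
        rc = 0;
    }
    cairo_surface_destroy(surface);

    // Flush the CPU writes so hardware blocks see the drawn text.
    NvBufferMemSyncForDevice(dmabuf_fd, 0, &cpu_ptr);
    NvBufferMemUnMap(dmabuf_fd, 0, &cpu_ptr);
    return rc;
}
```

For colors other than white or black, the channel order of the NvBuffer color format may not match CAIRO_FORMAT_ARGB32, so the red and blue values may need to be swapped.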