Unexpected memory usage during video transcoding on NVIDIA Tesla V100

Hello everyone

I am using opencv+ffmpeg with cuvid support to perform some video transcoding tasks. The basic setup is:

  • nvidia driver: 430.34
  • cuda: 10.1.168
  • opencv: 4.1.0 (JavaCV 1.5.1)
  • ffmpeg: 4.1.3
  • cuvid sdk: 9.0

To be specific, a task decodes an H.264 RTSP live stream into frames, loads the frames into GpuMat for some further operations, then encodes the sequence back into an H.264 RTMP live stream. On an NVIDIA Quadro P2000, the overall GPU memory usage for a task is approximately 260M. On an NVIDIA Tesla V100 (16G), however, the usage rockets to over 1GB, with decoding/encoding taking up 730M and GpuMat taking up 320M.

It is weird that such a simple task could use over a gigabyte of GPU memory, which seriously limits the scalability of the transcoding process.
Does anyone know why this is happening and how I can limit the usage?
Please help me.


I have the same problem: on a P4 a process uses 600/800M, and on a V100 a process uses 2G of memory.
Did you find a way to solve the problem?

For completeness I am posting the same response here as on


NVDEC_VideoDecoder_API_ProgGuide.pdf in NVIDIA Video Codec SDK 9.1 contains a section,

This contains some hints about how to write an application with optimized video memory usage.
Let us know if you find that useful and have any further questions.
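
As a rough illustration of why the number of decode surfaces matters for video memory, here is a back-of-the-envelope sketch in Python. It assumes 8-bit NV12 surfaces (1.5 bytes per pixel) and a 1920x1080 stream; the function names and the surface counts are hypothetical and chosen only for illustration. It ignores pitch alignment, encoder buffers, and per-process driver overhead, so treat the numbers as a lower bound for the frame buffers alone, not as an explanation of the totals reported above.

```python
def nv12_frame_bytes(width, height):
    """Raw size of one 8-bit NV12 frame, ignoring pitch alignment.

    NV12 stores a full-resolution luma plane (width * height bytes)
    followed by an interleaved chroma plane at half that size.
    """
    return width * height * 3 // 2


def surface_pool_mib(width, height, num_surfaces):
    """Approximate size in MiB of a pool of decode surfaces."""
    return nv12_frame_bytes(width, height) * num_surfaces / (1024 * 1024)


if __name__ == "__main__":
    # Hypothetical surface counts, for comparison only.
    for n in (4, 8, 20):
        print(f"{n:2d} surfaces at 1080p: {surface_pool_mib(1920, 1080, n):.1f} MiB")
```

The point of the sketch is only that the frame-buffer part of the footprint grows linearly with the number of surfaces the decoder is asked to allocate, so that count is worth checking first when many streams share one GPU.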