Dear NVIDIA experts,
We have implemented GPU hardware-accelerated H.264 video decoding via the NVCUVID API on an NVIDIA P4 card (and a GTX 1080 card), both of which use the Pascal architecture.
This implementation is essentially a copy of the code in the $CUDASAMPLE\v8.0\3_Imaging\cudaDecodeD3D9 sample project, except that we replaced the videoSource part with our own video demuxer, which extracts the H.264 NALU bitstream from the video file.
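For context, here is a minimal sketch of how the demuxed NALUs are fed into NVCUVID. DemuxNextNalu() is only a placeholder for our demuxer (not a real API), and the parser callbacks are stubbed; the setup otherwise follows the cudaDecodeD3D9 sample:

    #include <cuda.h>
    #include <nvcuvid.h>
    #include <cstdint>
    #include <cstddef>

    // Placeholder for our demuxer (not a real API): yields the next annex-B
    // H.264 packet, or returns false at end of stream.
    static bool DemuxNextNalu(const uint8_t **data, size_t *size) { return false; }

    // Parser callbacks, as in the cudaDecodeD3D9 sample. Bodies are stubbed
    // here; the real ones call cuvidCreateDecoder(), cuvidDecodePicture()
    // and cuvidMapVideoFrame() respectively.
    static int CUDAAPI HandleVideoSequence(void *user, CUVIDEOFORMAT *fmt)        { return 1; }
    static int CUDAAPI HandlePictureDecode(void *user, CUVIDPICPARAMS *pic)       { return 1; }
    static int CUDAAPI HandlePictureDisplay(void *user, CUVIDPARSERDISPINFO *dsp) { return 1; }

    void DecodeLoop()
    {
        CUVIDPARSERPARAMS parserParams = {};
        parserParams.CodecType              = cudaVideoCodec_H264;
        parserParams.ulMaxNumDecodeSurfaces = 20;   // value used by the sample
        parserParams.ulMaxDisplayDelay      = 1;
        parserParams.pfnSequenceCallback    = HandleVideoSequence;
        parserParams.pfnDecodePicture       = HandlePictureDecode;
        parserParams.pfnDisplayPicture      = HandlePictureDisplay;

        CUvideoparser parser = nullptr;
        cuvidCreateVideoParser(&parser, &parserParams);

        const uint8_t *data; size_t size;
        while (DemuxNextNalu(&data, &size)) {
            CUVIDSOURCEDATAPACKET pkt = {};
            pkt.payload      = data;
            pkt.payload_size = (unsigned long)size;
            cuvidParseVideoData(parser, &pkt);      // drives the callbacks above
        }

        CUVIDSOURCEDATAPACKET eos = {};
        eos.flags = CUVID_PKT_ENDOFSTREAM;          // flush pending frames
        cuvidParseVideoData(parser, &eos);

        cuvidDestroyVideoParser(parser);
    }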
Now we want to measure the performance of this GPU hardware H.264 decoder. To do so, we ran two experiments:
– Experiment 1.
Input a 1920x1080 video file and run a single decoder. The benchmark data, measured as sketched below, are:
- decoder fps: 650–850 fps (the speed varies with the video source)
- GPU memory used by this decoder: ~260 MB
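The numbers were obtained along these lines (a simplified sketch, not our exact harness; RunDecoder() is a placeholder for the decode loop above and returns the number of frames decoded, leaving the decoder instance alive so the memory delta reflects its footprint):

    #include <cuda.h>
    #include <chrono>
    #include <cstdio>

    // Placeholder: demux + decode the whole file with one decoder instance,
    // return the number of frames decoded. The decoder is NOT destroyed here,
    // so the cuMemGetInfo() delta below includes its surfaces and buffers.
    static int RunDecoder() { return 0; }

    int main()
    {
        cuInit(0);
        CUdevice dev;  cuDeviceGet(&dev, 0);
        CUcontext ctx; cuCtxCreate(&ctx, 0, dev);

        size_t freeBefore = 0, freeAfter = 0, total = 0;
        cuMemGetInfo(&freeBefore, &total);          // baseline after context creation

        auto t0 = std::chrono::steady_clock::now();
        int frames = RunDecoder();
        std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;

        cuMemGetInfo(&freeAfter, &total);

        std::printf("fps: %.1f\n", frames / dt.count());
        std::printf("decoder GPU memory: %.1f MB\n",
                    (freeBefore - freeAfter) / (1024.0 * 1024.0));

        cuCtxDestroy(ctx);
        return 0;
    }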
– Experiment 2.
Input 4 video files and run 4 decoders concurrently, one decoder per file. The 4 inputs are all copies of the same 1920x1080 H.264 video file. The benchmark data, gathered with the thread-per-decoder setup sketched below, are:
- sum of the 4 decoders' fps: 650–850 fps (again, the speed varies with the video source)
- GPU memory used by the 4 decoders: ~260 MB × 4
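Each decoder runs on its own host thread against its own copy of the file; the structure is roughly as follows (a simplified sketch; DecodeFile() and the file names are illustrative stand-ins for the per-file demux + decode loop shown earlier):

    #include <string>
    #include <thread>
    #include <vector>

    // Placeholder: per-file demux + decode loop (one parser + one decoder
    // instance per thread), as in the earlier sketch.
    static void DecodeFile(const std::string &path) {}

    int main()
    {
        // Four copies of the same 1920x1080 H.264 file, one decoder each.
        const std::vector<std::string> files = {"in0.h264", "in1.h264",
                                                "in2.h264", "in3.h264"};
        std::vector<std::thread> workers;
        for (const auto &f : files)
            workers.emplace_back(DecodeFile, f);
        for (auto &w : workers)
            w.join();
        return 0;
    }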
We have read the NVIDIA Video Codec SDK Application Note, and the speed we measured roughly matches the speed reported there, but we could not find any data on GPU memory usage in that note or on any official NVIDIA webpage.
Based on our experiments, each video decoder instance appears to take up ~260 MB of GPU memory for 1920x1080 video. Frankly, this is much higher than we expected: in our experience, an H.264 decoder should not need more than ~50 MB for 1920x1080 video in most cases.
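That ~50 MB expectation is a back-of-the-envelope estimate of the decode surface pool, assuming 8-bit 4:2:0 NV12 surfaces and a pool of 16 decode surfaces (the actual pool size depends on the stream's DPB size and on ulMaxNumDecodeSurfaces):

    1920 x 1088 x 1.5 bytes ≈ 3.1 MB per NV12 surface
    16 surfaces x 3.1 MB ≈ 50 MB

So we cannot account for the remaining ~210 MB per decoder instance.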
We had planned to use one GPU card to decode 20 channels of live HD H.264 streams concurrently via 20 decoder instances, while also running some other image-processing tasks, but it now seems that memory will be a serious bottleneck, since each decoder takes up so much memory.
Now we simply want to know whether the ~260 MB of GPU memory we measured per H.264 HD decoder instance is normal. Is there any reference information on memory usage for the GPU-accelerated video decoder, especially for H.264 baseline- and main-profile decoding?
Any information is appreciated!
br,
zxjan