Hi NVIDIA dev forum admin or video team,
I am seeing odd nvcuvid decode performance on the M4000. I suspect either a software or a hardware issue.
This is my summary.
OS : Linux Ubuntu 14.04
GPU : K2200 (GM107) and M4000 (GM204)
Test app : NvTranscoder from the NVIDIA Video SDK 6 (6.0.1) samples
Test video : a 4K video, 1 min long
Driver versions : 352.79 (the one included in CUDA SDK 7.5) and 351.28
Result:
[k2200]
./NvTranscoder -i /home/***/video/exid-updown_60s.mp4 -o test -deviceID 1
Encoding input : "/home/***/video/exid-updown_60s.mp4"
output : "test"
codec : "H264"
size : 3840x1920
bitrate : 5000000 bits/sec
vbvMaxBitrate : 0 bits/sec
vbvSize : 0 bits
fps : 29 frames/sec
rcMode : CONSTQP
goplength : INFINITE GOP
B frames : 0
QP : 28
preset : LOW_LATENCY_DEFAULT
Total time: 18085.107000ms, Decoded Frames: 1800, Encoded Frames: 1800, Average FPS: 99.529408
[m4000]
./NvTranscoder -i /home/***/video/exid-updown_60s.mp4 -o test -deviceID 0
Encoding input : "/home/***/video/exid-updown_60s.mp4"
output : "test"
codec : "H264"
size : 3840x1920
bitrate : 5000000 bits/sec
vbvMaxBitrate : 0 bits/sec
vbvSize : 0 bits
fps : 29 frames/sec
rcMode : CONSTQP
goplength : INFINITE GOP
B frames : 0
QP : 28
preset : LOW_LATENCY_DEFAULT
Total time: 22155.221000ms, Decoded Frames: 1800, Encoded Frames: 1800, Average FPS: 81.244958
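As a sanity check on the two runs above, the reported Average FPS is just the frame count divided by the total time, and the ratio of the two gives the size of the gap. Here is a minimal Python sketch using only the numbers from the logs; the names RUNS and average_fps are my own for illustration and are not part of the SDK.

```python
# Figures copied from the "Total time" lines of the two NvTranscoder runs.
RUNS = {
    "K2200": {"frames": 1800, "total_ms": 18085.107},
    "M4000": {"frames": 1800, "total_ms": 22155.221},
}


def average_fps(frames, total_ms):
    """Frames per second over the whole run (total time is in milliseconds)."""
    return frames / (total_ms / 1000.0)


fps = {gpu: average_fps(run["frames"], run["total_ms"]) for gpu, run in RUNS.items()}

# Throughput ratio: values above 1.0 mean the K2200 is faster.
speedup = fps["K2200"] / fps["M4000"]

for gpu, value in fps.items():
    print(f"{gpu}: {value:.2f} fps")
print(f"K2200 / M4000 throughput ratio: {speedup:.3f}")
```

The computed values match the Average FPS figures printed by NvTranscoder, and the ratio comes out to roughly 1.22 in favor of the K2200.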
Observations:
- The K2200 outperforms the M4000.
- I tested both drivers listed above → same result.
- I tested with both GPUs in one system, and then with each GPU alone in the same system → same result.
- NVENC turns out to be a very cheap operation here. The decoder dominates the total time.
- The NvTranscoder source code sets cudaVideoCreate_PreferCUVID for the decoder, but I got the same result after switching to cudaVideoCreate_PreferCUDA.
- From this forum I found that cudaVideoCreate_PreferCUDA doesn’t always mean that the app uses CUDA kernels. There is a condition for using CUDA instead of the VP.
In NvDecodeGL.cpp:
void displayHelp()
{
...
printf("\t-decodecuda - Use CUDA kernels for MPEG-2 (Available with 64+ CUDA cores)\n");
printf("\t-decodecuvid - Use NVDEC for MPEG-2, VC-1, H.264, or H.265 decode\n");
...
}
The help text lists the CUDA-kernel path only for MPEG-2, so in this case the app seems to use the VP (NVDEC).
- The weird thing is that the K2200 is faster than the M4000 when the VP is used. Is the M4000’s VP inferior to the K2200’s, or is the driver not controlling it correctly?
Per the Wikipedia article on Nvidia PureVideo, GM107 and GM204 have the same VP6, so I would expect decode performance to be the same on both.
NVIDIA, could you take a look at this issue?