I’m measuring decoding performance with the CUDA sample 3_Imaging/cudaDecodeGL.
These are my results. The test video is a Full HD (1920x1080) H.264 MP4:
$ ffmpeg -i /mnt/A/TS/video.mp4
ffmpeg version 1.0.6_patch-aac-resample-lock Copyright (c) 2000-2013 the FFmpeg developers
built on Jun 2 2015 17:00:42 with gcc 4.8.3 (GCC) 20140911 (Red Hat 4.8.3-9)
configuration: --enable-libvpx --enable-shared --prefix=/usr --enable-libtheora --enable-postproc --enable-gpl --enable-libmp3lame --enable-libvorbis --enable-libx264 --enable-libfdk_aac --enable-nonfree --libdir=/usr/lib64 --shlibdir=/usr/lib64
libavutil 51. 73.101 / 51. 73.101
libavcodec 54. 59.100 / 54. 59.100
libavformat 54. 29.104 / 54. 29.104
libavdevice 54. 2.101 / 54. 2.101
libavfilter 3. 17.100 / 3. 17.100
libswscale 2. 1.101 / 2. 1.101
libswresample 0. 15.100 / 0. 15.100
libpostproc 52. 0.100 / 52. 0.100
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x72c240] multiple edit list entries, a/v desync might occur, patch welcome
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/mnt/A/TS/video.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf54.29.104
Duration: 00:00:28.66, start: 1.533000, bitrate: 13610 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], 13607 kb/s, 29.97 fps, 29.97 tbr, 90k tbn, 59.94 tbc
Metadata:
handler_name : VideoHandler
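As a rough cross-check of what a full decode should produce, the duration and frame rate above give the expected frame count. cudaDecodeGL reports the sequence as interlaced, so the field count is twice that. This is a back-of-the-envelope Python sketch; it ignores the edit-list warning ffmpeg prints, so the real count can be off by a few frames.

```python
from fractions import Fraction

# Values copied from the ffmpeg output above.
duration_s = Fraction("28.66")       # "Duration: 00:00:28.66"
frame_rate = Fraction(30000, 1001)   # 29.97 fps

expected_frames = duration_s * frame_rate
# Interlaced sequence: each frame is made of two fields.
expected_fields = 2 * expected_frames

print(f"expected frames ~ {float(expected_frames):.1f}")
print(f"expected fields ~ {float(expected_fields):.1f}")
```

That is close to the 856 presented frames and 1712 decoded fields in each run's statistics, which suggests every run decodes essentially the whole clip and that the "Frames Decoded (hardware)" counter counts fields.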
GeForce GTX 780 Ti
Using preferCuda:
$ ./cudaDecodeGL -nointerop -decodecuda -device=0 /mnt/A/TS/video.mp4
[CUDA/OpenGL Video Decode]
Command Line Arguments:
argv[0] = ./cudaDecodeGL
argv[1] = -nointerop
argv[2] = -decodecuda
argv[3] = -device=0
argv[4] = /mnt/A/TS/video.mp4
[cudaDecodeGL]: input file: </mnt/A/TS/video.mp4>
VideoCodec : AVC/H.264
Frame rate : 30000/1001fps ~ 29.97fps
Sequence format : Interlaced
Coded frame size: [1920, 1088]
Display area : [0, 0, 1920, 1080]
Chroma format : 4:2:0
Bitrate : unknown
Aspect ratio : 16:9
argv[0] = ./cudaDecodeGL
argv[1] = -nointerop
argv[2] = -decodecuda
argv[3] = -device=0
argv[4] = /mnt/A/TS/video.mp4
gpuDeviceInitDRV() Using CUDA Device [0]: GeForce GTX 780 Ti
gpuDeviceInitDRV() Using CUDA Device [0]: GeForce GTX 780 Ti
> Using GPU Device: GeForce GTX 780 Ti has SM 3.5 compute capability
Total amount of global memory: 3071.3125 MB
>> modInitCTX<NV12ToARGB_drvapi64.ptx > initialized OK
>> modGetCudaFunction< CUDA file: NV12ToARGB_drvapi64.ptx >
CUDA Kernel Function (0x01dd64b0) = < NV12ToARGB_drvapi >
>> modGetCudaFunction< CUDA file: NV12ToARGB_drvapi64.ptx >
CUDA Kernel Function (0x01ddd540) = < Passthru_drvapi >
Free memory: 2847.5508 MB
> VideoDecoder::cudaVideoCreateFlags = <1>Use CUDA decoder
[cudaDecodeGL] - [Field: 0016, 00.0 fps, frame time: 90032283648.00 (ms) ]
[cudaDecodeGL] - [Field: 0032, 136.3 fps, frame time: 7.34 (ms) ]
[cudaDecodeGL] - [Field: 0048, 136.3 fps, frame time: 7.33 (ms) ]
[cudaDecodeGL] - [Field: 0064, 137.0 fps, frame time: 7.30 (ms) ]
[cudaDecodeGL] - [Field: 0080, 138.2 fps, frame time: 7.24 (ms) ]
[cudaDecodeGL] - [Field: 0096, 134.5 fps, frame time: 7.44 (ms) ]
[cudaDecodeGL] - [Field: 0112, 133.7 fps, frame time: 7.48 (ms) ]
[cudaDecodeGL] - [Field: 0128, 132.5 fps, frame time: 7.55 (ms) ]
[cudaDecodeGL] - [Field: 0144, 134.9 fps, frame time: 7.41 (ms) ]
[cudaDecodeGL] - [Field: 0160, 133.0 fps, frame time: 7.52 (ms) ]
[cudaDecodeGL] - [Field: 0176, 137.1 fps, frame time: 7.29 (ms) ]
[cudaDecodeGL] - [Field: 0192, 141.6 fps, frame time: 7.06 (ms) ]
[cudaDecodeGL] - [Field: 0208, 137.7 fps, frame time: 7.26 (ms) ]
[cudaDecodeGL] - [Field: 0224, 139.0 fps, frame time: 7.19 (ms) ]
[cudaDecodeGL] - [Field: 0240, 133.9 fps, frame time: 7.47 (ms) ]
[cudaDecodeGL] - [Field: 0256, 136.6 fps, frame time: 7.32 (ms) ]
[cudaDecodeGL] - [Field: 0272, 138.7 fps, frame time: 7.21 (ms) ]
[cudaDecodeGL] - [Field: 0288, 135.7 fps, frame time: 7.37 (ms) ]
[cudaDecodeGL] - [Field: 0304, 136.8 fps, frame time: 7.31 (ms) ]
[cudaDecodeGL] - [Field: 0320, 140.1 fps, frame time: 7.14 (ms) ]
[cudaDecodeGL] - [Field: 0336, 157.4 fps, frame time: 6.35 (ms) ]
[cudaDecodeGL] - [Field: 0352, 133.9 fps, frame time: 7.47 (ms) ]
[cudaDecodeGL] - [Field: 0368, 133.3 fps, frame time: 7.50 (ms) ]
[cudaDecodeGL] - [Field: 0384, 134.0 fps, frame time: 7.46 (ms) ]
[cudaDecodeGL] - [Field: 0400, 133.7 fps, frame time: 7.48 (ms) ]
[cudaDecodeGL] - [Field: 0416, 134.1 fps, frame time: 7.46 (ms) ]
[cudaDecodeGL] - [Field: 0432, 133.3 fps, frame time: 7.50 (ms) ]
[cudaDecodeGL] - [Field: 0448, 136.5 fps, frame time: 7.32 (ms) ]
[cudaDecodeGL] - [Field: 0464, 134.8 fps, frame time: 7.42 (ms) ]
[cudaDecodeGL] - [Field: 0480, 139.5 fps, frame time: 7.17 (ms) ]
[cudaDecodeGL] - [Field: 0496, 141.4 fps, frame time: 7.07 (ms) ]
[cudaDecodeGL] - [Field: 0512, 139.8 fps, frame time: 7.15 (ms) ]
[cudaDecodeGL] - [Field: 0528, 139.2 fps, frame time: 7.18 (ms) ]
[cudaDecodeGL] - [Field: 0544, 138.3 fps, frame time: 7.23 (ms) ]
[cudaDecodeGL] - [Field: 0560, 131.3 fps, frame time: 7.62 (ms) ]
[cudaDecodeGL] - [Field: 0576, 146.5 fps, frame time: 6.83 (ms) ]
[cudaDecodeGL] - [Field: 0592, 144.2 fps, frame time: 6.94 (ms) ]
[cudaDecodeGL] - [Field: 0608, 138.4 fps, frame time: 7.22 (ms) ]
[cudaDecodeGL] - [Field: 0624, 155.3 fps, frame time: 6.44 (ms) ]
[cudaDecodeGL] - [Field: 0640, 145.9 fps, frame time: 6.85 (ms) ]
[cudaDecodeGL] - [Field: 0656, 146.6 fps, frame time: 6.82 (ms) ]
[cudaDecodeGL] - [Field: 0672, 143.7 fps, frame time: 6.96 (ms) ]
[cudaDecodeGL] - [Field: 0688, 141.3 fps, frame time: 7.07 (ms) ]
[cudaDecodeGL] - [Field: 0704, 141.2 fps, frame time: 7.08 (ms) ]
[cudaDecodeGL] - [Field: 0720, 144.1 fps, frame time: 6.94 (ms) ]
[cudaDecodeGL] - [Field: 0736, 145.9 fps, frame time: 6.85 (ms) ]
[cudaDecodeGL] - [Field: 0752, 145.1 fps, frame time: 6.89 (ms) ]
[cudaDecodeGL] - [Field: 0768, 146.2 fps, frame time: 6.84 (ms) ]
[cudaDecodeGL] - [Field: 0784, 147.2 fps, frame time: 6.79 (ms) ]
[cudaDecodeGL] - [Field: 0800, 140.1 fps, frame time: 7.14 (ms) ]
[cudaDecodeGL] - [Field: 0816, 141.9 fps, frame time: 7.05 (ms) ]
[cudaDecodeGL] - [Field: 0832, 125.2 fps, frame time: 7.99 (ms) ]
[cudaDecodeGL] - [Field: 0848, 130.5 fps, frame time: 7.66 (ms) ]
[cudaDecodeGL] statistics
Video Length (hh:mm:ss.msec) = 00:00:06.173
Frames Presented (inc repeats) = 856
Average Present Rate (fps) = 138.65
Frames Decoded (hardware) = 1712
Average Rate of Decoding (fps) = 277.30
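The per-interval lines can also be averaged instead of relying only on the final summary. The first interval (Field 0016) reports 00.0 fps and an implausible frame time, so it has to be skipped. A minimal Python sketch, assuming the console output has been saved as text; `field_fps_samples` and `steady_state_fps` are hypothetical helpers, not part of the CUDA samples.

```python
import re
from statistics import mean

# Matches lines such as:
# [cudaDecodeGL] - [Field: 0032, 136.3 fps, frame time: 7.34 (ms) ]
FIELD_LINE = re.compile(
    r"\[Field:\s*(\d+),\s*([\d.]+)\s*fps,\s*frame time:\s*([\d.]+)\s*\(ms\)"
)


def field_fps_samples(log_text):
    """Return (field_index, fps) pairs parsed from cudaDecodeGL console output."""
    return [
        (int(m.group(1)), float(m.group(2)))
        for m in FIELD_LINE.finditer(log_text)
    ]


def steady_state_fps(log_text):
    """Average the per-interval fps values, skipping the first interval.

    The first line (Field 0016) prints 00.0 fps and a meaningless
    frame time, so it is excluded from the average.
    """
    samples = field_fps_samples(log_text)
    return mean(fps for _, fps in samples[1:])


log = """\
[cudaDecodeGL] - [Field: 0016, 00.0 fps, frame time: 90032283648.00 (ms) ]
[cudaDecodeGL] - [Field: 0032, 136.3 fps, frame time: 7.34 (ms) ]
[cudaDecodeGL] - [Field: 0048, 136.3 fps, frame time: 7.33 (ms) ]
[cudaDecodeGL] - [Field: 0064, 137.0 fps, frame time: 7.30 (ms) ]
"""
print(f"{steady_state_fps(log):.2f}")
```

Feeding each full run's output through `steady_state_fps` gives one comparable number per card/decoder combination.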
Using preferCuvid:
$ ./cudaDecodeGL -nointerop -decodecuvid -device=0 /mnt/A/TS/video.mp4
[CUDA/OpenGL Video Decode]
Command Line Arguments:
argv[0] = ./cudaDecodeGL
argv[1] = -nointerop
argv[2] = -decodecuvid
argv[3] = -device=0
argv[4] = /mnt/A/TS/video.mp4
[cudaDecodeGL]: input file: </mnt/A/TS/video.mp4>
VideoCodec : AVC/H.264
Frame rate : 30000/1001fps ~ 29.97fps
Sequence format : Interlaced
Coded frame size: [1920, 1088]
Display area : [0, 0, 1920, 1080]
Chroma format : 4:2:0
Bitrate : unknown
Aspect ratio : 16:9
argv[0] = ./cudaDecodeGL
argv[1] = -nointerop
argv[2] = -decodecuvid
argv[3] = -device=0
argv[4] = /mnt/A/TS/video.mp4
gpuDeviceInitDRV() Using CUDA Device [0]: GeForce GTX 780 Ti
gpuDeviceInitDRV() Using CUDA Device [0]: GeForce GTX 780 Ti
> Using GPU Device: GeForce GTX 780 Ti has SM 3.5 compute capability
Total amount of global memory: 3071.3125 MB
>> modInitCTX<NV12ToARGB_drvapi64.ptx > initialized OK
>> modGetCudaFunction< CUDA file: NV12ToARGB_drvapi64.ptx >
CUDA Kernel Function (0x0140c150) = < NV12ToARGB_drvapi >
>> modGetCudaFunction< CUDA file: NV12ToARGB_drvapi64.ptx >
CUDA Kernel Function (0x0140f2e0) = < Passthru_drvapi >
Free memory: 2847.0508 MB
> VideoDecoder::cudaVideoCreateFlags = <4>Use CUVID decoder
[cudaDecodeGL] - [Field: 0016, 00.0 fps, frame time: 90032283648.00 (ms) ]
[cudaDecodeGL] - [Field: 0032, 136.7 fps, frame time: 7.32 (ms) ]
[cudaDecodeGL] - [Field: 0048, 136.7 fps, frame time: 7.32 (ms) ]
[cudaDecodeGL] - [Field: 0064, 136.6 fps, frame time: 7.32 (ms) ]
[cudaDecodeGL] - [Field: 0080, 137.9 fps, frame time: 7.25 (ms) ]
[cudaDecodeGL] - [Field: 0096, 135.3 fps, frame time: 7.39 (ms) ]
[cudaDecodeGL] - [Field: 0112, 133.9 fps, frame time: 7.47 (ms) ]
[cudaDecodeGL] - [Field: 0128, 132.3 fps, frame time: 7.56 (ms) ]
[cudaDecodeGL] - [Field: 0144, 135.2 fps, frame time: 7.40 (ms) ]
[cudaDecodeGL] - [Field: 0160, 133.2 fps, frame time: 7.51 (ms) ]
[cudaDecodeGL] - [Field: 0176, 136.4 fps, frame time: 7.33 (ms) ]
[cudaDecodeGL] - [Field: 0192, 141.8 fps, frame time: 7.05 (ms) ]
[cudaDecodeGL] - [Field: 0208, 137.9 fps, frame time: 7.25 (ms) ]
[cudaDecodeGL] - [Field: 0224, 139.1 fps, frame time: 7.19 (ms) ]
[cudaDecodeGL] - [Field: 0240, 134.1 fps, frame time: 7.46 (ms) ]
[cudaDecodeGL] - [Field: 0256, 135.9 fps, frame time: 7.36 (ms) ]
[cudaDecodeGL] - [Field: 0272, 138.1 fps, frame time: 7.24 (ms) ]
[cudaDecodeGL] - [Field: 0288, 137.1 fps, frame time: 7.29 (ms) ]
[cudaDecodeGL] - [Field: 0304, 136.8 fps, frame time: 7.31 (ms) ]
[cudaDecodeGL] - [Field: 0320, 138.7 fps, frame time: 7.21 (ms) ]
[cudaDecodeGL] - [Field: 0336, 159.0 fps, frame time: 6.29 (ms) ]
[cudaDecodeGL] - [Field: 0352, 134.0 fps, frame time: 7.46 (ms) ]
[cudaDecodeGL] - [Field: 0368, 133.2 fps, frame time: 7.51 (ms) ]
[cudaDecodeGL] - [Field: 0384, 134.1 fps, frame time: 7.46 (ms) ]
[cudaDecodeGL] - [Field: 0400, 133.3 fps, frame time: 7.50 (ms) ]
[cudaDecodeGL] - [Field: 0416, 134.2 fps, frame time: 7.45 (ms) ]
[cudaDecodeGL] - [Field: 0432, 134.3 fps, frame time: 7.45 (ms) ]
[cudaDecodeGL] - [Field: 0448, 135.7 fps, frame time: 7.37 (ms) ]
[cudaDecodeGL] - [Field: 0464, 135.9 fps, frame time: 7.36 (ms) ]
[cudaDecodeGL] - [Field: 0480, 138.3 fps, frame time: 7.23 (ms) ]
[cudaDecodeGL] - [Field: 0496, 142.6 fps, frame time: 7.01 (ms) ]
[cudaDecodeGL] - [Field: 0512, 139.3 fps, frame time: 7.18 (ms) ]
[cudaDecodeGL] - [Field: 0528, 138.4 fps, frame time: 7.23 (ms) ]
[cudaDecodeGL] - [Field: 0544, 138.3 fps, frame time: 7.23 (ms) ]
[cudaDecodeGL] - [Field: 0560, 130.7 fps, frame time: 7.65 (ms) ]
[cudaDecodeGL] - [Field: 0576, 146.8 fps, frame time: 6.81 (ms) ]
[cudaDecodeGL] - [Field: 0592, 144.1 fps, frame time: 6.94 (ms) ]
[cudaDecodeGL] - [Field: 0608, 139.4 fps, frame time: 7.17 (ms) ]
[cudaDecodeGL] - [Field: 0624, 155.0 fps, frame time: 6.45 (ms) ]
[cudaDecodeGL] - [Field: 0640, 146.5 fps, frame time: 6.83 (ms) ]
[cudaDecodeGL] - [Field: 0656, 146.6 fps, frame time: 6.82 (ms) ]
[cudaDecodeGL] - [Field: 0672, 143.4 fps, frame time: 6.97 (ms) ]
[cudaDecodeGL] - [Field: 0688, 142.1 fps, frame time: 7.04 (ms) ]
[cudaDecodeGL] - [Field: 0704, 141.4 fps, frame time: 7.07 (ms) ]
[cudaDecodeGL] - [Field: 0720, 144.2 fps, frame time: 6.93 (ms) ]
[cudaDecodeGL] - [Field: 0736, 146.0 fps, frame time: 6.85 (ms) ]
[cudaDecodeGL] - [Field: 0752, 144.7 fps, frame time: 6.91 (ms) ]
[cudaDecodeGL] - [Field: 0768, 146.9 fps, frame time: 6.81 (ms) ]
[cudaDecodeGL] - [Field: 0784, 145.8 fps, frame time: 6.86 (ms) ]
[cudaDecodeGL] - [Field: 0800, 140.7 fps, frame time: 7.11 (ms) ]
[cudaDecodeGL] - [Field: 0816, 141.1 fps, frame time: 7.09 (ms) ]
[cudaDecodeGL] - [Field: 0832, 125.0 fps, frame time: 8.00 (ms) ]
[cudaDecodeGL] - [Field: 0848, 131.0 fps, frame time: 7.63 (ms) ]
[cudaDecodeGL] statistics
Video Length (hh:mm:ss.msec) = 00:00:06.172
Frames Presented (inc repeats) = 856
Average Present Rate (fps) = 138.68
Frames Decoded (hardware) = 1712
Average Rate of Decoding (fps) = 277.36
GeForce GT 740
Using preferCuda:
$ ./cudaDecodeGL -nointerop -decodecuda -device=1 /mnt/A/TS/video.mp4
[CUDA/OpenGL Video Decode]
Command Line Arguments:
argv[0] = ./cudaDecodeGL
argv[1] = -nointerop
argv[2] = -decodecuda
argv[3] = -device=1
argv[4] = /mnt/A/TS/video.mp4
[cudaDecodeGL]: input file: </mnt/A/TS/video.mp4>
VideoCodec : AVC/H.264
Frame rate : 30000/1001fps ~ 29.97fps
Sequence format : Interlaced
Coded frame size: [1920, 1088]
Display area : [0, 0, 1920, 1080]
Chroma format : 4:2:0
Bitrate : unknown
Aspect ratio : 16:9
argv[0] = ./cudaDecodeGL
argv[1] = -nointerop
argv[2] = -decodecuda
argv[3] = -device=1
argv[4] = /mnt/A/TS/video.mp4
gpuDeviceInitDRV() Using CUDA Device [1]: GeForce GT 740
gpuDeviceInitDRV() Using CUDA Device [1]: GeForce GT 740
> Using GPU Device: GeForce GT 740 has SM 3.0 compute capability
Total amount of global memory: 2047.8125 MB
>> modInitCTX<NV12ToARGB_drvapi64.ptx > initialized OK
>> modGetCudaFunction< CUDA file: NV12ToARGB_drvapi64.ptx >
CUDA Kernel Function (0x012027a0) = < NV12ToARGB_drvapi >
>> modGetCudaFunction< CUDA file: NV12ToARGB_drvapi64.ptx >
CUDA Kernel Function (0x01209930) = < Passthru_drvapi >
Free memory: 2024.1875 MB
> VideoDecoder::cudaVideoCreateFlags = <1>Use CUDA decoder
[cudaDecodeGL] - [Field: 0016, 00.0 fps, frame time: 90032283648.00 (ms) ]
[cudaDecodeGL] - [Field: 0032, 136.6 fps, frame time: 7.32 (ms) ]
[cudaDecodeGL] - [Field: 0048, 136.9 fps, frame time: 7.31 (ms) ]
[cudaDecodeGL] - [Field: 0064, 136.8 fps, frame time: 7.31 (ms) ]
[cudaDecodeGL] - [Field: 0080, 138.6 fps, frame time: 7.21 (ms) ]
[cudaDecodeGL] - [Field: 0096, 135.1 fps, frame time: 7.40 (ms) ]
[cudaDecodeGL] - [Field: 0112, 134.3 fps, frame time: 7.45 (ms) ]
[cudaDecodeGL] - [Field: 0128, 132.8 fps, frame time: 7.53 (ms) ]
[cudaDecodeGL] - [Field: 0144, 134.7 fps, frame time: 7.42 (ms) ]
[cudaDecodeGL] - [Field: 0160, 133.0 fps, frame time: 7.52 (ms) ]
[cudaDecodeGL] - [Field: 0176, 138.3 fps, frame time: 7.23 (ms) ]
[cudaDecodeGL] - [Field: 0192, 141.0 fps, frame time: 7.09 (ms) ]
[cudaDecodeGL] - [Field: 0208, 138.6 fps, frame time: 7.21 (ms) ]
[cudaDecodeGL] - [Field: 0224, 138.1 fps, frame time: 7.24 (ms) ]
[cudaDecodeGL] - [Field: 0240, 135.1 fps, frame time: 7.40 (ms) ]
[cudaDecodeGL] - [Field: 0256, 136.4 fps, frame time: 7.33 (ms) ]
[cudaDecodeGL] - [Field: 0272, 139.5 fps, frame time: 7.17 (ms) ]
[cudaDecodeGL] - [Field: 0288, 136.0 fps, frame time: 7.35 (ms) ]
[cudaDecodeGL] - [Field: 0304, 137.1 fps, frame time: 7.29 (ms) ]
[cudaDecodeGL] - [Field: 0320, 139.2 fps, frame time: 7.18 (ms) ]
[cudaDecodeGL] - [Field: 0336, 158.8 fps, frame time: 6.30 (ms) ]
[cudaDecodeGL] - [Field: 0352, 134.8 fps, frame time: 7.42 (ms) ]
[cudaDecodeGL] - [Field: 0368, 133.3 fps, frame time: 7.50 (ms) ]
[cudaDecodeGL] - [Field: 0384, 133.5 fps, frame time: 7.49 (ms) ]
[cudaDecodeGL] - [Field: 0400, 134.2 fps, frame time: 7.45 (ms) ]
[cudaDecodeGL] - [Field: 0416, 134.5 fps, frame time: 7.43 (ms) ]
[cudaDecodeGL] - [Field: 0432, 133.6 fps, frame time: 7.49 (ms) ]
[cudaDecodeGL] - [Field: 0448, 136.6 fps, frame time: 7.32 (ms) ]
[cudaDecodeGL] - [Field: 0464, 135.5 fps, frame time: 7.38 (ms) ]
[cudaDecodeGL] - [Field: 0480, 138.5 fps, frame time: 7.22 (ms) ]
[cudaDecodeGL] - [Field: 0496, 143.3 fps, frame time: 6.98 (ms) ]
[cudaDecodeGL] - [Field: 0512, 139.8 fps, frame time: 7.15 (ms) ]
[cudaDecodeGL] - [Field: 0528, 137.9 fps, frame time: 7.25 (ms) ]
[cudaDecodeGL] - [Field: 0544, 138.7 fps, frame time: 7.21 (ms) ]
[cudaDecodeGL] - [Field: 0560, 131.5 fps, frame time: 7.60 (ms) ]
[cudaDecodeGL] - [Field: 0576, 146.9 fps, frame time: 6.81 (ms) ]
[cudaDecodeGL] - [Field: 0592, 143.7 fps, frame time: 6.96 (ms) ]
[cudaDecodeGL] - [Field: 0608, 138.4 fps, frame time: 7.23 (ms) ]
[cudaDecodeGL] - [Field: 0624, 156.4 fps, frame time: 6.39 (ms) ]
[cudaDecodeGL] - [Field: 0640, 147.0 fps, frame time: 6.80 (ms) ]
[cudaDecodeGL] - [Field: 0656, 145.6 fps, frame time: 6.87 (ms) ]
[cudaDecodeGL] - [Field: 0672, 144.7 fps, frame time: 6.91 (ms) ]
[cudaDecodeGL] - [Field: 0688, 142.4 fps, frame time: 7.02 (ms) ]
[cudaDecodeGL] - [Field: 0704, 141.1 fps, frame time: 7.09 (ms) ]
[cudaDecodeGL] - [Field: 0720, 144.9 fps, frame time: 6.90 (ms) ]
[cudaDecodeGL] - [Field: 0736, 145.9 fps, frame time: 6.86 (ms) ]
[cudaDecodeGL] - [Field: 0752, 145.2 fps, frame time: 6.88 (ms) ]
[cudaDecodeGL] - [Field: 0768, 147.1 fps, frame time: 6.80 (ms) ]
[cudaDecodeGL] - [Field: 0784, 146.6 fps, frame time: 6.82 (ms) ]
[cudaDecodeGL] - [Field: 0800, 141.7 fps, frame time: 7.06 (ms) ]
[cudaDecodeGL] - [Field: 0816, 140.4 fps, frame time: 7.12 (ms) ]
[cudaDecodeGL] - [Field: 0832, 125.8 fps, frame time: 7.95 (ms) ]
[cudaDecodeGL] - [Field: 0848, 131.1 fps, frame time: 7.63 (ms) ]
[cudaDecodeGL] statistics
Video Length (hh:mm:ss.msec) = 00:00:06.161
Frames Presented (inc repeats) = 856
Average Present Rate (fps) = 138.92
Frames Decoded (hardware) = 1712
Average Rate of Decoding (fps) = 277.85
Using preferCuvid:
$ ./cudaDecodeGL -nointerop -decodecuvid -device=1 /mnt/A/TS/video.mp4
[CUDA/OpenGL Video Decode]
Command Line Arguments:
argv[0] = ./cudaDecodeGL
argv[1] = -nointerop
argv[2] = -decodecuvid
argv[3] = -device=1
argv[4] = /mnt/A/TS/video.mp4
[cudaDecodeGL]: input file: </mnt/A/TS/video.mp4>
VideoCodec : AVC/H.264
Frame rate : 30000/1001fps ~ 29.97fps
Sequence format : Interlaced
Coded frame size: [1920, 1088]
Display area : [0, 0, 1920, 1080]
Chroma format : 4:2:0
Bitrate : unknown
Aspect ratio : 16:9
argv[0] = ./cudaDecodeGL
argv[1] = -nointerop
argv[2] = -decodecuvid
argv[3] = -device=1
argv[4] = /mnt/A/TS/video.mp4
gpuDeviceInitDRV() Using CUDA Device [1]: GeForce GT 740
gpuDeviceInitDRV() Using CUDA Device [1]: GeForce GT 740
> Using GPU Device: GeForce GT 740 has SM 3.0 compute capability
Total amount of global memory: 2047.8125 MB
>> modInitCTX<NV12ToARGB_drvapi64.ptx > initialized OK
>> modGetCudaFunction< CUDA file: NV12ToARGB_drvapi64.ptx >
CUDA Kernel Function (0x018e8c40) = < NV12ToARGB_drvapi >
>> modGetCudaFunction< CUDA file: NV12ToARGB_drvapi64.ptx >
CUDA Kernel Function (0x018ebdd0) = < Passthru_drvapi >
Free memory: 2024.1875 MB
> VideoDecoder::cudaVideoCreateFlags = <4>Use CUVID decoder
[cudaDecodeGL] - [Field: 0016, 00.0 fps, frame time: 90032308224.00 (ms) ]
[cudaDecodeGL] - [Field: 0032, 136.9 fps, frame time: 7.30 (ms) ]
[cudaDecodeGL] - [Field: 0048, 136.3 fps, frame time: 7.34 (ms) ]
[cudaDecodeGL] - [Field: 0064, 137.7 fps, frame time: 7.26 (ms) ]
[cudaDecodeGL] - [Field: 0080, 137.8 fps, frame time: 7.26 (ms) ]
[cudaDecodeGL] - [Field: 0096, 136.3 fps, frame time: 7.34 (ms) ]
[cudaDecodeGL] - [Field: 0112, 133.1 fps, frame time: 7.52 (ms) ]
[cudaDecodeGL] - [Field: 0128, 132.7 fps, frame time: 7.53 (ms) ]
[cudaDecodeGL] - [Field: 0144, 136.1 fps, frame time: 7.35 (ms) ]
[cudaDecodeGL] - [Field: 0160, 133.2 fps, frame time: 7.51 (ms) ]
[cudaDecodeGL] - [Field: 0176, 137.5 fps, frame time: 7.27 (ms) ]
[cudaDecodeGL] - [Field: 0192, 141.0 fps, frame time: 7.09 (ms) ]
[cudaDecodeGL] - [Field: 0208, 138.7 fps, frame time: 7.21 (ms) ]
[cudaDecodeGL] - [Field: 0224, 138.1 fps, frame time: 7.24 (ms) ]
[cudaDecodeGL] - [Field: 0240, 135.1 fps, frame time: 7.40 (ms) ]
[cudaDecodeGL] - [Field: 0256, 135.3 fps, frame time: 7.39 (ms) ]
[cudaDecodeGL] - [Field: 0272, 139.5 fps, frame time: 7.17 (ms) ]
[cudaDecodeGL] - [Field: 0288, 137.2 fps, frame time: 7.29 (ms) ]
[cudaDecodeGL] - [Field: 0304, 135.7 fps, frame time: 7.37 (ms) ]
[cudaDecodeGL] - [Field: 0320, 140.0 fps, frame time: 7.14 (ms) ]
[cudaDecodeGL] - [Field: 0336, 160.1 fps, frame time: 6.24 (ms) ]
[cudaDecodeGL] - [Field: 0352, 133.9 fps, frame time: 7.47 (ms) ]
[cudaDecodeGL] - [Field: 0368, 133.6 fps, frame time: 7.49 (ms) ]
[cudaDecodeGL] - [Field: 0384, 133.8 fps, frame time: 7.47 (ms) ]
[cudaDecodeGL] - [Field: 0400, 135.0 fps, frame time: 7.41 (ms) ]
[cudaDecodeGL] - [Field: 0416, 133.9 fps, frame time: 7.47 (ms) ]
[cudaDecodeGL] - [Field: 0432, 133.0 fps, frame time: 7.52 (ms) ]
[cudaDecodeGL] - [Field: 0448, 136.8 fps, frame time: 7.31 (ms) ]
[cudaDecodeGL] - [Field: 0464, 135.4 fps, frame time: 7.39 (ms) ]
[cudaDecodeGL] - [Field: 0480, 139.4 fps, frame time: 7.18 (ms) ]
[cudaDecodeGL] - [Field: 0496, 142.7 fps, frame time: 7.01 (ms) ]
[cudaDecodeGL] - [Field: 0512, 138.9 fps, frame time: 7.20 (ms) ]
[cudaDecodeGL] - [Field: 0528, 138.9 fps, frame time: 7.20 (ms) ]
[cudaDecodeGL] - [Field: 0544, 139.1 fps, frame time: 7.19 (ms) ]
[cudaDecodeGL] - [Field: 0560, 131.7 fps, frame time: 7.59 (ms) ]
[cudaDecodeGL] - [Field: 0576, 146.4 fps, frame time: 6.83 (ms) ]
[cudaDecodeGL] - [Field: 0592, 145.6 fps, frame time: 6.87 (ms) ]
[cudaDecodeGL] - [Field: 0608, 137.6 fps, frame time: 7.27 (ms) ]
[cudaDecodeGL] - [Field: 0624, 156.5 fps, frame time: 6.39 (ms) ]
[cudaDecodeGL] - [Field: 0640, 146.5 fps, frame time: 6.83 (ms) ]
[cudaDecodeGL] - [Field: 0656, 146.3 fps, frame time: 6.84 (ms) ]
[cudaDecodeGL] - [Field: 0672, 144.0 fps, frame time: 6.95 (ms) ]
[cudaDecodeGL] - [Field: 0688, 142.4 fps, frame time: 7.02 (ms) ]
[cudaDecodeGL] - [Field: 0704, 142.0 fps, frame time: 7.04 (ms) ]
[cudaDecodeGL] - [Field: 0720, 143.3 fps, frame time: 6.98 (ms) ]
[cudaDecodeGL] - [Field: 0736, 146.4 fps, frame time: 6.83 (ms) ]
[cudaDecodeGL] - [Field: 0752, 145.2 fps, frame time: 6.89 (ms) ]
[cudaDecodeGL] - [Field: 0768, 147.4 fps, frame time: 6.78 (ms) ]
[cudaDecodeGL] - [Field: 0784, 146.2 fps, frame time: 6.84 (ms) ]
[cudaDecodeGL] - [Field: 0800, 141.5 fps, frame time: 7.07 (ms) ]
[cudaDecodeGL] - [Field: 0816, 141.4 fps, frame time: 7.07 (ms) ]
[cudaDecodeGL] - [Field: 0832, 125.3 fps, frame time: 7.98 (ms) ]
[cudaDecodeGL] - [Field: 0848, 130.4 fps, frame time: 7.67 (ms) ]
[cudaDecodeGL] statistics
Video Length (hh:mm:ss.msec) = 00:00:06.160
Frames Presented (inc repeats) = 856
Average Present Rate (fps) = 138.94
Frames Decoded (hardware) = 1712
Average Rate of Decoding (fps) = 277.88
I get the same decoding performance (measured in fps) on both cards, with both the CUDA and the CUVID decoder.
Although the GTX 780 Ti has 2880 CUDA cores, far more than the 384 cores of the GT 740, I achieve the same decoding fps.
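Putting the four summary lines side by side shows how close the runs are. A small Python sketch using only the "Average Rate of Decoding (fps)" figures printed above; it assumes, as the Field counters suggest, that this rate counts fields of the interlaced stream, so it is halved to get frames per second.

```python
# "Average Rate of Decoding (fps)" from the four runs above.
decode_rates = {
    ("GTX 780 Ti", "CUDA"): 277.30,
    ("GTX 780 Ti", "CUVID"): 277.36,
    ("GT 740", "CUDA"): 277.85,
    ("GT 740", "CUVID"): 277.88,
}
SOURCE_FPS = 30000 / 1001  # clip frame rate reported by cudaDecodeGL

for (card, decoder), fields_per_s in decode_rates.items():
    frames_per_s = fields_per_s / 2  # assumption: two fields per interlaced frame
    print(f"{card:<10} {decoder:<5} {frames_per_s:6.2f} frames/s "
          f"({frames_per_s / SOURCE_FPS:.2f}x real time)")

slowest = min(decode_rates.values())
fastest = max(decode_rates.values())
spread = (fastest - slowest) / slowest
print(f"spread between fastest and slowest run: {spread:.2%}")
```

Every combination decodes about 138.7 frames/s (roughly 4.6x real time for this clip), and the fastest and slowest runs differ by only about 0.2%.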
I bought the GTX 780 Ti expecting an improvement in decoding performance, but I didn’t get one. Is there something I forgot to configure?
Would I get more decoding fps with a Quadro or Tesla card? Which one should I buy? Is there any specification that covers this?
In case it’s needed, this is the output of deviceQuery:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 2 CUDA Capable device(s)
Device 0: "GeForce GTX 780 Ti"
CUDA Driver Version / Runtime Version 7.5 / 7.0
CUDA Capability Major/Minor version number: 3.5
Total amount of global memory: 3071 MBytes (3220504576 bytes)
(15) Multiprocessors, (192) CUDA Cores/MP: 2880 CUDA Cores
GPU Max Clock rate: 1046 MHz (1.05 GHz)
Memory Clock rate: 3500 Mhz
Memory Bus Width: 384-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Device 1: "GeForce GT 740"
CUDA Driver Version / Runtime Version 7.5 / 7.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147287040 bytes)
( 2) Multiprocessors, (192) CUDA Cores/MP: 384 CUDA Cores
GPU Max Clock rate: 1072 MHz (1.07 GHz)
Memory Clock rate: 2500 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 3 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from GeForce GTX 780 Ti (GPU0) -> GeForce GT 740 (GPU1) : No
> Peer access from GeForce GT 740 (GPU1) -> GeForce GTX 780 Ti (GPU0) : No
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.0, NumDevs = 2, Device0 = GeForce GTX 780 Ti, Device1 = GeForce GT 740
Result = PASS