We tried to reproduce the DeepStream SDK 5.0 performance results from the samples by running 30 1080p streams per app instance using the config file
source30_1080p_dec_infer-resnet_tiled_display_int8.txt across multiple GPUs, as detailed below:
• Hardware Platform (dGPU): Tesla V100 and Tesla T4
• DeepStream Version 5.0
• TensorRT Version 7.0
• NVIDIA GPU Driver Version 450.51
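For completeness, these versions were read with standard tooling (the TensorRT package name may differ depending on how it was installed, and the DeepStream version file path is assumed from the default install location):
$ nvidia-smi --query-gpu=name,driver_version --format=csv   # GPU model and driver version
$ dpkg -l | grep -i tensorrt                                # TensorRT package version on Debian-based systems
$ cat /opt/nvidia/deepstream/deepstream-5.0/version         # DeepStream version file from the default install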
Using the command:
$ deepstream-app -c /opt/nvidia/deepstream/deepstream-5.0/samples/configs/deepstream-app/source30_1080p_dec_infer-resnet_tiled_display_int8.txt
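The per-stream FPS figures below are taken from deepstream-app's built-in perf measurement (the periodic **PERF: lines printed to the console), which the stock sample config enables, to my understanding, via its [application] section:
$ grep -A2 '^\[application\]' /opt/nvidia/deepstream/deepstream-5.0/samples/configs/deepstream-app/source30_1080p_dec_infer-resnet_tiled_display_int8.txt
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5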
Tesla T4 Results:
- Using a single instance of the command above, the 30 streams ran at ~30 FPS each while GPU utilization was ~35%.
- Using two instances of the command, the 60 streams ran at ~18.5 FPS each while GPU utilization stayed at ~35%. Why was throughput capped even though the card did not max out? (Utilization was sampled as shown below.)
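The utilization figures are the aggregate number reported by nvidia-smi; to check whether a fixed-function engine rather than the SMs is the limit, the per-engine view can be sampled during a run (a sketch; exact columns vary by driver and GPU):
$ nvidia-smi dmon -s u    # prints sm, mem, enc, dec utilization (%) once per second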
Tesla V100 Results:
- Using a single instance of the command above, the 30 streams ran at ~20 FPS each while GPU utilization was ~25%.
- Using two instances of the command, the 60 streams ran at ~10 FPS each while GPU utilization stayed at ~25%.
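For reproducibility, the two-instance tests were launched as two independent processes against the same config, along these lines (CFG is just a local shorthand for the sample config path):
$ CFG=/opt/nvidia/deepstream/deepstream-5.0/samples/configs/deepstream-app/source30_1080p_dec_infer-resnet_tiled_display_int8.txt
$ deepstream-app -c "$CFG" &    # first instance, 30 streams
$ deepstream-app -c "$CFG" &    # second instance, another 30 streams
$ wait                          # keep the shell attached to both runs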
I have the following questions:
- Why was the Tesla V100 outperformed by the Tesla T4, even though the V100 has double the Tensor Cores of the T4?
- Why did the Tesla T4's throughput cap out while at only ~35% utilization? Where is the bottleneck? Does it have anything to do with the model running in INT8 mode, or with compute capability?
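In case it helps narrow this down, I can also capture clock and throttle state during the runs; something like the following should show whether the card is being held back by power, thermal, or other throttle reasons:
$ nvidia-smi -q -d PERFORMANCE,CLOCK    # current clocks plus the "Clocks Throttle Reasons" section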