Strange gpu0 load in multi-GPU environment using nvv4l2decoder

Hello,

We noticed strange behavior while performing tests in a multi-GPU environment:
• Hardware Platform: NVIDIA Lab server with 6x Tesla T4
• DeepStream Version 5.0.0
• NVIDIA GPU Driver Version 440.100

From the same application we start threads to read multiple RTSP sources, using GStreamer pipelines with nvv4l2decoder. We try to spread those reads across devices by calling cudaSetDevice(<target gpu>) at the very beginning of each thread and by setting the nvv4l2decoder gpu-id=<target gpu> option in the pipelines (a simplified sketch of the per-thread setup is shown after the nvidia-smi output below). We observe that gpu0 always seems to carry unexpected load. For example, when running 5 threads per GPU, nvidia-smi shows the following:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:3B:00.0 Off |                    0 |
| N/A   48C    P0    27W /  70W |    519MiB / 15109MiB |     22%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            On   | 00000000:5E:00.0 Off |                    0 |
| N/A   46C    P0    28W /  70W |    279MiB / 15109MiB |      8%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            On   | 00000000:60:00.0 Off |                    0 |
| N/A   46C    P0    26W /  70W |    279MiB / 15109MiB |      8%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            On   | 00000000:86:00.0 Off |                    0 |
| N/A   47C    P0    27W /  70W |    279MiB / 15109MiB |      8%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla T4            On   | 00000000:AF:00.0 Off |                    0 |
| N/A   46C    P0    28W /  70W |    279MiB / 15109MiB |      5%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla T4            On   | 00000000:D8:00.0 Off |                    0 |
| N/A   44C    P0    26W /  70W |    279MiB / 15109MiB |      5%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     37241      C   ./test_reader_multi_gpu                      507MiB |
|    1     37241      C   ./test_reader_multi_gpu                      267MiB |
|    2     37241      C   ./test_reader_multi_gpu                      267MiB |
|    3     37241      C   ./test_reader_multi_gpu                      267MiB |
|    4     37241      C   ./test_reader_multi_gpu                      267MiB |
|    5     37241      C   ./test_reader_multi_gpu                      267MiB |
+-----------------------------------------------------------------------------+

Notice the utilization and memory usage on gpu0 compared to the other GPUs.
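
For reference, each reader thread is set up roughly like the simplified sketch below (the depay/parse/fakesink elements, the helper names, and the placeholder URI are illustrative rather than our exact code; the real application also consumes the decoded frames):

```cpp
#include <cuda_runtime.h>
#include <gst/gst.h>
#include <string>
#include <thread>
#include <vector>

// Simplified per-thread reader: select one GPU for the calling thread, then
// build a decode pipeline whose nvv4l2decoder is bound to the same GPU.
static void reader_thread(int target_gpu, const std::string& uri) {
    cudaSetDevice(target_gpu);  // called at the very beginning of the thread

    std::string pipeline_desc =
        "rtspsrc location=" + uri + " ! rtph264depay ! h264parse ! "
        "nvv4l2decoder gpu-id=" + std::to_string(target_gpu) + " ! "
        "fakesink sync=false";

    GError* err = nullptr;
    GstElement* pipeline = gst_parse_launch(pipeline_desc.c_str(), &err);
    if (!pipeline) {
        g_printerr("Failed to build pipeline: %s\n", err->message);
        g_error_free(err);
        return;
    }

    gst_element_set_state(pipeline, GST_STATE_PLAYING);
    // ... run a main loop / pull frames here ...
    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(pipeline);
}

int main(int argc, char* argv[]) {
    gst_init(&argc, &argv);

    const int num_gpus = 6;
    const int threads_per_gpu = 5;
    std::vector<std::thread> workers;
    for (int gpu = 0; gpu < num_gpus; ++gpu)
        for (int i = 0; i < threads_per_gpu; ++i)
            workers.emplace_back(reader_thread, gpu, "rtsp://<camera-uri>");

    for (auto& t : workers) t.join();
    return 0;
}
```

The intention is that every CUDA and decoder allocation made by a thread lands on that thread's target GPU, yet gpu0 still shows the extra memory and utilization above.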

Going further, we ran 30 threads with decoders only on gpu3, some image analysis on gpu1 and gpu2, and left gpu0 without any work. The output of nvidia-smi dmon -i 0,1,2,3 -s u is shown in the image below:

[image: nvidia-smi dmon -i 0,1,2,3 -s u output]

Notice that gpu0 utilization isn't 0.

Please help us answer the following questions:
• Are we doing anything wrong?
• What is always running on gpu0?

From the nvidia-smi log you attached, only GPU 3 is running nvv4l2decoder (please check the 'dec' column). Can you check your code to verify what you have set with 'nvv4l2decoder gpu-id='?
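
In case it helps with that check, a minimal way to read the property back at runtime is sketched below (it assumes the decoder element is given a name in the launch string, e.g. name=dec0, and that 'pipeline' is the element returned by gst_parse_launch()):

```cpp
#include <gst/gst.h>

// Minimal sketch: read back the gpu-id a decoder actually received.
// Assumes the decoder was created as, e.g.,
// "... ! nvv4l2decoder gpu-id=3 name=dec0 ! ...".
static void print_decoder_gpu(GstElement *pipeline, const char *dec_name)
{
    GstElement *dec = gst_bin_get_by_name(GST_BIN(pipeline), dec_name);
    if (!dec) {
        g_printerr("no element named %s in this pipeline\n", dec_name);
        return;
    }
    guint gpu_id = 0;
    g_object_get(dec, "gpu-id", &gpu_id, NULL);
    g_print("%s is bound to gpu-id=%u\n", dec_name, gpu_id);
    gst_object_unref(dec);
}
```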

Any update? Is this still an issue that needs support?

Exactly: all decoding jobs are assigned to gpu3, some analysis runs on gpu1 and gpu2, and gpu0 is left without any work, but gpu0 is still utilized according to nvidia-smi.

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Other functions such as the video converter or the display sink may also use the GPU. Have you designated a specific GPU for those stages too? Can you show us your test application or the whole pipeline?
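
For example, in a launch-string style pipeline every GPU-capable element needs its own gpu-id, along the lines of the sketch below. The element list is illustrative since we have not seen your pipeline; `gst-inspect-1.0 <element>` shows whether a given element exposes a gpu-id property, and such properties usually default to 0 (i.e. gpu0) when left unset.

```cpp
#include <string>

// Illustrative sketch: bind the converter as well as the decoder to the
// target GPU, so no element falls back to the default gpu-id (0).
static std::string make_pipeline_desc(int gpu, const std::string &uri)
{
    const std::string id = std::to_string(gpu);
    return "rtspsrc location=" + uri + " ! rtph264depay ! h264parse ! "
           "nvv4l2decoder gpu-id=" + id + " ! "
           "nvvideoconvert gpu-id=" + id + " ! "
           "video/x-raw(memory:NVMM),format=RGBA ! "
           "fakesink sync=false";
}
```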