Hello,
We noticed strange behavior while performing tests in a multi-GPU environment:
• Hardware Platform: NVIDIA Lab server with 6x Tesla T4
• DeepStream Version 5.0.0
• NVIDIA GPU Driver Version 440.100
From the same app we are starting threads to read multiple RTSP sources. We are using GStreamer pipelines with nvv4l2decoder. We try to spread those reads across the devices by calling cudaSetDevice(<target gpu>) at the very beginning of each thread and by setting the nvv4l2decoder gpu-id=<target gpu> option in the pipelines (a minimal sketch of the per-thread setup is shown after the output below). We observe that gpu0 always seems to have unexpected load. For example, when running 5 threads per GPU, nvidia-smi
shows the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100 Driver Version: 440.100 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:3B:00.0 Off | 0 |
| N/A 48C P0 27W / 70W | 519MiB / 15109MiB | 22% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla T4 On | 00000000:5E:00.0 Off | 0 |
| N/A 46C P0 28W / 70W | 279MiB / 15109MiB | 8% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla T4 On | 00000000:60:00.0 Off | 0 |
| N/A 46C P0 26W / 70W | 279MiB / 15109MiB | 8% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla T4 On | 00000000:86:00.0 Off | 0 |
| N/A 47C P0 27W / 70W | 279MiB / 15109MiB | 8% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla T4 On | 00000000:AF:00.0 Off | 0 |
| N/A 46C P0 28W / 70W | 279MiB / 15109MiB | 5% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla T4 On | 00000000:D8:00.0 Off | 0 |
| N/A 44C P0 26W / 70W | 279MiB / 15109MiB | 5% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 37241 C ./test_reader_multi_gpu 507MiB |
| 1 37241 C ./test_reader_multi_gpu 267MiB |
| 2 37241 C ./test_reader_multi_gpu 267MiB |
| 3 37241 C ./test_reader_multi_gpu 267MiB |
| 4 37241 C ./test_reader_multi_gpu 267MiB |
| 5 37241 C ./test_reader_multi_gpu 267MiB |
+-----------------------------------------------------------------------------+
Notice gpu0's utilization and memory usage compared to the other GPUs.
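For reference, each reader thread is set up roughly like this. This is only a minimal sketch under assumptions, not our exact code: the RTSP URL, the rtph264depay/h264parse/fakesink elements, and the error handling are placeholders, and in the real app the decoded frames are consumed downstream instead of being dropped by fakesink.

#include <cuda_runtime.h>
#include <gst/gst.h>
#include <string>
#include <thread>
#include <vector>

static void reader_thread(int gpu_id, const std::string &rtsp_url) {
    // Bind this thread's CUDA calls to the target GPU before anything else.
    cudaSetDevice(gpu_id);

    // Decode pipeline pinned to the same GPU via the decoder's gpu-id property.
    std::string desc =
        "rtspsrc location=" + rtsp_url + " ! rtph264depay ! h264parse ! "
        "nvv4l2decoder gpu-id=" + std::to_string(gpu_id) + " ! "
        "fakesink sync=false";

    GError *err = nullptr;
    GstElement *pipeline = gst_parse_launch(desc.c_str(), &err);
    if (!pipeline) {
        g_printerr("Failed to create pipeline: %s\n",
                   err ? err->message : "unknown");
        g_clear_error(&err);
        return;
    }

    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    // Block until EOS or an error (the real app consumes frames instead).
    GstBus *bus = gst_element_get_bus(pipeline);
    GstMessage *msg = gst_bus_timed_pop_filtered(
        bus, GST_CLOCK_TIME_NONE,
        (GstMessageType)(GST_MESSAGE_EOS | GST_MESSAGE_ERROR));
    if (msg)
        gst_message_unref(msg);
    gst_object_unref(bus);

    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(pipeline);
}

int main(int argc, char *argv[]) {
    gst_init(&argc, &argv);

    const int num_gpus = 6;
    const int threads_per_gpu = 5;
    std::vector<std::thread> threads;

    // 5 reader threads per GPU, 30 in total (URL is a placeholder).
    for (int gpu = 0; gpu < num_gpus; ++gpu)
        for (int i = 0; i < threads_per_gpu; ++i)
            threads.emplace_back(reader_thread, gpu,
                                 std::string("rtsp://camera.example/stream"));

    for (auto &t : threads)
        t.join();
    return 0;
}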
Going further, we run 30 threads with decoders only on gpu3 and some image analysis on gpu1 and gpu2, leaving gpu0 without any work. The output of nvidia-smi dmon -i 0,1,2,3 -s u
is shown in the image:
Notice that gpu0 utilization isn't 0.
Please help us answer the following questions:
Are we doing anything wrong?
What is always running on gpu0?