DeepStream Transfer Learning App - Black images when running on second GPU

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)

GPU - RTX A5000 x 2

• DeepStream Version

6.1 via docker image nvcr.io/nvidia/deepstream:6.1-devel

• JetPack Version (valid for Jetson only)

N/A

• TensorRT Version

• NVIDIA GPU Driver Version (valid for GPU only)

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.05    Driver Version: 510.73.05    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A5000    Off  | 00000000:01:00.0 Off |                  Off |
| 47%   75C    P2   196W / 230W |   4031MiB / 24564MiB |     72%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A5000    Off  | 00000000:02:00.0 Off |                  Off |
| 30%   31C    P8    12W / 230W |      8MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

• Issue Type( questions, new requirements, bugs)

Bug?

• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

I want to run the Transfer Learning App on two GPUs at the same time. I have two RTX A5000 cards. The app runs successfully on GPU 0, but when I run it on GPU 1 it completes without errors yet saves black images.

Steps to reproduce:

Good run on GPU 0:


# Run the docker:
docker run \
--gpus all \
-it \
--rm \
--net=host \
--privileged \
-v /tmp/.X11-unix:/tmp/.X11-unix \
-v $(pwd):/code/ \
-e DISPLAY=$DISPLAY \
-w /opt/nvidia/deepstream/deepstream-6.1/sources/apps/sample_apps/deepstream-transfer-learning-app/configs \
nvcr.io/nvidia/deepstream:6.1-devel

# Create the output directory
mkdir output

# Run the example
deepstream-transfer-learning-app -c ds_transfer_learning_app_example.txt

# View an output image - SUCCESS!
apt install -y feh
feh output/images/camera-0_2022-06-06T09\:47\:55+0000_0000000044.jpg 

Now try to run it on GPU1 instead:


# Remove traces of last run
rm -r output && mkdir output

# Change the app config file for GPU1
sed -i 's/gpu-id=0/gpu-id=1/g' ds_transfer_learning_app_example.txt
sed -i 's/gpu0/gpu1/g' ds_transfer_learning_app_example.txt


# Change the PGIE to GPU1
sed -i 's/gpu-id=0/gpu-id=1/g' config_infer_primary_ds_transfer_learning.txt
sed -i 's/gpu0/gpu1/g' config_infer_primary_ds_transfer_learning.txt

# Run again
deepstream-transfer-learning-app -c ds_transfer_learning_app_example.txt

# View an output image - FAIL - it's black!
feh output/images/camera-0_2022-06-06T10\:03\:46+0000_0000000042.jpg
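
Before the second run, it may help to confirm that the sed edits actually changed every gpu-id entry. The sketch below is only an illustration: list_gpu_ids is a hypothetical helper name, and it assumes each setting sits at the start of a line as in the [sink0] group of this config.

```shell
#!/usr/bin/env bash
# list_gpu_ids is a hypothetical helper, not part of DeepStream.
# It prints every active (uncommented) gpu-id setting in the given files.
list_gpu_ids() {
    # -H prefixes each match with its file name, -n with its line number.
    # Anchoring at ^ skips commented-out entries such as "#gpu-id=0".
    grep -Hn '^gpu-id=' "$@"
}

# Example, run from the configs directory:
# list_gpu_ids ds_transfer_learning_app_example.txt config_infer_primary_ds_transfer_learning.txt
```

If any line still shows gpu-id=0 after the sed step, the pipeline would be split across both GPUs.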

Note that I have also set the output sink to “FakeSink” (type=1):

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=1
sync=1
source-id=0
gpu-id=1
nvbuf-memory-type=0

Am I missing some configuration? Or is this a bug?

OK, after a bit of digging I found this: Deeepstream-test5 error when save image - #33 by hung

If I run my app with CUDA_VISIBLE_DEVICES=1, e.g.:

CUDA_VISIBLE_DEVICES=1 deepstream-transfer-learning-app -c ds_transfer_learning_app_example.txt

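Then it works. Note that you must still use gpu-id=0 in your configuration files, because with CUDA_VISIBLE_DEVICES=1 the process sees only that one GPU, and it is enumerated as device 0.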
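
As a rough sketch of the workaround, a small wrapper can pin any command to one physical GPU. run_on_gpu is a hypothetical helper name, not something DeepStream provides, and the usage lines assume the config files have been left at (or reverted to) gpu-id=0.

```shell
#!/usr/bin/env bash
# run_on_gpu is a hypothetical helper, not part of DeepStream.
# Usage: run_on_gpu <physical-gpu-index> <command> [args...]
run_on_gpu() {
    local physical_gpu="$1"
    shift
    # The variable is set only for this one command. The process then
    # sees a single device, which CUDA enumerates as device 0.
    CUDA_VISIBLE_DEVICES="$physical_gpu" "$@"
}

# Example, inside the container with configs at gpu-id=0:
# run_on_gpu 1 deepstream-transfer-learning-app -c ds_transfer_learning_app_example.txt
```

The same wrapper could start one instance per card (run_on_gpu 0 ... & and run_on_gpu 1 ... &). I have not tested that here, and both instances would presumably need separate output directories.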

Note that there is a typo in the linked thread:

CUDA_VISIBLE_DEVICES=1  # Correct
CUDA_VISIABLE_DEVICES=1 # Wrong! Typo
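
A misspelled variable has no effect and produces no error, so the app simply runs on the default GPU. A quick check like the sketch below can catch that. warn_misspelled_cuda_var is a hypothetical helper name, and it only catches names that start with CUDA_VIS.

```shell
#!/usr/bin/env bash
# warn_misspelled_cuda_var is a hypothetical helper, not an NVIDIA tool.
# It returns 1 if a near-miss of CUDA_VISIBLE_DEVICES is set, else 0.
warn_misspelled_cuda_var() {
    local name found=0
    # Take variable names only, then keep those starting with CUDA_VIS.
    for name in $(env | cut -d= -f1 | grep -i '^CUDA_VIS'); do
        if [ "$name" != "CUDA_VISIBLE_DEVICES" ]; then
            echo "warning: '$name' is set; did you mean CUDA_VISIBLE_DEVICES?" >&2
            found=1
        fi
    done
    return "$found"
}

# Example: call it just before launching the app.
# warn_misspelled_cuda_var && deepstream-transfer-learning-app -c ds_transfer_learning_app_example.txt
```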

It would still be good to have the bug confirmed and fixed, however.

We have reproduced the issue and are investigating. Thanks for reporting it.


The fix will be in an upcoming release.
