When the preprocess gpu-id is changed, nothing is detected, but the program still runs normally

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) GPU
• DeepStream Version 6.1
• JetPack Version (valid for Jetson only)
• TensorRT Version 8.2.5-1+cuda11.4
• NVIDIA GPU Driver Version (valid for GPU only) 510.68.02
• Issue Type( questions, new requirements, bugs) questions
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

  1. When I run the deepstream-app demo with the preprocess plugin enabled and gpu-id=0, it detects the objects the model is trained for. But when I change the gpu-id to 1 or 2, it detects nothing. I can still watch the output stream in VLC, and the ROI rectangle is drawn on the stream, but no detections appear.
  2. If I run the deepstream-app demo without the preprocess plugin and set the other plugins' gpu-id to 1 or 2, it detects objects normally. So I think the preprocess plugin or my config file may have a problem.

• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

I added a gpu-id option to the preprocess bin in the deepstream-app demo, so the gpu-id can be changed through the config file.

Here is the app config file:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP 5=CSI
type=3
uri=file:/work/deepstream-6.1-origin/samples/streams/sample_1080p_h264.mp4
#uri=rtsp://admin:zhy12345@172.16.72.64:554/Streaming/Channels/101
#uri=rtmp://192.168.0.14:1935/live/live999
num-sources=1
gpu-id=0
cudadec-memtype=0
[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming 5=Overlay 8=rtmpStreaming
type=8
source-id=0
#Indicates how fast the stream is to be rendered. 0: As fast as possible 1: Synchronously
sync=0
gpu-id=0
nvbuf-memory-type=0
codec=1
enc-type=0
qos=0
bitrate=4000000
iframeinterval=30
#rtsp-port=8857
#udp-port=5400
rtmp-address=192.168.0.87
rtmp-port=26666
[sink4]
enable=1
type=6
topic=yolo-meta-app-custom
msg-conv-config=dstest5_msgconv_sample_config.txt
msg-conv-msg2p-new-api=1
msg-conv-payload-type=257
msg-conv-msg2p-lib=/work/deepstream/sources/libs/zhy_nvmsgconv/libzhy_nvds_msgconv.so
msg-broker-proto-lib=/opt/nvidia/deepstream/deepstream-6.1/lib/libnvds_redis_proto.so
msg-broker-conn-str=192.168.0.87;36379
[osd]
enable=1
gpu-id=0
border-width=5
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0
label-file=./deepStream-Yolo-zhy/labels.txt
count-ids=0;2
opencv-img-save=0
save-interval=10
save-img-path=/work/

[streammux]
gpu-id=0
#Boolean property to inform muxer that sources are live
live-source=1
batch-size=1
#time out in usec, to wait after the first buffer is available
#to push the batch even if the complete batch is not formed
batched-push-timeout=40000
#Set muxer output width and height
width=1920
height=1080
enable-padding=0
nvbuf-memory-type=0

[primary-gie]
enable=1
gpu-id=0
batch-size=4
gie-unique-id=1
nvbuf-memory-type=3
input-tensor-meta=1
config-file=config_infer_primary_yoloV5.txt

[pre-process]
enable=1
gpu-id=0
config-file=config_preprocess.txt

[tests]
file-loop=1

The preprocess config file is:

[property]
enable=1
target-unique-ids=1
#0=NCHW, 1=NHWC, 2=CUSTOM
network-input-order=0
#if enabled maintain the aspect ratio while scaling
maintain-aspect-ratio=1
#if enabled pad symmetrically with maintain-aspect-ratio enabled
symmetric-padding=1
#processing width/height at which the image is scaled
processing-width=960
processing-height=960
scaling-buf-pool-size=6
tensor-buf-pool-size=6
#tensor shape based on network-input-order
network-input-shape= 4;3;960;960
network-color-format=0
tensor-data-type=0
tensor-name=data
scaling-pool-memory-type=0
scaling-pool-compute-hw=0
scaling-filter=0
custom-lib-path=/work/deepstream/sources/gst-plugins/gst-nvdspreprocess/nvdspreprocess_lib/libcustom2d_preprocess.so
custom-tensor-preparation-function=CustomTensorPreparation
[user-configs]
pixel-normalization-factor=0.003921568
#mean-file=
#offsets=
[group-0]
src-ids=0
custom-input-transformation-function=CustomAsyncTransformation
process-on-roi=1
roi-params-src-0=200;200;1000;600
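A detail worth checking in this config: network-input-shape=4;3;960;960 means the preprocess plugin batches 4 ROI tensors in NCHW order, and with tensor-data-type=0 (FP32) each element is 4 bytes. A small helper of my own (not part of DeepStream) to sanity-check how large the resulting tensor buffer should be:

```python
# Sketch: compute the expected byte size of the preprocess output tensor
# from the "network-input-shape" config key, assuming FP32 elements
# (tensor-data-type=0 -> 4 bytes per element).
from functools import reduce

def tensor_bytes(network_input_shape: str, bytes_per_elem: int = 4) -> int:
    dims = [int(d) for d in network_input_shape.split(";")]
    return reduce(lambda a, b: a * b, dims) * bytes_per_elem

# network-input-shape=4;3;960;960 from the config above:
print(tensor_bytes("4;3;960;960"))  # 44236800 bytes (~42 MiB)
```

Comparing this number against the size of a dumped tensor file is a quick way to tell whether the batch dimension or resolution you configured is actually what reaches the inference plugin.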

The pgie config file is:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
custom-network-config=/work/deepstream/sources/apps/sample_apps/deepstream-app/deepStream-Yolo-zhy/best.cfg
#model-file=best.wts
model-engine-file=…/model_b4_gpu0_fp16.engine
#model-engine-file=wurenji_1121.engine
#int8-calib-file=int8calib.table
labelfile-path=labels.txt
batch-size=1
network-mode=2
num-detected-classes=1
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=1
input-tensor-from-meta=1
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet

[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.25

How many GPUs do you have? If only one, gpu-id should be set to 0.

I have 8 GPUs in the server. The company has me use GPU 7, but when I set gpu-id to 7, this issue occurs.

How can I solve this problem? I printed the debug log, but I have no idea how to locate the issue from it.

I also dumped the pipeline graph while playing with gpu-id=0 and gpu-id=7. There is a difference between them at the queue before the GIE plugin.

gpu-0 pipeline:


gpu-7 pipeline:

It seems the gpu-7 queue doesn't receive data.

  1. You can add a probe on the gpu-7 queue to check whether it receives data.
  2. You can add a log in attach_metadata_detector to check whether detection_output.numObjects is 0.
  1. I added a sink pad probe on the gpu-7 queue. There is one frame in a batch_meta. But I set 4 ROI blocks, so after preprocessing, shouldn't there be 4 frames in one batch_meta?
  2. I added a print in the bbox_cb; it has no objects.

Yes, it should be 4.

We need to narrow down this issue. Please refer to dump_infer_input_to_file.patch.txt in DeepStream SDK FAQ - #9 by mchi to dump the input data before inference. With this method we can know whether the preprocess plugin delivers the right tensor data.

Sorry for the long delay in replying.
I followed the instructions to dump the image.
I tested with gpu-id=0, 6, and 7.
With gpu-id=0, it dumps the correct image.


With gpu-id=6 and gpu-id=7, the dumped image is empty,

and its file size differs from the gpu-id=0 dump:

I checked the image attributes in Windows; they are all 640*640 with a bit depth of 24.

I also dumped the raw tensors.
Here are the files. I opened one in Notepad, but it only shows garbled bytes, since the files are binary data.
gpu-id 0 raw data file:
preprocess-gie-1_input-0_batch-0_frame-11-gpu0.raw (4.7 MB)
gpu-id 6 raw data file:
preprocess-gie-1_input-0_batch-0_frame-11-gpu6.raw (4.7 MB)
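Since the .raw dumps are binary tensor data, a text editor like Notepad will only show garbage. A small script along these lines (my own sketch, assuming the dump is raw FP32 values, which matches tensor-data-type=0 in the preprocess config) can summarize a dump and reveal whether the gpu6/gpu7 tensors are actually empty:

```python
# Inspect a dumped .raw tensor instead of opening it in a text editor.
# Assumes raw FP32 data; the file path below is just an example.
import numpy as np

def summarize_raw(path: str) -> dict:
    data = np.fromfile(path, dtype=np.float32)
    return {
        "elements": int(data.size),
        "min": float(data.min()),
        "max": float(data.max()),
        "all_zero": bool(np.all(data == 0)),
    }

# e.g. summarize_raw("preprocess-gie-1_input-0_batch-0_frame-11-gpu6.raw")
# If "all_zero" is True for the gpu6/gpu7 dumps but not for gpu0, the
# preprocess plugin is producing an empty tensor on those GPUs.
```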

Thanks for sharing. From the test results it should be a preprocess plugin issue, not an inference issue. I will try to reproduce it.
You can use the DeepStream docker to test gpu6 or gpu7 by specifying the GPU device like this: docker run --gpus "device=6". With this method there will be only one GPU visible inside the container, and you can check whether that GPU can run the application normally. Here are the docker docs: DeepStream | NVIDIA NGC
Quickstart Guide — DeepStream 6.1.1 Release documentation

I found a workaround that runs correctly: exporting the CUDA_VISIBLE_DEVICES variable.

CUDA_VISIBLE_DEVICES="6" ./deepstream-app -c deepStream-Yolo-zhy/deepstream_app_config_rtsp.txt

In deepstream_app_config_rtsp.txt, gpu-id is set to 0, and the program then uses GPU 6 to process the data.
This command produces the correct results.