Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU)
GPU (A5000)
• DeepStream Version
DeepStream 6.2
• JetPack Version (valid for Jetson only)
• TensorRT Version
TensorRT 8.5.2
• NVIDIA GPU Driver Version (valid for GPU only)
Version 525.85
• Issue Type( questions, new requirements, bugs)
Potential Bug
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
The issue can be recreated using the sample provided at GitHub - NVIDIA-AI-IOT/deepstream_parallel_inference_app (a project demonstrating how to use nvmetamux to run multiple models in parallel). All the models and videos come from the DeepStream samples, so it can be recreated without additional data.
[Config File]
# SPDX-FileCopyrightText: Copyright (c) <2022> NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: MIT
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
# The values in the config file are overridden by values set through GObject
# properties.
application:
  enable-perf-measurement: 1
  perf-measurement-interval-sec: 5

tiled-display:
  enable: 0
  rows: 2
  columns: 2
  width: 1280
  height: 720
  gpu-id: 0
  #(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
  #(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
  #(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
  #(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
  #(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
  nvbuf-memory-type: 0

source:
  csv-file-path: sources_2_different_sources.csv
  #csv-file-path: sources_4_different_source.csv
  #csv-file-path: sources_4_rtsp.csv

sink0:
  enable: 1
  #Type - 1=FakeSink 2=EglSink 3=File 7=nv3dsink (Jetson only)
  type: 1
  sync: 1
  source-id: 0
  gpu-id: 0
  nvbuf-memory-type: 3

osd:
  enable: 1
  gpu-id: 0
  border-width: 1
  text-size: 15
  #value changed
  text-color: 1;1;1;1
  text-bg-color: 0.3;0.3;0.3;1
  font: Serif
  show-clock: 0
  clock-x-offset: 800
  clock-y-offset: 820
  clock-text-size: 12
  clock-color: 1;0;0;0
  nvbuf-memory-type: 3

streammux:
  gpu-id: 0
  ##Boolean property to inform muxer that sources are live
  live-source: 0
  buffer-pool-size: 4
  batch-size: 2
  ##time out in usec, to wait after the first buffer is available
  ##to push the batch even if the complete batch is not formed
  batched-push-timeout: 40000
  ## Set muxer output width and height
  width: 1920
  height: 1080
  ##Enable to maintain aspect ratio wrt source, and allow black borders, works
  ##along with width, height properties
  enable-padding: 0
  nvbuf-memory-type: 3

primary-gie0:
  enable: 1
  #(0): nvinfer; (1): nvinferserver
  plugin-type: 0
  gpu-id: 0
  #input-tensor-meta: 1
  batch-size: 4
  #Required by the app for OSD, not a plugin property
  bbox-border-color0: 1;0;0;1
  bbox-border-color1: 0;1;1;1
  bbox-border-color2: 0;0;1;1
  bbox-border-color3: 0;1;0;1
  #interval: 0
  gie-unique-id: 1
  nvbuf-memory-type: 3
  #config-file: ../../yolov4/config_yolov4_inferserver.txt
  config-file: ../yolov4/config_yolov4_infer.txt

branch0:
  ## pgie's id
  pgie-id: 1
  ## select sources by source id
  src-ids: 0;1

secondary-gie0:
  enable: 1
  ##supports multiple sgie.
  cfg-file-path: secondary-gie0.yml

primary-gie1:
  enable: 1
  #(0): nvinfer; (1): nvinferserver
  plugin-type: 0
  gpu-id: 0
  #input-tensor-meta: 1
  batch-size: 4
  #Required by the app for OSD, not a plugin property
  bbox-border-color0: 1;0;0;1
  bbox-border-color1: 0;1;1;1
  bbox-border-color2: 0;0;1;1
  bbox-border-color3: 0;1;0;1
  #interval: 0
  gie-unique-id: 2
  nvbuf-memory-type: 3
  #config-file: ../../yolov4/config_yolov4_inferserver.txt
  config-file: ../yolov4/config_yolov4_infer.txt

branch1:
  ## pgie's id
  pgie-id: 2
  ## select sources by source id
  src-ids: 0;1

secondary-gie1:
  enable: 1
  ##supports multiple sgie
  cfg-file-path: ./secondary-gie1.yml

meta-mux:
  enable: 1
  #config-file: ../../metamux/config_metamux0.txt
  config-file: ./config_metamux0.txt

tests:
  file-loop: 0
[sources_2_different_sources.csv]
Both of the sample videos ship with DeepStream.
enable,type,uri,num-sources,gpu-id,cudadec-memtype
1,3,file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4,1,0,2
1,3,file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_qHD.mp4,1,0,0
[secondary-gie0.yml]
For demonstration purposes, the config_infer_secondary_vehicletypes.txt model is applied to class-id 0 (Person).
secondary-gie0:
  enable: 1
  ##(0): nvinfer; (1): nvinferserver
  plugin-type: 0
  ## nvinferserver's gpu-id can only be set from its own config-file
  #gpu-id=0
  batch-size: 16
  gie-unique-id: 11
  operate-on-gie-id: 1
  operate-on-class-ids: 2
  config-file: config_infer_secondary_carcolor.txt

secondary-gie1:
  enable: 1
  ##(0): nvinfer; (1): nvinferserver
  plugin-type: 0
  ## nvinferserver's gpu-id can only be set from its own config-file
  #gpu-id=0
  batch-size: 16
  gie-unique-id: 12
  operate-on-gie-id: 1
  operate-on-class-ids: 0
  config-file: config_infer_secondary_vehicletypes.txt
We updated the body_pose_gie_src_pad_buffer_probe function to record the PGIE and SGIE inference results as follows:
static GstPadProbeReturn
body_pose_gie_src_pad_buffer_probe(GstPad *pad, GstPadProbeInfo *info,
                                   gpointer u_data)
{
    gchar *msg = NULL;
    GstBuffer *buf = (GstBuffer *)info->data;
    NvDsMetaList *l_frame = NULL;
    NvDsMetaList *l_obj = NULL;
    NvDsMetaList *l_user = NULL;
    NvDsMetaList *l_cls = NULL;
    NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta(buf);

    for (l_frame = batch_meta->frame_meta_list; l_frame != NULL;
         l_frame = l_frame->next)
    {
        NvDsFrameMeta *frame_meta = (NvDsFrameMeta *)(l_frame->data);
        if (frame_meta->batch_id == 0)
            g_print("Processing frame number = %d\t\n", frame_meta->frame_num);

        // for (l_user = frame_meta->frame_user_meta_list; l_user != NULL;
        //      l_user = l_user->next)
        // {
        //     NvDsUserMeta *user_meta = (NvDsUserMeta *)l_user->data;
        //     if (user_meta->base_meta.meta_type == NVDSINFER_TENSOR_OUTPUT_META)
        //     {
        //         NvDsInferTensorMeta *tensor_meta =
        //             (NvDsInferTensorMeta *)user_meta->user_meta_data;
        //         Vec2D<int> objects;
        //         Vec3D<float> normalized_peaks;
        //         tie(objects, normalized_peaks) = parse_objects_from_tensor_meta(tensor_meta);
        //         create_display_meta(objects, normalized_peaks, frame_meta, frame_meta->source_frame_width, frame_meta->source_frame_height);
        //     }
        // }

        for (l_obj = frame_meta->obj_meta_list; l_obj != NULL;
             l_obj = l_obj->next)
        {
            NvDsObjectMeta *obj_meta = (NvDsObjectMeta *)l_obj->data;

            // for (l_user = obj_meta->obj_user_meta_list; l_user != NULL;
            //      l_user = l_user->next)
            // {
            //     NvDsUserMeta *user_meta = (NvDsUserMeta *)l_user->data;
            //     if (user_meta->base_meta.meta_type == NVDSINFER_TENSOR_OUTPUT_META)
            //     {
            //         NvDsInferTensorMeta *tensor_meta =
            //             (NvDsInferTensorMeta *)user_meta->user_meta_data;
            //         Vec2D<int> objects;
            //         Vec3D<float> normalized_peaks;
            //         tie(objects, normalized_peaks) = parse_objects_from_tensor_meta(tensor_meta);
            //         create_display_meta(objects, normalized_peaks, frame_meta, frame_meta->source_frame_width, frame_meta->source_frame_height);
            //     }
            // }

            // Recording inference results: PGIE detector bbox, label and confidence
            float left = obj_meta->detector_bbox_info.org_bbox_coords.left;
            float top = obj_meta->detector_bbox_info.org_bbox_coords.top;
            float right = left + obj_meta->detector_bbox_info.org_bbox_coords.width;
            float bottom = top + obj_meta->detector_bbox_info.org_bbox_coords.height;
            float confidence = obj_meta->confidence;
            char outname[256];
            sprintf(outname, "./outputs/tmp_src_%d.txt", frame_meta->source_id);
            FILE* fp = fopen(outname, "a");
            if(fp)
            {
                fprintf(fp, "[%d] cls_id %d comp_id %d :: %s :: %f %f %f %f :: conf %f\n",
                        frame_meta->frame_num, obj_meta->class_id, obj_meta->unique_component_id,
                        obj_meta->obj_label, left, top, right, bottom, confidence);
                fclose(fp);
            }

#if 1
            // SGIE classifier results (component id, label, class id, probability)
            for (l_cls = obj_meta->classifier_meta_list; l_cls != NULL; l_cls = l_cls->next)
            {
                NvDsClassifierMeta *cls_meta = (NvDsClassifierMeta *)l_cls->data;
                NvDsLabelInfoList* l_label;
                for (l_label = cls_meta->label_info_list; l_label != NULL; l_label = l_label->next)
                {
                    NvDsLabelInfo *label_meta = (NvDsLabelInfo*) l_label->data;
                    fp = fopen(outname, "a");
                    if(fp)
                    {
                        fprintf(fp, "%d %s %d %f\n",
                                cls_meta->unique_component_id,
                                label_meta->result_label, label_meta->result_class_id, label_meta->result_prob);
                        fclose(fp);
                    }
                }
            }
#endif

#if 1
            // Raw SGIE output tensor ("predictions/Softmax") attached to this object
            for (l_user = obj_meta->obj_user_meta_list; l_user != NULL;
                 l_user = l_user->next)
            {
                NvDsUserMeta *user_meta = (NvDsUserMeta *)l_user->data;
                if (user_meta->base_meta.meta_type == NVDSINFER_TENSOR_OUTPUT_META)
                {
                    NvDsInferTensorMeta *tensor_meta =
                        (NvDsInferTensorMeta *)user_meta->user_meta_data;
                    /** Holds the TensorRT binding index of the layer. */
                    int bindingIndex = tensor_meta->output_layers_info->bindingIndex;
                    const char *layerName = tensor_meta->output_layers_info->layerName;
                    void* map_data = tensor_meta->out_buf_ptrs_host[0];
                    if(strcmp(layerName, "predictions/Softmax") == 0)
                    {
                        float* data = (float *)map_data;
                        fp = fopen(outname, "a");
                        if(fp)
                        {
                            fprintf(fp, "[%d] :: %s %d %f %f %f %f %f %f\n",
                                    frame_meta->frame_num,
                                    layerName, bindingIndex,
                                    data[0], data[1], data[2], data[3], data[4], data[5]);
                            fclose(fp);
                        }
                    }
                }
            }
#endif

            // Writing the same user_meta data to a file 10 times produces identical results after every run
#if 0
            for(int i = 0; i < 10; i++)
            {
#if 1
                for (l_user = obj_meta->obj_user_meta_list; l_user != NULL;
                     l_user = l_user->next)
                {
                    NvDsUserMeta *user_meta = (NvDsUserMeta *)l_user->data;
                    if (user_meta->base_meta.meta_type == NVDSINFER_TENSOR_OUTPUT_META)
                    {
                        NvDsInferTensorMeta *tensor_meta =
                            (NvDsInferTensorMeta *)user_meta->user_meta_data;
                        /** Holds the TensorRT binding index of the layer. */
                        int bindingIndex = tensor_meta->output_layers_info->bindingIndex;
                        const char *layerName = tensor_meta->output_layers_info->layerName;
                        void* map_data = tensor_meta->out_buf_ptrs_host[0];
                        if(strcmp(layerName, "predictions/Softmax") == 0)
                        {
                            float* data = (float *)map_data;
                            fp = fopen(outname, "a");
                            if(fp)
                            {
                                fprintf(fp, "[%d] :: %s %d %f %f %f %f %f %f\n",
                                        frame_meta->frame_num,
                                        layerName, bindingIndex,
                                        data[0], data[1], data[2], data[3], data[4], data[5]);
                                fclose(fp);
                            }
                        }
                    }
                }
#endif
            }
#endif
        }
    }
    return GST_PAD_PROBE_OK;
}
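For completeness, a sketch of how a probe like this can be attached. The element variable (downstream_elem) and the "src" pad name are placeholders from our setup, not copied from the sample sources:

/* Sketch only: attach the probe to the source pad of an element placed
 * downstream of nvmetamux. "downstream_elem" is a placeholder for whatever
 * element the probe actually hangs off in the pipeline. */
GstPad *probe_pad = gst_element_get_static_pad(downstream_elem, "src");
if (probe_pad)
{
    gst_pad_add_probe(probe_pad, GST_PAD_PROBE_TYPE_BUFFER,
                      body_pose_gie_src_pad_buffer_probe, NULL, NULL);
    gst_object_unref(probe_pad);
}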
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)
We are using the parallel inference sample provided at GitHub - NVIDIA-AI-IOT/deepstream_parallel_inference_app (a project demonstrating how to use nvmetamux to run multiple models in parallel) to create a pipeline with two detector+classifier branches combined using metamux. The most basic pipeline diagram can be seen below:
The issue we are facing is the inconsistency of the tensor outputs produced by the secondary GIEs across multiple runs of the same input videos. We write the inference results to a file by accessing batch_meta and user_meta inside the body_pose_gie_src_pad_buffer_probe probe and compare the output across multiple runs. The outputs tend to be identical for a number of consecutive frames, followed by a number of consecutive frames where the PGIE results match but the SGIE results are completely different. As can be seen in the screenshots below, the Softmax classification outputs differ entirely, which leads us to believe this is not just a precision issue but something else.
These inconsistencies repeat multiple times without a specific pattern. I have uploaded the recorded inference results for run1.txt and run2.txt as well.
run1.txt (951.8 KB)
run2.txt (954.5 KB)
Furthermore, after some testing we discovered that introducing a small processing delay into the probe leads to identical PGIE and SGIE results across multiple runs. If we add a dummy for loop to body_pose_gie_src_pad_buffer_probe that writes obj_user_meta multiple times (the #if 0 block in the probe above), the inference results across multiple runs match significantly better, or even perfectly.
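To make the "processing delay" workaround concrete, here is a minimal sketch of the kind of artificial delay we mean. The g_usleep()-based helper and the 2 ms value are illustrative assumptions only; in our actual tests the delay came from the dummy write loop shown in the #if 0 block of the probe above.

#include <glib.h>   /* g_usleep() */

/* Illustrative only: a short sleep inside the probe, standing in for the
 * dummy obj_user_meta write loop we actually used. The 2 ms value is an
 * arbitrary placeholder. */
static inline void
dummy_probe_delay(void)
{
    g_usleep(2000); /* sleep for ~2 ms */
}

Calling such a helper once per object, right before the tensor output is read, is the kind of delay we are describing.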
It would be great if you could look into this issue and let us know why it happens and provide us help in fixing it. Thank you!