No Tensor Data from nvinferserver when using ReIdentificationNet from Triton

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
GPU

• DeepStream Version
6.2.2

• TensorRT Version
Triton Container: nvcr.io/nvidia/tritonserver:23.02-py3
Deepstream Container: nvcr.io/nvidia/deepstream:6.2-triton

• NVIDIA GPU Driver Version (valid for GPU only)
525.125.06

• Issue Type (questions, new requirements, bugs)
Bug/Configuration

I'm hosting ReIdentificationNet in Triton (config below) and using nvinferserver from DeepStream as the secondary inference (SGIE), where the PGIE is an nvinfer element. Following the samples, I get the tensor metadata in my C++ app through the NvDsUserMeta, e.g.:

NvDsInferTensorMeta *tensor_meta = (NvDsInferTensorMeta *) user_meta->user_meta_data;
NvDsInferDims embedding_dims = tensor_meta->output_layers_info[0].inferDims;
int numElements = embedding_dims.d[0];
float *embedding_data = (float *)(tensor_meta->out_buf_ptrs_dev[0]);

From here, I work with the embedding_data, etc. just fine.
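
For context, this is roughly the iteration around that snippet, as a minimal sketch modeled on the deepstream-infer-tensor-meta-test sample; the probe is attached downstream of the SGIE, and the probe name is illustrative:

// Minimal sketch: walk frame -> object -> user meta and pick out the SGIE tensor output.
// Assumes tensor output meta is enabled (output_tensor_meta: true for nvinferserver,
// output-tensor-meta=1 for nvinfer).
#include <gst/gst.h>
#include "gstnvdsmeta.h"   // gst_buffer_get_nvds_batch_meta
#include "gstnvdsinfer.h"  // NvDsInferTensorMeta, NVDSINFER_TENSOR_OUTPUT_META

static GstPadProbeReturn
sgie_probe (GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
  GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER (info);
  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);

  for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame; l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;
    for (NvDsMetaList *l_obj = frame_meta->obj_meta_list; l_obj; l_obj = l_obj->next) {
      NvDsObjectMeta *obj_meta = (NvDsObjectMeta *) l_obj->data;
      for (NvDsMetaList *l_user = obj_meta->obj_user_meta_list; l_user; l_user = l_user->next) {
        NvDsUserMeta *user_meta = (NvDsUserMeta *) l_user->data;
        if (user_meta->base_meta.meta_type != NVDSINFER_TENSOR_OUTPUT_META)
          continue;
        NvDsInferTensorMeta *tensor_meta = (NvDsInferTensorMeta *) user_meta->user_meta_data;
        // ... read output_layers_info[0] / out_buf_ptrs_* as shown above ...
      }
    }
  }
  return GST_PAD_PROBE_RETURN_OK;
}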

My problem is that when I run the secondary inference through Triton (nvinferserver), the embedding_data is NULL even though numElements = 256. When I run this exact same code using a (local) nvinfer SGIE instead, everything works fine and embedding_data is valid; it's only when I switch to Triton/nvinferserver that embedding_data is NULL.

If I use nvinferserver for the SGIE, do I also need to use nvinferserver for the PGIE? In other words, can I mix nvinfer and nvinferserver in the same pipeline?

nvinferserver config:
infer_config {
  unique_id: 2
  gpu_ids: 0
  max_batch_size: 16
  backend {
    triton {
      model_name: "reidentificationnet"
      version: -1
      grpc {
        url: "triton-server:8001"
        enable_cuda_buffer_sharing: true
      }
    }
  }

  preprocess {
    network_format: IMAGE_FORMAT_RGB
    tensor_order: TENSOR_ORDER_LINEAR
    tensor_name: "input"
    maintain_aspect_ratio: 0
    frame_scaling_hw: FRAME_SCALING_HW_DEFAULT
    frame_scaling_filter: 1
    normalize {
      scale_factor: 0.01735207357279195
      channel_offsets: [123.675, 116.28, 103.53]
    }
  }

  postprocess {
    other {}
  }

  extra {
    copy_input_to_host_buffers: false
    output_buffer_pool_size: 64
  }
}

input_control {
  process_mode: PROCESS_MODE_CLIP_OBJECTS
  operate_on_gie_id: 1
  interval: 0
  operate_on_class_ids: [0, 1, 2]
}

output_control {
  output_tensor_meta: true
}

Triton config.pbtxt:
name: "reidentificationnet"
platform: "tensorrt_plan"
default_model_filename: "resnet50_market1501.etlt_b16_gpu0_fp16.engine"
max_batch_size: 16
input [
  {
    name: "input"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 256, 128 ]
  }
]
output [
  {
    name: "fc_pred"
    data_type: TYPE_FP32
    dims: [ 256 ]
  }
]
dynamic_batching {
  max_queue_delay_microseconds: 40000
}

Pipeline (works with nvinfer SGIE):
... ! nvinfer config=facenet ! nvtracker ! nvinfer config=reid ! nvdsosd ! ...

Pipeline (doesn't work with nvinferserver SGIE):
... ! nvinfer config=facenet ! nvtracker ! nvinferserver config=reid ! nvdsosd ! ...

Yes, you can mix nvinfer and nvinferserver; please refer to this sample. The app can use either nvinfer or nvinferserver for the PGIE and the SGIE. The PGIE detects objects, and the SGIE extracts an embedding vector from every detection bounding box.

For reference, the FaceNet PGIE configs are below. I've tried FaceNet both with nvinfer and, hosted in Triton, with nvinferserver.

PGIE FaceNet nvinfer config:
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
tlt-model-key=nvidia_tlt
tlt-encoded-model=/var/config/models/faciallandmark/facenet.etlt
labelfile-path=/var/config/models/faciallandmark/labels_facenet.txt
int8-calib-file=/var/config/models/faciallandmark/facenet_cal.txt
model-engine-file=/var/config/models/faciallandmark/facenet.etlt_b1_gpu0_int8.engine
infer-dims=3;416;736
uff-input-order=0
uff-input-blob-name=input_1
batch-size=1
process-mode=1
model-color-format=0
network-mode=1
num-detected-classes=1
cluster-mode=2
interval=0
gie-unique-id=1
output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid

[class-attrs-all]
pre-cluster-threshold=0.2
group-threshold=1
eps=0.2

PGIE FaceNet nvinferserver config:

infer_config {
  unique_id: 1
  gpu_ids: [0]
  max_batch_size: 16
  backend {
    triton {
      model_name: "facenet"
      version: -1
      grpc {
        url: "triton-server:8001"
        enable_cuda_buffer_sharing: true
      }
    }
  }

  preprocess {
    network_format: IMAGE_FORMAT_RGB
    tensor_order: TENSOR_ORDER_LINEAR
    tensor_name: "input_1"
    frame_scaling_hw: FRAME_SCALING_HW_DEFAULT
    frame_scaling_filter: 1
    normalize {
      scale_factor: 0.0039215697906911373
      channel_offsets: [0, 0, 0]
    }
  }

  postprocess {
    labelfile_path: "/data-volume/triton/models/facenet/labels.txt"
    detection {
      num_detected_classes: 1
      nms {
        confidence_threshold: 0.1
        topk: 20
        iou_threshold: 0.1
      }
    }
  }

  extra {
    copy_input_to_host_buffers: false
    output_buffer_pool_size: 10
  }
}

input_control {
  process_mode: PROCESS_MODE_FULL_FRAME
  operate_on_gie_id: -1
  interval: 0
}

gst-nvinferserver (DS-Triton) supports two Triton modes:
a. C-API: the plugin starts the Triton server library directly through C-API calls, with no buffer copies and no separate tritonserver app needed.
b. gRPC: the plugin acts as a client and sends inference requests to a tritonserver app started in another container.
From your config file, you are running in gRPC mode. If you expect the model inference to run on a single machine, please try Triton C-API mode to isolate whether the issue is caused by gRPC; C-API can also give better performance on a single machine.

  model_repo {  # enable C-API
    root: "./triton_model_repo" # tritonserver's model-repo
    log_level: 4
    strict_model_config: true
  }
  # grpc {
  #   url: "triton-server:8001"
  #   enable_cuda_buffer_sharing: true
  # }

Also, try disabling dynamic_batching, which can cause high latency in C-API mode, specifically for an SGIE.

When I try the C-API method, there appears to be a version mismatch (see below). Could there be an issue between my DeepStream container (nvcr.io/nvidia/deepstream:6.2-triton) and the TAO version that was used to build the .engine file (v3.22.05_trt8.4_x86), or the Triton container?

I0830 18:53:10.563203 4591 tensorrt.cc:5640] TRITONBACKEND_ModelInstanceInitialize: reidentificationnet (GPU device 0)
I0830 18:53:10.566117 4591 backend_model_instance.cc:105] Creating instance reidentificationnet on GPU 0 (8.0) using artifact 'resnet50_market1501.etlt_b16_gpu0_fp16.engine'
I0830 18:53:10.567063 4591 tensorrt.cc:1622] Zero copy optimization is disabled
I0830 18:53:10.680595 4591 logging.cc:49] Loaded engine size: 46 MiB
E0830 18:53:10.828182 4591 logging.cc:43] 1: [stdArchiveReader.cpp::StdArchiveReader::42] Error Code 1: Serialization (Serialization assertion stdVersionRead == serializationVersion failed.Version tag does not match. Note: Current Version: 232, Serialized Engine Version: 213)
E0830 18:53:10.837706 4591 logging.cc:43] 4: [runtime.cpp::deserializeCudaEngine::66] Error Code 4: Internal Error (Engine deserialization failed.)

This is because of a TensorRT version mismatch. Please create a new engine file inside the DeepStream container if you are using Triton C-API mode.

The nvinferserver plugin is open source. In attachTensorOutputMeta() of gstnvinferserver_meta_utils.cpp, out_buf_ptrs_dev is set to NULL; you can use out_buf_ptrs_host instead.

Thank you, fanzh. This works for me.
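
For anyone who hits the same issue, this is roughly what the fix looks like, as a minimal sketch; the helper name and the cudaMemcpy fallback for the local-nvinfer case are illustrative additions, not from the thread above:

// Sketch: read the embedding whether the SGIE was nvinferserver (host buffers)
// or local nvinfer (device buffers).
#include <cstring>
#include <vector>
#include <cuda_runtime_api.h>
#include "nvdsmeta.h"       // NvDsUserMeta
#include "gstnvdsinfer.h"   // NvDsInferTensorMeta

static std::vector<float> get_embedding (NvDsUserMeta *user_meta)
{
  NvDsInferTensorMeta *tensor_meta = (NvDsInferTensorMeta *) user_meta->user_meta_data;
  NvDsInferDims embedding_dims = tensor_meta->output_layers_info[0].inferDims;
  int numElements = embedding_dims.d[0];        // 256 for ReIdentificationNet
  std::vector<float> embedding (numElements);

  if (tensor_meta->out_buf_ptrs_host && tensor_meta->out_buf_ptrs_host[0]) {
    // nvinferserver path: only the host buffers are populated.
    std::memcpy (embedding.data (), tensor_meta->out_buf_ptrs_host[0],
                 numElements * sizeof (float));
  } else if (tensor_meta->out_buf_ptrs_dev && tensor_meta->out_buf_ptrs_dev[0]) {
    // local nvinfer path: copy the device buffer down.
    cudaMemcpy (embedding.data (), tensor_meta->out_buf_ptrs_dev[0],
                numElements * sizeof (float), cudaMemcpyDeviceToHost);
  }
  return embedding;
}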
