Custom detection parser error with nvinferserver and a custom Python model with more than one stream

Hello,
I am setting up a DeepStream application (using deepstream-app) to run inference on a custom RNN model with the help of Triton. I am deploying on a GPU with DeepStream 6.3, using the Docker container.
My Triton model is, in fact, an ensemble of a Python BLS model (which performs some input preprocessing, calls the TensorRT model and returns the result) and another Python model that postprocesses the RNN segmentation mask into a bounding rectangle.
I first set up the Triton model repo without DeepStream and tested inference with an external Python script using CUDA shared memory. For debugging, I also print the final ensemble result before returning it.
The Triton model itself seems to work perfectly fine. The issue appears when I try to parse my output format into DeepStream NvDsInferObjectDetectionInfo using a custom detection parse function.
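For context, the BLS step roughly follows the pattern below (a simplified sketch, not the attached model code; the tensor and output names are placeholders, and “topdown” is the name the TensorRT model gets in the config shared further down the thread):

# Illustrative only - calling the TensorRT model from the Python BLS model.
import numpy as np
import triton_python_backend_utils as pb_utils


def run_segmentation(preprocessed: np.ndarray):
    """BLS call made from inside the Python model's execute()."""
    model_in = pb_utils.Tensor("ImageInput", preprocessed.astype(np.float32))  # placeholder name
    infer_request = pb_utils.InferenceRequest(
        model_name="topdown",
        requested_output_names=["SegmentationOutput"],  # placeholder name
        inputs=[model_in])
    infer_response = infer_request.exec()
    if infer_response.has_error():
        raise pb_utils.TritonModelException(infer_response.error().message())
    return pb_utils.get_output_tensor_by_name(infer_response, "SegmentationOutput")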

  1. First of all, I tried setting output_mem_type : MEMORY_TYPE_CPU in the nvinferserver pbtxt config file. For a single stream the parsing is perfect. For two streams there are some glitches - in the log below, the “PP Boxes” lines are printed by the Triton Python model and the “Rect” lines by the custom parser:
Rect: nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
0 0 0 0 0 0.65
Rect: 243 390 62 273 1 0.65
**PERF:  11.28 (11.02)	11.28 (11.34)	
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[  0,   0,   0,   0,   0,   0],
        [244,  59, 390, 273, 100,   0]], device='cuda:0', dtype=torch.int32)
Rect: 0 0 0 0 0 0.65
Rect: 238 391 60 274 1 0.65
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[  0,   0,   0,   0,   0,   0],
        [248,  61, 389, 270, 100,   0]], device='cuda:0', dtype=torch.int32)
Rect: 0nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
 0 0 0 0 0.65
Rect: 244 390 59 273 1 0.65
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[  0,   0,   0,   0,   0,   0],
        [254,  66, 389, 267,  99,   0]], device='cuda:0', dtype=torch.int32)
Rect: 0 0 0 0 0 0.65
Rect: 248 389 61 270 1 0.65
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[  0,   0,   0,   0,   0,   0],
        [263,  70, 387, 266,  99,   0]], device='cuda:0', dtype=torch.int32)
Rect: 0 0 0 0 0 0.65
Rect: 254 389 66 267 0.99 nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
0.65
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[  0,   0,   0,   0,   0,   0],
        [264,  70, 387, 266,  99,   0]], device='cuda:0', dtype=torch.int32)
Rect: 0 0 0 0 0 0.65
Rect: 263 387 70 266 0.99 0.65nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3

Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[  0,   0,   0,   0,   0,   0],
        [262,  69, 387, 267,  99,   0]], device='cuda:0', dtype=torch.int32)
Rect: 0 0 0 0 0 0.65
Rect: 264 387 70 266 0.99 0.65
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[  0,   0,   0,   0,   0,   0],
        [260,  66, 387, 269,  99,   0]], device='cuda:0', dtype=torch.int32)
Rect: 0 0 0 0 0 0.65
Rect: 262 387 69 267 0.99 0.65
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[  0,   0,   0,   0,   0,   0],
        [260,  65, 387, 269,  99,   0]], device='cuda:0', dtype=torch.int32)
Rect: 0 0 0 0 0 0.65
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Rect: 260 387 66 269 0.99 0.65
^C** ERROR: <_intr_handler:140>: User Interrupted.. 

Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[  0,   0,   0,   0,   0,   0],
        [245,  44, 391, 273, 102,   0]], device='cuda:0', dtype=torch.int32)
Rect: 0 0 0 0 0 0.65
Rect: 0 0 0 0 0 0.65
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[  0,   0,   0,   0,   0,   0],
        [253,  59, 390, 275, 101,   0]], device='cuda:0', dtype=torch.int32)
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Rect: 0 0 0 0 0 0.65
Rect: 0 0 0 0 0 0.65
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[  0,   0,   0,   0,   0,   0],
        [256,  58, 389, 274, 100,   0]], device='cuda:0', dtype=torch.int32)
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Rect: 1052293305 1052819650 1054398682 1052819650 1.054e+07 0.65
Rect: 1052293305 1052819650 1054398682 1052819650 1.054e+07 0.65
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[  0,   0,   0,   0,   0,   0],
        [254,  55, 389, 275, 100,   0]], device='cuda:0', dtype=torch.int32)
Rect: nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
1052293305 1052819650 1054398682 1052819650 1.054e+07 0.65
Rect: 1052293305 1052819650 1054398682 1052819650 1.054e+07 0.65
**PERF:  11.37 (11.16)	11.37 (11.43)	
Quitting
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[  0,   0,   0,   0,   0,   0],
        [250,  52, 389, 277, 101,   0]], device='cuda:0', dtype=torch.int32)
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Rect: 1052293305 1052819650 1054398682 1052819650 1.054e+07 0.65
Rect: 1052293305 1052819650 1054398682 1052819650 1.054e+07 0.65
[NvMultiObjectTracker] De-initialized
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[  0,   0,   0,   0,   0,   0],
        [249,  49, 390, 277, 101,   0]], device='cuda:0', dtype=torch.int32)
Rect: 1052293305 1052819650 1054398682 1052819650 1.054e+07 0.65
Rect: 1052293305 1052819650 1054398682 1052819650 1.054e+07 0.65
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3

The last lines, for example, are “bad” parsed values, while the others are completely fine!
  2. Afterwards, I tried switching to output_mem_type : MEMORY_TYPE_GPU. I spotted very weird behavior - DeepStream runs for a few seconds, parses the first values well, and then gets stuck: the EGL sink freezes and no additional output appears in the terminal. This is the log:

nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [1x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [1x1], dataType:3, memType:3
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [1x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [1x1], dataType:3, memType:3
Stream IDS [1, 2]
Stream IDS [2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0]], device='cuda:0', dtype=torch.int32)
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
PostProcessBoxesOutput
Rect: 0 0 0 0 0 0.65
Rect: 0 0 0 0 0 0.65
Stream IDS [1]
Converting seg to box torch.Size([1, 256, 256, 1])
PP Boxes tensor([[0, 0, 0, 0, 0, 0]], device='cuda:0', dtype=torch.int32)
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Rect: 0 0 0 0 0 0.65
Stream IDS [1, 2]
Converting seg to box torch.Size([1, 256, 256, 1])
PP Boxes tensor([[0, 0, 0, 0, 0, 0]], device='cuda:0', dtype=torch.int32)
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0]], device='cuda:0', dtype=torch.int32)
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0]], device='cuda:0', dtype=torch.int32)
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
**PERF:  0.00 (0.00)	13486.51 (3.97)	**<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<**
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0]], device='cuda:0', dtype=torch.int32)
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0]], device='cuda:0', dtype=torch.int32)
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0]], device='cuda:0', dtype=torch.int32)
Stream IDS [1, 2]
**PERF:  0.00 (0.00)	0.00 (0.80)	
**PERF:  0.00 (0.00)	0.00 (0.44)

It seems like the video is running super fast - see the marked line (marked with <<<<<<<<<<<<< in the log): the perf value is extremely high, while the first perf measurement with CPU output memory was, for example, **PERF: 11.62 (7.76) 17.44 (11.63)! I believe this issue is due to faulty memory access in the parser, because I addressed the GPU buffer as if it were a CPU one.
  3. So I kept output_mem_type : MEMORY_TYPE_GPU, just adding a cudaMemcpy to copy the output buffer to host before parsing (v2 of the parser). The result is basically the same as (1): some values are just bad, but most seem valid.

My guess: some kind of bad memory dereference occurs, or perhaps a race condition or buffer re-use causes this issue.
parserV1.cpp (4.5 KB)
parserV2.cpp (4.8 KB)
My nvinferserver pbtxt config:
config_triton_inferserver_primary_smoke.pbtxt (1.6 KB)
My ensemble config:
config.pbtxt (1.1 KB)
My postprocess config and model.py (2nd ensemble model):
model.py (6.7 KB)
config.pbtxt (311 Bytes)

Thanks.

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)

• DeepStream Version

• JetPack Version (valid for Jetson only)

• TensorRT Version

• NVIDIA GPU Driver Version (valid for GPU only)

• Issue Type( questions, new requirements, bugs)

• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

Hello,
We’re using the latest DeepStream 6.3 Docker container. Attaching the Dockerfile, requirements.txt and requirements-base.txt used to set up the environment.
I am using an x86_64 machine with an RTX A4000 on Ubuntu 22.04, local GPU driver version 535 (latest, CUDA 12.2). Docker is also the latest version.
This is a bug. Please note my first issue above: I have attached the model configuration, and I believe the issue can be reproduced from it together with the postprocessor model.py code.
requirements.txt (29 Bytes)
requirements-base.txt (223 Bytes)
Dockerfile (1.2 KB)
Please note: “deepstream-deploy” is an internal package. You may ignore this requirement; it is only a CLI tool.

  1. Please help us reproduce this issue. If possible, please share the whole project, including the simplified code, input video, models and configuration files. You can share it via the forum's private email.
  2. nvinferserver and Triton are open source; you can add logs to check, if interested.

I will build a docker container and post a pull link ASAP.

@fanzh
I built and pushed an image to recreate the issue.
Use the following commands:

xhost + # for display output using eglsink
docker run -it --gpus all --shm-size '2gb' -v /tmp/.X11-unix:/tmp/.X11-unix --ipc host --privileged -e DISPLAY=${DISPLAY} --entrypoint /smoke/run.sh public.ecr.aws/d7v4u7y1/captain-eye-pub-tests:mock-triton

I can’t start the Docker container; here is the log:
docker run --gpus device=0 --name fan-user -itd --shm-size '2gb' -v /tmp/.X11-unix:/tmp/.X11-unix --ipc host --privileged -e DISPLAY=${DISPLAY} --entrypoint /smoke/run.sh public.ecr.aws/d7v4u7y1/captain-eye-pub-tests:mock-triton
docker exec -it 8691fd5f147cb4f95d6ffbbd1748ab3df334377fdfb2f36245820fada0cd1a39 bash
Error response from daemon: Container 8691fd5f147cb4f95d6ffbbd1748ab3df334377fdfb2f36245820fada0cd1a39 is not running

Please see the last comment. There is also an ensemble sample in DS6.3 at opt\nvidia\deepstream\deepstream-6.3\sources\TritonBackendEnsemble; can you modify this sample to reproduce the issue?

The container runs just like that, perfectly fine.
Did you run xhost + to allow the X11 connection? Are you running on Linux?
The container is probably crashing due to some issue. Check it by running with -it in the docker run arguments.

After doing “xhost +” and using “-it”, I ran it again. Here is the error log:
error-0828.txt (1.8 KB)
Can you use the TritonBackendEnsemble sample to reproduce this issue?

I am trying to modify the example, and will update you ASAP.

@fanzh The problem now seems to be caused by enabling the sequence batcher in Triton for the main TensorRT model (the segmentation model).
I commented out the sequence_batching section in the model's config.pbtxt and it no longer got stuck. Re-enabling it caused DeepStream to freeze and Triton to crash with a “Segmentation fault (core dumped)” error.
My model config.pbtxt:

name : "topdown"
platform : "tensorrt_plan"
max_batch_size: 2
default_model_filename : "smoke333.onnx_fp32_min1_opt2_max2.engine"

sequence_batching {
  oldest
    {
      max_candidate_sequences: 4
    }
  state: [
    {
      input_name: "PreviousState"
      output_name: "leaky_re_lu_607"
      data_type: TYPE_FP32
      dims: [ 128, 128, 160 ]
      initial_state: {
       data_type: TYPE_FP32
       dims: [ 128, 128, 160 ]
       zero_data: true
       name: "initial state"
      }
    }
  ]
}

parameters: {
  key: "FORCE_CPU_ONLY_INPUT_TENSORS"
  value: {
    string_value:"no"
  }
}

My model spec (screenshot of the model's layers, attached):


All model configurations are the same, and I have commented out the C++ code that parses bboxes. The txt configs:
config_infer_secondary_ensemble.txt (2.8 KB)
deepstream_app_config_triton_backend_ensemble.txt (5.4 KB)
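One more note on the sequence batcher, since the BLS model is what calls this model: if sequence control has to be passed explicitly, the BLS request would look roughly like this (a sketch only, assuming the standard python_backend BLS API - not my actual BLS code, and “SegmentationOutput” is a placeholder name):

import triton_python_backend_utils as pb_utils

# Sketch only: passing a correlation ID and sequence-start flag from the BLS
# model to the sequence-batched "topdown" model.
def call_topdown(model_in, correlation_id: int, first_frame: bool):
    flags = pb_utils.TRITONSERVER_REQUEST_FLAG_SEQUENCE_START if first_frame else 0
    infer_request = pb_utils.InferenceRequest(
        model_name="topdown",
        requested_output_names=["SegmentationOutput"],
        inputs=[model_in],
        correlation_id=correlation_id,
        flags=flags)
    return infer_request.exec()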

Thanks for sharing.

  1. What is the model used for?
  2. Why are the config.pbtxt settings inconsistent with the model's layer names?
  3. Please help us reproduce this issue: please provide the simplified code, model, configuration and input data. Thanks!

Python can’t process GPU data, so setting “output_mem_type : MEMORY_TYPE_GPU” is not reasonable.

First, I'd like to point out that my goal is to use a semantic segmentation NN with custom inputs to detect boxes, which is why I am using a Triton ensemble with BLS to drive the NN.
The Triton pipeline is:

  • Ensemble:
    – Smoke BLS (preprocessing, input construction, calling the NN running on the TRT backend via BLS)
    – Postprocess
  1. The model provided in the screenshot is a semantic segmentation model. The postprocessor model converts the masks to boxes (x,y,x,y,conf,class) - see the sketch after this list.
  • The segmentation model gets a normalized 6-channel RGBRGB image: the first 3 channels are the “current frame” and the last 3 channels a “background frame”.
  • The segmentation model has an LSTM loop (the leaky_re_lu_607 output layer is fed back into the PreviousState input layer).
  • It also has a control vector input (InitVector) that changes the model’s behavior (its content is either all zeros, or 0.5 in the first item and zeros in the rest). The BLS model sets the content of that vector.
  • The output layer is (H, W, CLASS_CONF); it is a “scaled” output mask. This example is a single-class segmentation model!
  2. I did not attach the config.pbtxt of the NN, because it contains nothing beyond the model file name setting (dims are inferred). See below the full modified DS6.3 opt\nvidia\deepstream\deepstream-6.3\sources\TritonBackendEnsemble example archive.
  3. Here’s the example mentioned just above, as you asked me to construct:
    TritonBackendEnsemble.tar.gz (72.3 KB)
    Note! I have removed the NN onnx/engine files from the archive because it’s a proprietary NN. You might want to create a mock model instead.
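As referenced in item 1, a simplified sketch of that mask-to-box conversion (not the attached model.py; it assumes an (N, H, W, 1) mask and rows laid out as [x1, y1, x2, y2, conf, class] with confidence scaled to 0-100):

import torch

def masks_to_boxes(masks: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """masks: (N, H, W, 1) float mask on any device -> (N, 6) int32 box rows."""
    boxes = torch.zeros((masks.shape[0], 6), dtype=torch.int32, device=masks.device)
    for i in range(masks.shape[0]):
        mask = masks[i, :, :, 0]
        ys, xs = torch.where(mask > threshold)
        if ys.numel() == 0:
            continue  # no detection: keep the all-zero row
        conf = int(mask[ys, xs].mean().item() * 100)  # confidence scaled to 0-100
        boxes[i] = torch.tensor(
            [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()), conf, 0],
            dtype=torch.int32, device=masks.device)
    return boxes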

@fanzh Python seems to handle GPU data perfectly fine with CUDA shared memory and dlpack, using torch. Am I wrong here?
Attaching the relevant Triton Python Backend README section on DLPack interoperability and GPU tensor support, including the note for PyTorch 2.0:
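In short, the pattern that section describes looks roughly like this (a simplified sketch, not my actual model.py; tensor names other than PostProcessBoxesOutput are placeholders):

import torch
from torch.utils.dlpack import from_dlpack, to_dlpack
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            seg = pb_utils.get_input_tensor_by_name(request, "SegmentationMask")  # placeholder

            if seg.is_cpu():
                # Default / FORCE_CPU_ONLY_INPUT_TENSORS="yes": tensor arrives on the host
                seg_t = torch.from_numpy(seg.as_numpy()).cuda()
            else:
                # FORCE_CPU_ONLY_INPUT_TENSORS="no": zero-copy handoff on cuda:0
                # (with PyTorch >= 2.0, torch.from_dlpack(seg) should also work directly)
                seg_t = from_dlpack(seg.to_dlpack())

            boxes = torch.zeros((seg_t.shape[0], 6), dtype=torch.int32, device="cuda:0")
            # ... fill boxes from the mask here ...

            out = pb_utils.Tensor.from_dlpack("PostProcessBoxesOutput", to_dlpack(boxes))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses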

Thanks!

Sorry for the late reply, and thanks for sharing. I checked the code, but it is hard to find the root cause without being able to reproduce and debug it.

The workflow is “model process -> python postprocessor -> custom_parse_bbox_func”. Can you check whether the python postprocessor's output data is correct? If it is, the nvinferserver plugin and low-level library are open source; you can add logs to debug.

nvinferserver lets Triton do the inference by calling the Triton API. nvinferserver and Triton are open source; can you narrow this issue down by adding logs and simplifying the code?

@fanzh
I got DeepStream + Triton + custom parser to work perfectly fine.

  1. I had built the *.engine file with the --fp16 flag, but apparently the build result was damaged - the results inside Triton were inconsistent. Switching to a non-fp16 engine fixed the inference issue. Note that even though I was building with trtexec and fp16, the input/output layers' dtype was still fp32; I then switched to fp32 only.
  2. The timing issue was resolved. I don't know exactly what the problem was, but using a minimally modified configuration of the provided TritonBackendEnsemble sample resolved it completely.
  3. Inference now works perfectly end to end, including multiurisourcebin, REST, the tracker and everything else.
  4. As I mentioned above, setting parameters: { key: "FORCE_CPU_ONLY_INPUT_TENSORS" value: { string_value: "no" } } in the Python backend model's config.pbtxt file does change the device of the tensors from cpu to cuda:0, so that should be noted as well.

Anyway, thanks a lot for your help. I believe this thread can be closed now, and I'd be happy to send you any configuration that may be relevant if you'd like to investigate further.


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.