Hello,
I am setting up a DeepStream application (using deepstream-app) to run inference on a custom RNN model through Triton. I am deploying on a GPU with DeepStream 6.3, inside a Docker container.
My Triton model is actually an ensemble of two Python models: a BLS model that preprocesses the input, calls the TensorRT model and returns the result, and a second model that postprocesses the RNN segmentation mask into a bounding rectangle.
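To make the postprocessing step concrete, here is a minimal sketch of turning a segmentation mask into one box row. It is not the attached model.py (which uses torch); it is a pure-Python stand-in. The function name, the threshold, the scale parameter and the six-column layout [x1, y1, x2, y2, score, class] are assumptions, the layout being only a guess from the "PP Boxes" lines in the logs further down.

```python
from typing import List, Sequence


def mask_to_box(mask: Sequence[Sequence[float]],
                threshold: float = 0.5,
                scale: int = 1,
                class_id: int = 0) -> List[int]:
    """Return [x1, y1, x2, y2, score, class] for the foreground pixels.

    Hypothetical layout: the score column is the mean foreground
    probability times 100, chosen purely for illustration.
    """
    xs, ys, probs = [], [], []
    for y, row in enumerate(mask):
        for x, p in enumerate(row):
            if p >= threshold:
                xs.append(x)
                ys.append(y)
                probs.append(p)

    # No foreground pixel: emit an all-zero row, as in the logged output.
    if not xs:
        return [0, 0, 0, 0, 0, 0]

    score = round(100 * sum(probs) / len(probs))
    # scale maps mask coordinates to the network input size (e.g. 256 -> 512).
    return [min(xs) * scale, min(ys) * scale,
            max(xs) * scale, max(ys) * scale,
            score, class_id]


if __name__ == "__main__":
    demo = [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 1.0, 0.0],
        [0.0, 1.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    print(mask_to_box(demo, scale=2))
```

A fixed-size row per frame keeps the output tensor shape static, which is what the all-zero rows for empty frames in the logs suggest.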
I first set up the Triton model repository without DeepStream and tested inference with an external Python script, using CUDA shared memory. For debugging, I also print the final ensemble result before returning it.
The Triton model itself seems to work perfectly. The issue appears when I try to parse my output format into DeepStream's NvDsInferObjectDetectionInfo using a custom detection parse function.
1. First, I tried setting
output_mem_type : MEMORY_TYPE_CPU
in the nvinferserver pbtxt config file. With a single stream the parsing is perfect. With 2 streams there are some glitches. In the log, the "PP Boxes" lines are the Triton Python outputs, while the "Rect:" lines are printed in the custom parser:
Rect: nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
0 0 0 0 0 0.65
Rect: 243 390 62 273 1 0.65
**PERF: 11.28 (11.02) 11.28 (11.34)
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[ 0, 0, 0, 0, 0, 0],
[244, 59, 390, 273, 100, 0]], device='cuda:0', dtype=torch.int32)
Rect: 0 0 0 0 0 0.65
Rect: 238 391 60 274 1 0.65
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[ 0, 0, 0, 0, 0, 0],
[248, 61, 389, 270, 100, 0]], device='cuda:0', dtype=torch.int32)
Rect: 0nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
0 0 0 0 0.65
Rect: 244 390 59 273 1 0.65
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[ 0, 0, 0, 0, 0, 0],
[254, 66, 389, 267, 99, 0]], device='cuda:0', dtype=torch.int32)
Rect: 0 0 0 0 0 0.65
Rect: 248 389 61 270 1 0.65
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[ 0, 0, 0, 0, 0, 0],
[263, 70, 387, 266, 99, 0]], device='cuda:0', dtype=torch.int32)
Rect: 0 0 0 0 0 0.65
Rect: 254 389 66 267 0.99 nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
0.65
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[ 0, 0, 0, 0, 0, 0],
[264, 70, 387, 266, 99, 0]], device='cuda:0', dtype=torch.int32)
Rect: 0 0 0 0 0 0.65
Rect: 263 387 70 266 0.99 0.65nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[ 0, 0, 0, 0, 0, 0],
[262, 69, 387, 267, 99, 0]], device='cuda:0', dtype=torch.int32)
Rect: 0 0 0 0 0 0.65
Rect: 264 387 70 266 0.99 0.65
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[ 0, 0, 0, 0, 0, 0],
[260, 66, 387, 269, 99, 0]], device='cuda:0', dtype=torch.int32)
Rect: 0 0 0 0 0 0.65
Rect: 262 387 69 267 0.99 0.65
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[ 0, 0, 0, 0, 0, 0],
[260, 65, 387, 269, 99, 0]], device='cuda:0', dtype=torch.int32)
Rect: 0 0 0 0 0 0.65
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Rect: 260 387 66 269 0.99 0.65
^C** ERROR: <_intr_handler:140>: User Interrupted..
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[ 0, 0, 0, 0, 0, 0],
[245, 44, 391, 273, 102, 0]], device='cuda:0', dtype=torch.int32)
Rect: 0 0 0 0 0 0.65
Rect: 0 0 0 0 0 0.65
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[ 0, 0, 0, 0, 0, 0],
[253, 59, 390, 275, 101, 0]], device='cuda:0', dtype=torch.int32)
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Rect: 0 0 0 0 0 0.65
Rect: 0 0 0 0 0 0.65
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[ 0, 0, 0, 0, 0, 0],
[256, 58, 389, 274, 100, 0]], device='cuda:0', dtype=torch.int32)
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Rect: 1052293305 1052819650 1054398682 1052819650 1.054e+07 0.65
Rect: 1052293305 1052819650 1054398682 1052819650 1.054e+07 0.65
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[ 0, 0, 0, 0, 0, 0],
[254, 55, 389, 275, 100, 0]], device='cuda:0', dtype=torch.int32)
Rect: nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
1052293305 1052819650 1054398682 1052819650 1.054e+07 0.65
Rect: 1052293305 1052819650 1054398682 1052819650 1.054e+07 0.65
**PERF: 11.37 (11.16) 11.37 (11.43)
Quitting
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[ 0, 0, 0, 0, 0, 0],
[250, 52, 389, 277, 101, 0]], device='cuda:0', dtype=torch.int32)
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Rect: 1052293305 1052819650 1054398682 1052819650 1.054e+07 0.65
Rect: 1052293305 1052819650 1054398682 1052819650 1.054e+07 0.65
[NvMultiObjectTracker] De-initialized
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[ 0, 0, 0, 0, 0, 0],
[249, 49, 390, 277, 101, 0]], device='cuda:0', dtype=torch.int32)
Rect: 1052293305 1052819650 1054398682 1052819650 1.054e+07 0.65
Rect: 1052293305 1052819650 1054398682 1052819650 1.054e+07 0.65
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
The last lines, for example, are "bad" parsed values, but the others are completely fine!
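One way to characterize the bad values is to reinterpret the bits of each int32 as a float32. The sketch below uses only the standard library; the helper name is mine. For the three distinct values in the log it gives roughly 0.3608, 0.3765 and 0.4235, which are very close to 92/255, 96/255 and 108/255. That looks more like normalized 8-bit pixel data than box coordinates, which would fit the idea that the parser is reading a buffer that now holds FP32 data. This is only an inference from the numbers, not something I have confirmed.

```python
import struct


def int32_bits_as_float(value: int) -> float:
    """Reinterpret the 32 bits of a signed int32 as an IEEE-754 float32."""
    return struct.unpack("<f", struct.pack("<i", value))[0]


# Garbage values copied from the "Rect:" lines in the log above.
suspicious = [1052293305, 1052819650, 1054398682]

for v in suspicious:
    f = int32_bits_as_float(v)
    # Multiplying by 255 shows how close each value is to an 8-bit pixel level.
    print(f"{v} -> {f:.6f} (x255 = {f * 255:.3f})")
```

If the values had decoded to arbitrary floats or NaNs instead, uninitialized memory would have been the more likely explanation.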
2. Afterwards, I changed the setting to output_mem_type : MEMORY_TYPE_GPU. I then saw very weird behavior: DeepStream runs for a few seconds, parses the first values correctly, and then gets stuck. The EGL sink freezes and nothing more is printed to the terminal. This is the log:
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [1x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [1x1], dataType:3, memType:3
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [1x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [1x1], dataType:3, memType:3
Stream IDS [1, 2]
Stream IDS [2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]], device='cuda:0', dtype=torch.int32)
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
PostProcessBoxesOutput
Rect: 0 0 0 0 0 0.65
Rect: 0 0 0 0 0 0.65
Stream IDS [1]
Converting seg to box torch.Size([1, 256, 256, 1])
PP Boxes tensor([[0, 0, 0, 0, 0, 0]], device='cuda:0', dtype=torch.int32)
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Rect: 0 0 0 0 0 0.65
Stream IDS [1, 2]
Converting seg to box torch.Size([1, 256, 256, 1])
PP Boxes tensor([[0, 0, 0, 0, 0, 0]], device='cuda:0', dtype=torch.int32)
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]], device='cuda:0', dtype=torch.int32)
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]], device='cuda:0', dtype=torch.int32)
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
**PERF: 0.00 (0.00) 13486.51 (3.97) **<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<**
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]], device='cuda:0', dtype=torch.int32)
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]], device='cuda:0', dtype=torch.int32)
nvdsinferserver_custom_process.cpp:183extraInputProcess: primary input *SmokeBlsImageInput*, shape: [2x512x512x3], dataType:0, memType:1
nvdsinferserver_custom_process.cpp:184extraInputProcess: extra input *SmokeBlsCorrelationIdsInput*, shape: [2x1], dataType:3, memType:3
Stream IDS [1, 2]
Converting seg to box torch.Size([2, 256, 256, 1])
PP Boxes tensor([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]], device='cuda:0', dtype=torch.int32)
Stream IDS [1, 2]
**PERF: 0.00 (0.00) 0.00 (0.80)
**PERF: 0.00 (0.00) 0.00 (0.44)
It seems like the video is running super fast. See the marked line (marked with <<<<<<<<<<<<< in the log): the PERF value there is very high, whereas the first PERF measurement on CPU is, for example, **PERF: 11.62 (7.76) 17.44 (11.63). I believe this issue is caused by faulty memory access in the parser, because I was addressing a GPU buffer as if it were a CPU one.
3. So I kept output_mem_type : MEMORY_TYPE_GPU, but copied the data to the host with cudaMemcpy (v2 of the parser). The result is basically the same as in (1): some values are just bad, but most seem valid.
My guess is that the parser dereferences memory that is no longer valid, or that a race condition or buffer reuse is overwriting the output before it is parsed.
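To illustrate what I mean by buffer reuse, here is a toy model in plain Python. It is not DeepStream or Triton code; the class and function names are hypothetical, and the overwritten row is just the garbage pattern from the log padded for illustration. It only shows why a consumer that keeps a reference to a recycled buffer can read the next batch's data, while a consumer that takes a private copy first does not.

```python
from array import array


class RecycledOutput:
    """Toy stand-in for an output buffer that the producer reuses per batch."""

    def __init__(self, size: int) -> None:
        self.buf = array("i", [0] * size)

    def write(self, values) -> None:
        # In-place overwrite: the same memory is reused for every batch.
        self.buf[:] = array("i", values)


def parse_rect(buf) -> list:
    """Read the first four int32 values as box coordinates."""
    return list(buf[:4])


output = RecycledOutput(6)
output.write([244, 59, 390, 273, 100, 0])   # batch N, a valid box row

borrowed = output.buf                # parser keeps only a reference
snapshot = array("i", output.buf)    # parser takes a private copy

# The producer recycles the buffer before the parser has read it.
output.write([1052293305, 1052819650, 1054398682, 1052819650, 0, 0])

print("borrowed:", parse_rect(borrowed))   # sees the overwritten data
print("snapshot:", parse_rect(snapshot))   # still sees batch N
```

If this is what happens in my pipeline, the parser would need to finish reading (or copying) the tensor before the buffer is handed back for the next batch.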
parserV1.cpp (4.5 KB)
parserV2.cpp (4.8 KB)
My nvinferserver pbtxt config:
config_triton_inferserver_primary_smoke.pbtxt (1.6 KB)
My ensemble config:
config.pbtxt (1.1 KB)
My postprocess config and model.py (2nd ensemble model):
model.py (6.7 KB)
config.pbtxt (311 Bytes)
Thanks.