But when trying to run this model with jetson.inference.detectNet in Python (I made some changes in the source code to use the GPU with FP16, which works well with the original ssd_mobilenet_v2_coco.uff), TensorRT refuses to run inference with the ONNX model (I also tried INT8 and FP32 without success):
[TRT] device GPU, completed writing engine cache to /usr/local/bin/networks/SSD-Mobilenet-v1-ONNX/ssd-mobilenet.onnx.1.0.7100.GPU.FP16.engine
[TRT] device GPU, loaded /usr/local/bin/networks/SSD-Mobilenet-v1-ONNX/ssd-mobilenet.onnx
[TRT] Deserialize required 123757 microseconds.
[TRT]
[TRT] CUDA engine context initialized on device GPU:
[TRT] -- layers 97
[TRT] -- maxBatchSize 1
[TRT] -- workspace 0
[TRT] -- deviceMemory 20092416
[TRT] -- bindings 3
[TRT] binding 0
-- index 0
-- name 'input_0'
-- type FP32
-- in/out INPUT
-- # dims 4
-- dim #0 1 (SPATIAL)
-- dim #1 3 (SPATIAL)
-- dim #2 300 (SPATIAL)
-- dim #3 300 (SPATIAL)
[TRT] binding 1
-- index 1
-- name 'scores'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 1 (SPATIAL)
-- dim #1 3000 (SPATIAL)
-- dim #2 2 (SPATIAL)
[TRT] binding 2
-- index 2
-- name 'boxes'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 1 (SPATIAL)
-- dim #1 3000 (SPATIAL)
-- dim #2 4 (SPATIAL)
[TRT] INVALID_ARGUMENT: Cannot find binding of given name: Input
[TRT] failed to find requested input layer Input in network
[TRT] device GPU, failed to create resources for CUDA engine
[TRT] failed to create TensorRT engine for /usr/local/bin/networks/SSD-Mobilenet-v1-ONNX/ssd-mobilenet.onnx, device GPU
[TRT] detectNet -- failed to initialize.
Any idea what is wrong? The model runs successfully with detectnet when using the C++ version, but it uses the CPU instead of the GPU.
I didn't run the Python script; I implemented the function in another script with no args (the values to pass are hard-coded in the script). I also made some changes in the library; I will upload the files soon.
Now I call the functions with:
labels = open("jetson-inference/data/networks/SSD-Mobilenet-v1-ONNX/labels.txt").readlines()
net = jetson.inference.detectNet("ssd-mobilenet-v1-onnx", threshold=0.7, precision="FP16", device="GPU", allowGPUFallback=True)
@Pelepicier, I am unable to debug all the changes you made. I recommend going back to the original jetson-inference code and creating your model like this:
net = jetson.inference.detectNet(argv=['--model=my_model_path/ssd-mobilenet.onnx',
                                       '--labels=my_model_path/labels.txt',
                                       '--input-blob=input_0',
                                       '--output-cvg=scores',
                                       '--output-bbox=boxes'],
                                 threshold=0.5)
This will use the parsing already in detectNet and should work.
Thank you, this is exactly what I needed.
Now I can get 250 FPS with my custom retrained ONNX model with only a "Person" label (thanks to your scripts in jetson-inference). I use GPU+DLA_0+DLA_1 with multiprocessing (in Python), and only needed to change 8 lines in c/detectNet.cpp to make it work (to pass the device and precision in args).
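For readers wanting to reproduce this, here is a hypothetical sketch of the GPU+DLA_0+DLA_1 multiprocessing scheme described above. The commented-out detectNet call assumes the poster's patched detectNet.cpp (the device/precision kwargs are not in stock jetson-inference); the worker replaces the real net.Detect() call with a stand-in so the dispatch logic is self-contained:

```python
import multiprocessing as mp

DEVICES = ["GPU", "DLA_0", "DLA_1"]

def worker(device, frame_ids, results):
    # Real version (per this thread's patched library) would be:
    #   net = jetson.inference.detectNet("ssd-mobilenet-v1-onnx", threshold=0.7,
    #                                    precision="FP16", device=device,
    #                                    allowGPUFallback=True)
    #   detections = net.Detect(frame)
    for fid in frame_ids:
        results.put((device, fid))  # stand-in for the per-frame detections

def run(num_frames):
    """Round-robin num_frames across one engine process per device."""
    results = mp.Queue()
    procs = []
    for i, dev in enumerate(DEVICES):
        frame_ids = list(range(i, num_frames, len(DEVICES)))
        p = mp.Process(target=worker, args=(dev, frame_ids, results))
        p.start()
        procs.append(p)
    out = [results.get() for _ in range(num_frames)]  # drain before join
    for p in procs:
        p.join()
    return out
```

Each process builds its own engine, so the three accelerators run concurrently; frames are interleaved round-robin across them.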
I'm trying to get a bit more FPS, so I was wondering: what does the batch_size parameter change in the engine? It does not seem to affect the inference time…
It only currently changes the max batch size that the TensorRT engine can support. It doesn’t actually do multi-image batching, as that would require additional pre/post-processing code and changes to the input streaming. I would recommend DeepStream for applications using multi-stream batching.
Hi @dusty_nv, I'm back on this after a few months! Do you have any plans to implement multi-image batching as a module or anything like this? (Ideally, passing an array of images to detectNet would be perfect.)
I don’t currently have plans to implement batching in jetson-inference, as the primary use-case is for single-stream applications and demos/examples. I would recommend DeepStream or the TRT samples you found for batching.
There, an XXX_pgie_config.txt file is used to configure the engine. This works fine for models like resnet10.caffemodel and resnet34_peoplenet_pruned.etlt. I was now trying to figure out a similar configuration for ssd-mobilenet.onnx, so far to no avail. I know my configuration must be incomplete at the moment, but I need a hint about what to add, remove, or change.
The quoted config also has a custom bbox detector. I don't know what to do with this, so I have dropped it for now.
The entire thing starts promising, but soon it finishes with an error:
Using winsys: x11
0:00:00.499388128 12676 0x7c98440 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1716> [UID = 1]: Trying to create engine from model files
----------------------------------------------------------------
Input filename: /home/ubuntu/dragonfly-safety/jetson-inference/models/primary-detector-nano/ssd-mobilenet.onnx
ONNX IR version: 0.0.6
Opset version: 9
Producer name: pytorch
Producer version: 1.6
Domain:
Model version: 0
Doc string:
----------------------------------------------------------------
WARNING: [TRT]: onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
INFO: [TRT]: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
ERROR: [TRT]: ../rtSafe/cuda/cutensorReformat.cpp (227) - Assertion Error in executeCutensor: 0 (validateInputsCutensor(src, dst))
ERROR: Build engine failed from config file
ERROR: failed to build trt engine.
0:01:59.530931154 12676 0x7c98440 ERROR nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1736> [UID = 1]: build engine file failed
0:01:59.534033449 12676 0x7c98440 ERROR nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1822> [UID = 1]: build backend context failed
0:01:59.534300953 12676 0x7c98440 ERROR nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1149> [UID = 1]: generate backend failed, check config file settings
0:01:59.545112136 12676 0x7c98440 WARN nvinfer gstnvinfer.cpp:812:gst_nvinfer_start:<primary-inference> error: Failed to create NvDsInferContext instance
0:01:59.545162033 12676 0x7c98440 WARN nvinfer gstnvinfer.cpp:812:gst_nvinfer_start:<primary-inference> error: Config file path: /tmp/tmpcm6h3n8z, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
Error: gst-resource-error-quark: Failed to create NvDsInferContext instance (1): /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(812): gst_nvinfer_start (): /GstPipeline:pipeline0/GstNvInfer:primary-inference:
Config file path: /tmp/tmpcm6h3n8z, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
Would you by chance have a pointer to what might be going wrong here?
OK, one step ahead. For some reason batch-size=3 seems to be a problem. I'm using this because I have 3 input cameras. To move forward I changed it to batch-size=1, and indeed the engine file was created:
ssd-mobilenet.onnx_b1_gpu0_fp16.engine
Not sure why it doesn't work with 3.
It crashed later, complaining that it is unable to parse bboxes, which is surely caused by the fact that I ignored the extra handler:
Using winsys: x11
0:00:01.226569907 15513 0xd643640 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1716> [UID = 1]: Trying to create engine from model files
----------------------------------------------------------------
Input filename: /home/ubuntu/dragonfly-safety/jetson-inference/models/primary-detector-nano/ssd-mobilenet.onnx
ONNX IR version: 0.0.6
Opset version: 9
Producer name: pytorch
Producer version: 1.6
Domain:
Model version: 0
Doc string:
----------------------------------------------------------------
WARNING: [TRT]: onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
INFO: [TRT]: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
INFO: [TRT]: Detected 1 inputs and 4 output network tensors.
0:02:58.653771316 15513 0xd643640 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1749> [UID = 1]: serialize cuda engine to file: /home/ubuntu/dragonfly-safety/jetson-inference/models/primary-detector-nano/ssd-mobilenet.onnx_b1_gpu0_fp16.engine successfully
INFO: [Implicit Engine Info]: layers num: 3
0 INPUT kFLOAT input_0 3x300x300
1 OUTPUT kFLOAT scores 3000x9
2 OUTPUT kFLOAT boxes 3000x4
ERROR: [TRT]: INVALID_ARGUMENT: Cannot find binding of given name: grid
0:02:58.674788473 15513 0xd643640 WARN nvinfer gstnvinfer.cpp:616:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::checkBackendParams() <nvdsinfer_context_impl.cpp:1670> [UID = 1]: Could not find output layer 'grid' in engine
0:02:58.772456952 15513 0xd643640 INFO nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary-inference> [UID 1]: Load new model:/tmp/tmpr93o712f sucessfully
0:02:59.703485356 15513 0xd1dcf20 ERROR nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::parseBoundingBox() <nvdsinfer_context_impl_output_parsing.cpp:59> [UID = 1]: Could not find output coverage layer for parsing objects
0:02:59.704273968 15513 0xd1dcf20 ERROR nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::fillDetectionOutput() <nvdsinfer_context_impl_output_parsing.cpp:733> [UID = 1]: Failed to parse bboxes
Segmentation fault (core dumped)
What am I supposed to do with these errors?
Could not find output layer 'grid' in engine
Could not find output coverage layer for parsing objects
Failed to parse bboxes
I figured out that the parameter output-blob-names would probably have to be output-blob-names=boxes. Then only the "Failed to parse bboxes" problem remains, which probably really needs that custom parser…
EDIT: Maybe a better fit would be
output-blob-names=boxes;scores
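For reference, a hypothetical minimal [property] section pulling together the settings discussed in this thread might look like the sketch below. The file paths are placeholders, num-detected-classes follows the 3000x9 scores binding above, and NvDsInferParseCustomSSD is the parser function name from the objectDetector_SSD sample:

```ini
[property]
# Placeholder paths - substitute the real model/label locations
onnx-file=ssd-mobilenet.onnx
model-engine-file=ssd-mobilenet.onnx_b1_gpu0_fp16.engine
labelfile-path=labels.txt
batch-size=1
# network-mode 2 = FP16
network-mode=2
# matches the 3000x9 scores binding above
num-detected-classes=9
output-blob-names=boxes;scores
# Custom parser from the objectDetector_SSD sample
parse-bbox-func-name=NvDsInferParseCustomSSD
custom-lib-path=libnvdsinfer_custom_impl_ssd.so
```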
EDIT 2: Not sure if I have now moved forward or back again. I found a bbox parser in /opt/nvidia/deepstream/deepstream-5.1/sources/objectDetector_SSD/nvdsinfer_custom_impl_ssd and patched it a bit (namely the CUDA version in the Makefile and the number of classes in nvdsparsebbox_ssd.cpp). I built it, copied the lib to my working directory, and adapted the config. But I'm still getting the "Failed to parse bboxes" crash.
My current config (the lib is loaded, I’m pretty sure):
Hi @foreverneilyoung, it seems you have another thread going on the DeepStream forum about this, which is good because they know a lot more about DeepStream than I over there :)
You may have found this, but this is where the outputs of the ONNX-based SSD-Mobilenet are interpreted:
The boxes layer is a buffer of float4's (left, top, right, bottom) with coordinates between [0,1], so they need to be multiplied by the image width/height.
The scores layer is a buffer of floats, num_boxes * num_classes long. Each box has a confidence value for each class; the class with the maximum confidence value is the class for that box. The confidences should be thresholded, because not every box is actually a detection.
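Interpreted in code, the two output layers described above could be parsed with a sketch like this (a hypothetical standalone parser, assuming class 0 is the background class as in the jetson-inference ONNX SSD models):

```python
import numpy as np

def parse_detections(scores, boxes, img_w, img_h, threshold=0.5):
    """scores: (num_boxes, num_classes); boxes: (num_boxes, 4) in [0,1]."""
    detections = []
    for i in range(scores.shape[0]):
        cls = int(np.argmax(scores[i]))   # class with max confidence wins
        conf = float(scores[i, cls])
        if cls == 0 or conf < threshold:  # drop background / low confidence
            continue
        left, top, right, bottom = boxes[i]
        # Coordinates are normalized; scale to pixel space.
        detections.append((cls, conf, left * img_w, top * img_h,
                           right * img_w, bottom * img_h))
    return detections
```

A DeepStream custom parser would do the same walk over the two bindings, filling NvDsInferObjectDetectionInfo entries instead of tuples.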
It looks like your DeepStream custom bounding-box parser would need to be modified to reflect the same parsing as above. Currently I think it is set up for the 'TensorFlow way', where the score and bounding-box coordinates are all output in the same layer. You can see in my code that I actually have that way implemented too, in order to run the TensorFlow UFF version of the models.
Hi @foreverneilyoung, it seems you have another thread going on the DeepStream forum about this, which is good because they know a lot more about DeepStream than I over there :)
I completely agree. Today was a crash course…:)
You may have found this, but this is where the outputs of the ONNX-based SSD-Mobilenet are interpreted:
No, not yet. I was looking for that, but was lost…
Cool. That should give me a new kick start. Thanks. I agree with your conclusion. I have checked a lot of the DeepStream samples; none of them is a perfect match.
It looks like your DeepStream custom bounding-box parser would need to be modified to reflect the same parsing as above. Currently I think it is set up for the 'TensorFlow way', where the score and bounding-box coordinates are all output in the same layer. You can see in my code that I actually have that way implemented too, in order to run the TensorFlow UFF version of the models.