TRT engine - peculiar behaviour

Hi, I am building a TRT engine for a custom SSD-MobileNet-v1 model trained in Caffe (using the TRT Python API). The script I am using builds the engine, serializes and saves it to a .bin/.engine file and then, immediately after the build, performs inference on an image saved on the Nano…

In that mode the engine is built and executed successfully (returning the correct bbox). However, when in a separate run I skip the build part and just load and deserialize the engine from the .bin/.engine file created earlier, I get weird results: several bboxes, each in a fixed position, no matter the input image… The same also happens with another custom Caffe SqueezeNet-SSD model (!)… I have also used the same script (with appropriate modifications) to build and run a custom Caffe ResNet-50 model, and there everything works as expected in both modes (build-and-run and load-and-run)…

I do not get why the SSD models run fine in build-and-run mode but not when they are loaded and then executed…
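One way to narrow this down is to rule out the file round trip before suspecting the deserializer. The sketch below is a minimal, hypothetical helper (the name `save_and_verify` does not come from the original script). It assumes the serialized engine is available as a bytes-like object, such as what `engine.serialize()` returns, and checks that the data written to disk reads back byte-for-byte identical.

```python
import hashlib


def save_and_verify(serialized, path):
    """Write a serialized engine to `path` and confirm it reads back unchanged.

    `serialized` is assumed to be bytes-like (for example the object returned
    by engine.serialize() in the TensorRT Python API). Hypothetical helper.
    """
    data = bytes(serialized)  # copy into plain bytes so it can be hashed and written
    with open(path, "wb") as f:  # binary mode, so the blob is written untouched
        f.write(data)
    with open(path, "rb") as f:
        on_disk = f.read()
    # Compare digests of the in-memory blob and the blob read back from disk.
    return hashlib.sha256(data).hexdigest() == hashlib.sha256(on_disk).hexdigest()
```

If this returns `True` and the loaded engine still misbehaves, file I/O can be ruled out and the remaining suspects are the serialization and deserialization steps themselves.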

Hi,

A possible cause is that some parameter is not initialized correctly in deserialization mode.
Would you mind sharing your source so we can check it further?

Here is a good example for deserializing a TensorRT model:

Thanks.

Hi, thanks for the response…
The script is rather basic and follows the NVIDIA API guidelines: python_test_TRT_CaffeSSD.txt (7.0 KB)

Do I also need to load some plugin factory in Python? I saw the extra “pluginFactory” argument in your post above…
nvinfer1::ICudaEngine* engine = infer->deserializeCudaEngine(engine_stream, engine_size, pluginFactory);
and here
Runtime — tensorrt 7.2.0.9 documentation (nvidia.com)

How can I do this? Is this only needed for the PriorBox plugin?
BTW, I am using JetPack 4.4 [L4T 32.4.3] on the Nano with TRT ver 7.1.3.0, CUDA v10.0.89 and OpenCV v3.4.8.

Hi,

pluginFactory is only required when the model uses a custom layer for inference.
Based on your source, the model should work fine without setting it.

Would you mind first checking whether the TensorRT engine is deserialized correctly?

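import tensorrt as trt
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def load_engine(engine_file):
    # Read the serialized engine from disk and deserialize it (None on failure).
    with open(engine_file, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())
        if engine is None:
            print("deserialize fails")
        return engine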

If the engine is good, please share the Caffe model with us so we can check it in more depth.
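Beyond the `None` check, one hedged way to judge whether the engine is "good" is to list its bindings and compare them between the freshly built engine and the deserialized one. This is a minimal sketch assuming the TensorRT 7.x binding API (`num_bindings`, `get_binding_name`, `binding_is_input`, `get_binding_shape`); `describe_bindings` is a hypothetical helper name, not part of TensorRT.

```python
def describe_bindings(engine):
    """Return a list of (name, kind, shape) tuples, one per engine binding.

    `engine` is assumed to be a TensorRT 7.x ICudaEngine (or any object that
    exposes the same four members). Hypothetical helper for debugging.
    """
    info = []
    for i in range(engine.num_bindings):
        kind = "input" if engine.binding_is_input(i) else "output"
        shape = tuple(engine.get_binding_shape(i))  # Dims -> plain tuple
        info.append((engine.get_binding_name(i), kind, shape))
    return info
```

If `describe_bindings(built_engine) == describe_bindings(loaded_engine)` holds, the binding names, order and shapes survived the round trip, and any remaining difference has to come from the layer weights or parameters inside the engine.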

Thanks.

Thanks for the response! I can confirm that there is no error when the engine is loaded and deserialized…

The same weird behaviour is also present when I build/load/run the engine with the default MobileNet-SSD network from https://github.com/chuanqi305/MobileNet-SSD, using the default caffemodel and a slightly altered prototxt: (a) the Flatten layers are changed to Reshape layers, including their params, and (b) a “keep_count” output has been added right after “detection_out”. Again, when I build the engine and immediately execute it, everything runs smoothly, but when I serialize/save the engine and then load it, it gets really messy…
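To make the difference between the two modes measurable rather than visual, the raw outputs can be decoded and compared. The sketch below assumes the usual SSD detection-output layout of 7 floats per detection, `[image_id, label, confidence, xmin, ymin, xmax, ymax]`, together with the “keep_count” value. That layout and both helper names are assumptions, not taken from the attached script.

```python
def parse_detections(flat, keep_count, min_conf=0.5):
    """Decode a flat detection buffer into (label, conf, xmin, ymin, xmax, ymax).

    Assumes 7 floats per detection:
    [image_id, label, confidence, xmin, ymin, xmax, ymax].
    """
    detections = []
    for i in range(int(keep_count)):
        _image_id, label, conf, xmin, ymin, xmax, ymax = flat[i * 7:(i + 1) * 7]
        if conf >= min_conf:  # drop low-confidence entries
            detections.append((int(label), float(conf), float(xmin),
                               float(ymin), float(xmax), float(ymax)))
    return detections


def same_detections(a, b, tol=1e-3):
    """True if two detection lists match in length, labels, scores and boxes."""
    if len(a) != len(b):
        return False
    for da, db in zip(a, b):
        if da[0] != db[0]:  # class labels must match exactly
            return False
        if any(abs(x - y) > tol for x, y in zip(da[1:], db[1:])):
            return False
    return True
```

Used on the same image in both modes, `same_detections` should return `True` for a healthy engine. Used on two different images with the loaded engine, a `True` result would confirm that the output does not depend on the input.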

Here you can find a zip file containing:

  • the Python script I use,
  • the .prototxt and the .caffemodel,
  • the test image and the two images with detection results (when I build/run the engine and when I load/run the engine),
  • the engine file (.bin) I built.

Thanks, and I hope we can figure it out!

Thanks for the data.

Will update later.

Hi,

Thanks for sharing the detailed source to reproduce this.

We confirmed that the same issue also occurs in our environment.
The problem has been passed to our internal team.

We will keep you updated once we get feedback.
Thanks.

Hi @AastaLLL,
Is there any update regarding this issue?

B.R.

Hi,

We confirmed that there is an issue in our serializer and deserializer.
The detailed root cause is still being investigated.

Thanks.

Hi,

This issue is fixed in our internal branch.
The fix will be available in a future release.

Thanks.

Thank you for your support!