TensorRT fails for custom ssd_inception model

Linux version : Ubuntu 16.04 LTS
GPU type : GeForce GTX 1080
nvidia driver version : 410.72
CUDA version : 9.0
CUDNN version : 7.0.5
Python version [if using python] : 3.5.2
Tensorflow version : tensorflow-gpu 1.9
TensorRT version : 5.0.2.6

Actual problem:

I tried the example script under the samples/python/uff_ssd folder. The script downloads the SSD_inception model, creates a UFF parser, builds an engine, and performs inference on an image.

Now, instead of downloading a pre-trained model, I trained my own object detection model using SSD_inception as the architecture. But I am getting the following errors:
[TensorRT] ERROR: Parameter check failed at: ../builder/Layers.h::setAxis::315, condition: axis>=0
[TensorRT] ERROR: Concatenate/concat: all concat input tensors must have the same dimensions except on the concatenation axis
[TensorRT] ERROR: UFFParser: Parser error: BoxPredictor_0/ClassPredictor/BiasAdd: The input to the Scale Layer is required to have a minimum of 3 dimensions.

I am using the same ssd_inception architecture but still getting these errors. Could anyone help me with this issue?

If you’re able to send the model I can take a look at it and see if we can work it out.

Hi KevinSchlichter

Thanks for looking into the issue. Here is the link to the .pb file:

https://drive.google.com/open?id=1TwgzXmv9OZ_yKG98BTT-cXNo8L0B2Ymt

Can you give me a step-by-step of how you’re using the model? I’m getting different errors, so I’m doing something differently.

Initially I am running uff_ssd/detect_objects.py. This:
1) Creates a workspace folder, then downloads and extracts the pre-trained SSD_INCEPTION model (ssd_inception_v2_coco_2017_11_17).
2) Converts frozen_inference_graph.pb to frozen_inference_graph.uff.
3) Builds the TensorRT engine from the UFF file.
4) Performs inference using the TensorRT engine and writes image_inferred.jpg as output.

My modifications:

  1. I removed the TensorRT engine file that was created in the workspace folder.
  2. I removed frozen_inference_graph.pb and frozen_inference_graph.uff from the models/ssd_inception_v2_coco_2017_11_17 folder inside the workspace.
  3. Added my custom frozen_inference_graph.pb to workspace/models/ssd_inception_v2_coco_2017_11_17.
  4. Modified coco.py inside uff_ssd/utils to match my number of classes.
  5. Modified model.py inside uff_ssd/utils as follows:
    --> line 91: numClasses=91 to numClasses=6
    --> line 238: commented out the download call: #download_model(model_name, silent)
    --> line 239: ssd_pb_path = PATHS.get_model_pb_path(model_name) to ssd_pb_path = <‘path_to_custom.pb file present in uff_ssd/workspace/models/ssd_inception_v2_coco_2017_11_17’>
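
For anyone repeating steps 1 to 3 above, here is a minimal sketch of the file swap in Python. It only assumes the folder layout mentioned in this thread (workspace/models/<model_name>/ and workspace/engines/). The function name swap_in_custom_model is made up for illustration and is not part of the uff_ssd sample.

```python
import shutil
from pathlib import Path


def swap_in_custom_model(workspace, model_name, custom_pb):
    """Copy custom_pb into the model folder and delete stale derived files.

    Hypothetical helper: the folder layout is taken from the paths quoted
    in this thread, nothing here calls TensorRT or the sample itself.
    """
    workspace = Path(workspace)
    model_dir = workspace / "models" / model_name
    model_dir.mkdir(parents=True, exist_ok=True)

    removed = []
    # Files converted from the previous .pb would otherwise be reused.
    for suffix in (".uff", ".pbtxt"):
        stale = model_dir / ("frozen_inference_graph" + suffix)
        if stale.exists():
            stale.unlink()
            removed.append(stale)

    # Engines built from the previous model are cached here.
    engines_dir = workspace / "engines"
    if engines_dir.exists():
        shutil.rmtree(engines_dir)
        removed.append(engines_dir)

    # Overwrite the frozen graph with the custom-trained one.
    target = model_dir / "frozen_inference_graph.pb"
    shutil.copyfile(str(custom_pb), str(target))
    return target, removed
```

Calling swap_in_custom_model("workspace", "ssd_inception_v2_coco_2017_11_17", "/path/to/custom.pb") before running detect_objects.py should leave no converted or cached files from the pre-trained model behind. The edits to coco.py and model.py still have to be made by hand.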

Also, can you tell me what errors you are getting? Were you able to create the TensorRT engine and run inference? Can you share the procedure you used?

Hi KevinSchlichter

If possible, could you please share the errors you’re getting?

I’m not sure what I was doing last week, but it’s working for me now.

#Started a container
nvidia-docker run -v /home/nvesk/:/workspace/nvesk -ti --rm nvcr.io/nvidia/tensorrt:18.12-py3

history
1 /opt/tensorrt/python/python_setup.sh
2 cd tensorrt/samples/python/uff_ssd/
3 cat README.md
4 mkdir build
5 cd build/
6 cmake ..
7 make
8 cd ..
#I already downloaded this from last week, so I’m skipping the wget
9 cp /workspace/nvesk/VOCtest_06-Nov-2007.tar .
10 tar xf VOCtest_06-Nov-2007.tar
11 python detect_objects.py images/image2.jpg
12 rm workspace/models/ssd_inception_v2_coco_2017_11_17/frozen_inference_graph.pb workspace/models/ssd_inception_v2_coco_2017_11_17/frozen_inference_graph.pbtxt workspace/models/ssd_inception_v2_coco_2017_11_17/frozen_inference_graph.uff
13 cp /workspace/nvesk/frozen_inference_graph_custom.pb workspace/models/ssd_inception_v2_coco_2017_11_17/frozen_inference_graph.pb
14 python detect_objects.py images/image2.jpg
15 vi utils/model.py

I only commented out line 238: #download_model(model_name, silent)

16 python detect_objects.py images/image2.jpg
output:

WARNING:tensorflow:From /usr/lib/python3.5/dist-packages/graphsurgeon/StaticGraph.py:123: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
WARNING: To create TensorRT plugin nodes, please use the `create_plugin_node` function instead.
UFF Version 0.5.5
=== Automatically deduced input nodes ===
[name: "Input"
op: "Placeholder"
attr {
  key: "dtype"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "shape"
  value {
    shape {
      dim {
        size: 1
      }
      dim {
        size: 3
      }
      dim {
        size: 300
      }
      dim {
        size: 300
      }
    }
  }
}
]
=========================================

Using output node NMS
Converting to UFF graph
Warning: No conversion function registered for layer: NMS_TRT yet.
Converting NMS as custom op: NMS_TRT
Warning: No conversion function registered for layer: FlattenConcat_TRT yet.
Converting concat_box_conf as custom op: FlattenConcat_TRT
Warning: No conversion function registered for layer: GridAnchor_TRT yet.
Converting GridAnchor as custom op: GridAnchor_TRT
Warning: No conversion function registered for layer: FlattenConcat_TRT yet.
Converting concat_box_loc as custom op: FlattenConcat_TRT
No. nodes: 781
UFF Output written to /workspace/tensorrt/samples/python/uff_ssd/utils/../workspace/models/ssd_inception_v2_coco_2017_11_17/frozen_inference_graph.uff
UFF Text Output written to /workspace/tensorrt/samples/python/uff_ssd/utils/../workspace/models/ssd_inception_v2_coco_2017_11_17/frozen_inference_graph.pbtxt
TensorRT inference engine settings:
  * Inference precision - DataType.FLOAT
  * Max batch size - 1

Loading cached TensorRT engine from /workspace/tensorrt/samples/python/uff_ssd/utils/../workspace/engines/FLOAT/engine_bs_1.buf
TensorRT inference time: 6 ms
Detected kite with confidence 97%
Detected person with confidence 91%
Detected kite with confidence 89%
Detected person with confidence 89%
Detected kite with confidence 83%
Detected kite with confidence 82%
Detected person with confidence 76%
Detected kite with confidence 74%
Detected person with confidence 70%
Detected person with confidence 62%
Detected person with confidence 59%
Total time taken for one image: 188 ms

Saved output image to: /workspace/tensorrt/samples/python/uff_ssd/utils/../image_inferred.jpg

17 ls workspace/models/ssd_inception_v2_coco_2017_11_17/
18 mv image_inferred.jpg /workspace/nvesk/image2.jpg
#This time the inference is much faster, since it isn’t converting the uff
19 python detect_objects.py images/image1.jpg
output:

TensorRT inference engine settings:
  * Inference precision - DataType.FLOAT
  * Max batch size - 1

Loading cached TensorRT engine from /workspace/tensorrt/samples/python/uff_ssd/utils/../workspace/engines/FLOAT/engine_bs_1.buf
TensorRT inference time: 6 ms
Detected dog with confidence 98%
Detected dog with confidence 93%
Detected person with confidence 75%
Total time taken for one image: 69 ms

Saved output image to: /workspace/tensorrt/samples/python/uff_ssd/utils/../image_inferred.jpg

20 mv image_inferred.jpg /workspace/nvesk/image1.jpg

Hi

Thank you very much for the help, we appreciate it a lot.

However, note this line in your output:

Loading cached TensorRT engine from /workspace/tensorrt/samples/python/uff_ssd/utils/../workspace/engines/FLOAT/engine_bs_1.buf

It is still loading the engine created for the pre-trained model and using that cached engine to run inference.
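
To illustrate why the old engine keeps getting picked up, here is a minimal sketch of a load-if-present cache. This is my guess at the pattern based on the "Loading cached TensorRT engine" message, not the sample's actual code. load_or_build and build_fn are hypothetical names, and the engine is treated as plain bytes so the sketch runs without TensorRT.

```python
import os


def load_or_build(engine_path, build_fn):
    """Return (engine_bytes, source), where source is "cache" or "build".

    Hypothetical sketch: if a file already exists at engine_path it is
    returned as-is, even if the model it was built from has since changed.
    """
    if os.path.exists(engine_path):
        with open(engine_path, "rb") as f:
            return f.read(), "cache"

    # No cached file: build from the current model and store the result.
    data = build_fn()
    with open(engine_path, "wb") as f:
        f.write(data)
    return data, "build"
```

With a cache like this, replacing the .pb file has no effect until the cached engine file is deleted, which matches what the output above shows.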

We ran the same code on the custom-trained .pb file after deleting the engine (the /workspace/engines folder that was created for the pre-trained model).
The code should take custom.pb as input, convert it to UFF, build the engine (rather than using the cached one), and then perform inference.
But it is not able to create the TensorRT engine for the custom .pb file.
We are stuck at that part and get the same errors, quoted below:

[TensorRT] ERROR: Parameter check failed at: ../builder/Layers.h::setAxis::315, condition: axis>=0
[TensorRT] ERROR: Concatenate/concat: all concat input tensors must have the same dimensions except on the concatenation axis
[TensorRT] ERROR: UFFParser: Parser error: BoxPredictor_0/ClassPredictor/BiasAdd: The input to the Scale Layer is required to have a minimum of 3 dimensions.
Building TensorRT engine. This may take few minutes.
[TensorRT] ERROR: Network must have at least one output
Traceback (most recent call last):
  File "detect_objects.py", line 193, in <module>
    main()
  File "detect_objects.py", line 166, in main
    batch_size=parsed['max_batch_size'])
  File "/workspace/teai/TensorRT/TensorRT-5.0.2.6/targets/x86_64-linux-gnu/samples/python/uff_ssd/utils/inference.py", line 69, in __init__
    engine_utils.save_engine(self.trt_engine, trt_engine_path)
  File "/workspace/teai/TensorRT/TensorRT-5.0.2.6/targets/x86_64-linux-gnu/samples/python/uff_ssd/utils/engine.py", line 83, in save_engine
    buf = engine.serialize()
AttributeError: 'NoneType' object has no attribute 'serialize'

Could you try inference on custom.pb after deleting the ‘workspace/engines’ folder?
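
A side note on the last line of the traceback: the AttributeError is only a symptom. The engine object reaching save_engine is None because the build failed after the parser errors. Below is a minimal sketch of a guard that fails earlier with a clearer message. save_engine_checked is a hypothetical name, and the only thing assumed about the engine object is the serialize() method that appears in the traceback.

```python
def save_engine_checked(engine, path):
    """Serialize engine to path, refusing to continue if the build failed.

    Hypothetical wrapper, not the sample's save_engine. The engine object
    only needs a serialize() method returning a bytes-like buffer.
    """
    if engine is None:
        # Stop here instead of hitting "'NoneType' object has no attribute
        # 'serialize'" further down.
        raise RuntimeError(
            "Engine build failed (engine is None); "
            "check the [TensorRT] ERROR lines printed before this point."
        )
    buf = engine.serialize()
    with open(path, "wb") as f:
        f.write(buf)
```

This does not fix the build, but it points straight at the parser errors as the thing to debug.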

Hi team,

We are trying to optimize the custom graph (ssd_inception) using TensorRT 5.0. Did you get a chance to look at the error mentioned above?
We are stuck at this point.

I’m seeing the same errors now. The converter is trying to handle layers it doesn’t really know what to do with. That’s the series of warnings beginning with “WARNING: To create TensorRT plugin nodes, please use the create_plugin_node function instead.” The parameter check errors are probably a result of that. Try converting each of those layers to a custom layer as a first step. That should clear those warnings.

https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#extending
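
As a starting point for finding which layers to look at, here is a small sketch that pulls graph node names out of the error lines posted in this thread. It assumes a node name contains no spaces and is followed by ": ", which holds for the lines quoted here but may not hold for every TensorRT message. failing_nodes is a made-up helper name.

```python
PREFIX = "[TensorRT] ERROR: "
PARSER_PREFIX = "UFFParser: Parser error: "


def failing_nodes(log_text):
    """Return node names mentioned in [TensorRT] ERROR lines, in order."""
    nodes = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith(PREFIX):
            continue
        msg = line[len(PREFIX):]
        if msg.startswith(PARSER_PREFIX):
            msg = msg[len(PARSER_PREFIX):]
        head, sep, _ = msg.partition(": ")
        # Assumption: a node name has no spaces; ordinary messages do.
        if sep and " " not in head and head not in nodes:
            nodes.append(head)
    return nodes
```

On the errors quoted earlier in the thread this gives Concatenate/concat and BoxPredictor_0/ClassPredictor/BiasAdd, which are the nodes to inspect in the custom graph.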

Hi NVES_K,

Thanks for the valuable suggestion, we will work on it. I am just curious: TensorRT was working fine for the pre-trained SSD inception model, and we didn’t make any alterations to the layers. We just used a different dataset (number of output classes = 6).
Instead of initialising the weights randomly, we initialised them from the pre-trained checkpoint. As far as I remember, no other layer changes were made.
So if it works for the pre-trained model, it should also work for the custom-trained model.
We will explore the custom layer implementation in TensorRT further. Meanwhile, if you find any help or solution, do let us know.
Thanks in advance.

Linux version : Ubuntu 16.04 LTS
GPU type : GeForce GTX 1080
nvidia driver version : 410.93
CUDA version : 10.0
CUDNN version : 7.4.1
Python version [Anaconda] : 3.6.8
Tensorflow version : tensorflow-gpu 1.13.1
TensorRT version : 5.1.2.2

[libprotobuf FATAL /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/externals/protobuf/x86_64/10.0/include/google/protobuf/repeated_field.h:1408] CHECK failed: (index) < (current_size_): 
Traceback (most recent call last):
  File "detect_objects.py", line 245, in <module>
    main()
  File "detect_objects.py", line 219, in main
    batch_size=args.max_batch_size)
  File "/home/zcy/1_data_sets/TensorRT-5.1.2.2/targets/x86_64-linux-gnu/samples/python/uff_ssd/utils/inference.py", line 115, in __init__
    batch_size=batch_size)
  File "/home/zcy/1_data_sets/TensorRT-5.1.2.2/targets/x86_64-linux-gnu/samples/python/uff_ssd/utils/engine.py", line 75, in build_engine
    parser.parse(uff_model_path, network)
RuntimeError: CHECK failed: (index) < (current_size_):

I am using the uff_ssd example in TensorRT 5.1.2.2.
I trained my model with ‘ssd_inception_v2_coco.config’ in the TensorFlow Object Detection API. When I ran the script detect_objects.py in uff_ssd, the model was converted from ‘.pb’ to ‘.uff’ and a ‘.pbtxt’ file was also generated.
But when building the engine, I always get the same error. The error message is as above.
When I use the default model ‘ssd_inception_v2_coco_2017_11_17’ with the uff_ssd script ‘detect_objects.py’, everything works fine.
Any help will be appreciated!

Hi varun365,

I am in exactly the same situation.
Do you have a solution to this problem?

Hi NVES_K,

Do you have a solution to this problem?

I have the same problem. Can anyone help?

me too

I have the same problem after retraining with 1 class