Hi,
We have confirmed that the ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8 model can be converted to a TensorRT engine on Jetson.
The detailed steps are below for your reference:
1. Environment
$ sudo docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-tensorflow:r32.6.1-tf2.5-py3
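Note: a TensorRT engine only works with the TensorRT version it was built against, so it is worth noting the versions inside the container before you start. A quick optional check from python3 (assuming the image exposes both packages, which r32.6.1-tf2.5-py3 should):

import tensorflow as tf
import tensorrt as trt

# Engines are tied to the TensorRT version they were serialized with.
print("TensorFlow:", tf.__version__)
print("TensorRT:", trt.__version__)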
2. Install Prerequisites
$ apt-get update
$ apt-get install cmake g++ git libprotobuf-dev protobuf-compiler
$ pip3 install onnx tf2onnx pillow
$ git clone https://github.com/NVIDIA/TensorRT.git
$ cd TensorRT/tools/onnx-graphsurgeon/
$ make install
$ python3 -m pip install --force-reinstall dist/*.whl
$ wget https://nvidia.box.com/shared/static/jy7nqva7l88mq9i8bw3g3sklzf4kccn2.whl -O onnxruntime_gpu-1.10.0-cp36-cp36m-linux_aarch64.whl
$ pip3 install onnxruntime_gpu-1.10.0-cp36-cp36m-linux_aarch64.whl
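(Optional) A quick sanity check that the Python prerequisites import correctly, run with python3:

import onnx
import tf2onnx
import onnx_graphsurgeon as gs
import onnxruntime as ort

# Print the installed versions and confirm onnxruntime sees the GPU.
print("onnx:", onnx.__version__)
print("tf2onnx:", tf2onnx.__version__)
print("onnx-graphsurgeon:", gs.__version__)
print("onnxruntime:", ort.__version__, "device:", ort.get_device())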
3. Prepare Source
$ cd ../../samples/python/tensorflow_object_detection_api/
$ git clone https://github.com/tensorflow/models.git
$ cp -r models/research/object_detection .
$ protoc object_detection/protos/*.proto --python_out=.
Apply the following change to create_onnx.py:
diff --git a/samples/python/tensorflow_object_detection_api/create_onnx.py b/samples/python/tensorflow_object_detection_api/create_onnx.py
index b6ac423..7292756 100644
--- a/samples/python/tensorflow_object_detection_api/create_onnx.py
+++ b/samples/python/tensorflow_object_detection_api/create_onnx.py
@@ -254,8 +254,8 @@ class TFODGraphSurgeon:
concat_node.outputs = []
# Disconnect the last node in second preprocessing branch with parent second TensorListStack node.
- tile_node = self.graph.find_node_by_op("Tile")
- tile_node.outputs = []
+ #tile_node = self.graph.find_node_by_op("Tile")
+ #tile_node.outputs = []
# Reshape nodes tend to update the batch dimension to a fixed value of 1, they should use the batch size instead.
for node in [node for node in self.graph.nodes if node.op == "Reshape"]:
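The patch simply skips the step that disconnects a Tile node from the second preprocessing branch, since that step does not apply to this model's graph. If you prefer to keep the script generic instead of commenting the lines out, an untested alternative sketch (assuming find_node_by_op() returns None when no matching node exists) is to guard the call:

# Untested alternative for create_onnx.py: only disconnect the Tile node
# if one is actually present in this model's preprocessing branch.
tile_node = self.graph.find_node_by_op("Tile")
if tile_node is not None:
    tile_node.outputs = []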
4. Convert Model
$ wget http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz
$ tar -xvf ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz
$ python3 create_onnx.py --pipeline_config ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/pipeline.config --saved_model ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/saved_model --onnx model.onnx --input_format NCHW
$ python3 build_engine.py --onnx model.onnx --engine engine.trt --precision fp16
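Once create_onnx.py has produced model.onnx, you can optionally inspect it with onnx and onnx-graphsurgeon (installed above) to confirm the export looks right, e.g. a single 1x3x320x320 NCHW input and the expected detection outputs. A minimal sketch:

import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("model.onnx"))

# The exported graph should take one NCHW image tensor and return detection outputs.
for inp in graph.inputs:
    print("input :", inp.name, inp.shape, inp.dtype)
for out in graph.outputs:
    print("output:", out.name, out.shape, out.dtype)

# Count the ops in the graph as a quick structural summary.
ops = {}
for node in graph.nodes:
    ops[node.op] = ops.get(node.op, 0) + 1
print(sorted(ops.items()))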
5. Test with TensorRT
$ /usr/src/tensorrt/bin/trtexec --loadEngine=engine.trt
...
[12/29/2021-08:40:07] [I] === Performance summary ===
[12/29/2021-08:40:07] [I] Throughput: 146.935 qps
[12/29/2021-08:40:07] [I] Latency: min = 6.73804 ms, max = 8.90845 ms, mean = 6.79419 ms, median = 6.77783 ms, percentile(99%) = 7.34839 ms
[12/29/2021-08:40:07] [I] End-to-End Host Latency: min = 6.7522 ms, max = 8.92114 ms, mean = 6.80563 ms, median = 6.78809 ms, percentile(99%) = 7.36255 ms
[12/29/2021-08:40:07] [I] Enqueue Time: min = 4.5791 ms, max = 8.78735 ms, mean = 5.32989 ms, median = 5.07129 ms, percentile(99%) = 6.9624 ms
[12/29/2021-08:40:07] [I] H2D Latency: min = 0.0742188 ms, max = 0.0761719 ms, mean = 0.0748875 ms, median = 0.0749512 ms, percentile(99%) = 0.0759277 ms
[12/29/2021-08:40:07] [I] GPU Compute Time: min = 6.65723 ms, max = 8.8291 ms, mean = 6.71306 ms, median = 6.69604 ms, percentile(99%) = 7.26758 ms
[12/29/2021-08:40:07] [I] D2H Latency: min = 0.00512695 ms, max = 0.00732422 ms, mean = 0.00624302 ms, median = 0.00634766 ms, percentile(99%) = 0.00708008 ms
[12/29/2021-08:40:07] [I] Total Host Walltime: 1.09572 s
[12/29/2021-08:40:07] [I] Total GPU Compute Time: 1.0808 s
[12/29/2021-08:40:07] [I] Explanations of the performance metrics are printed in the verbose logs.
[12/29/2021-08:40:07] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --loadEngine=engine.trt
[12/29/2021-08:40:07] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1377, GPU 16207 (MiB)
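For completeness, below is a rough sketch of running the serialized engine from Python with the TensorRT bindings and PyCUDA (pycuda is assumed to be installed separately, test.jpg is a placeholder image path, and the engine is assumed to use static shapes with batch size 1 as built above). Treat it as a starting point rather than a drop-in script:

import numpy as np
import tensorrt as trt
import pycuda.autoinit  # creates a CUDA context for this process
import pycuda.driver as cuda
from PIL import Image

logger = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(logger, "")  # register the NMS plugin compiled into the engine

with open("engine.trt", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate one pagelocked host buffer and one device buffer per binding.
host_bufs, dev_bufs, bindings = [], [], []
for i in range(engine.num_bindings):
    shape = engine.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    h = cuda.pagelocked_empty(trt.volume(shape), dtype)
    d = cuda.mem_alloc(h.nbytes)
    host_bufs.append(h)
    dev_bufs.append(d)
    bindings.append(int(d))
    kind = "input" if engine.binding_is_input(i) else "output"
    print(kind, engine.get_binding_name(i), tuple(shape), dtype)

# Fill the input binding with a 320x320 RGB image in NCHW layout.
# (Assumes the exported graph embeds its own preprocessing; adjust if yours does not.)
in_idx = next(i for i in range(engine.num_bindings) if engine.binding_is_input(i))
img = Image.open("test.jpg").convert("RGB").resize((320, 320))
x = np.asarray(img).transpose(2, 0, 1)[None]  # 1x3x320x320
np.copyto(host_bufs[in_idx], x.ravel().astype(host_bufs[in_idx].dtype))

# Copy the input to the GPU, run inference synchronously, copy the outputs back.
cuda.memcpy_htod(dev_bufs[in_idx], host_bufs[in_idx])
context.execute_v2(bindings)
for i in range(engine.num_bindings):
    if not engine.binding_is_input(i):
        cuda.memcpy_dtoh(host_bufs[i], dev_bufs[i])
        print(engine.get_binding_name(i), host_bufs[i][:8])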
Thanks.