After converting SSD MobileNet from the examples, the model is slower

Hi.

I recently started using the Xavier NX. I need to convert models from TensorFlow to TensorRT, and I use TF-TRT for this. All the models I tried (frcnn, rcnn) run slower or at the same speed after conversion.
I also tried converting the SSD MobileNet model from https://github.com/NVIDIA-AI-IOT/tf_trt_models, following the guide exactly, but the inference speed dropped by half.

Jetpack: 4.4
L4T: 32.4.3
Ubuntu: 18.04
Cuda: 10.2.89
cuDNN: 8.0.0.180
Tensorflow: 1.15.2
Jetson clocks: ON
NVP model: 15W 6CORE

Please tell me what I can try to solve the problem.

Hi,

Could you share the TensorFlow log output with us first?

Please note that TF-TRT only converts an operation to TensorRT when a mapping for it is found.
If the ratio of TensorRT to TensorFlow operations is low, the performance will be similar to the pure TensorFlow case.

Thanks.

The number of TRT layers is really small.

2020-08-05 20:25:39.567795: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2020-08-05 20:25:49.209373: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7
2020-08-05 20:25:49.253959: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer_plugin.so.7
WARNING:tensorflow:From convert.py:8: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.

======================= numbs trt_engine_ops in graph 0
=======================numb. of all_ops in graph: 5960
2020-08-05 20:25:52.069570: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7
2020-08-05 20:26:05.861733: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-08-05 20:26:05.907964: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-08-05 20:26:05.908282: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2020-08-05 20:26:05.908733: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-08-05 20:26:05.932975: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2020-08-05 20:26:05.933901: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1d8baa10 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-05 20:26:05.933978: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-08-05 20:26:06.125510: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-08-05 20:26:06.126004: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1ddb2230 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-05 20:26:06.126079: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Xavier, Compute Capability 7.2
2020-08-05 20:26:06.126595: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-08-05 20:26:06.126789: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1634] Found device 0 with properties:
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.109
pciBusID: 0000:00:00.0
2020-08-05 20:26:06.126934: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-05 20:26:06.127126: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-08-05 20:26:06.210369: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-05 20:26:06.330378: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-05 20:26:06.475302: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-05 20:26:06.555832: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-08-05 20:26:06.556133: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-05 20:26:06.556470: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-08-05 20:26:06.556810: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-08-05 20:26:06.556915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1762] Adding visible gpu devices: 0
2020-08-05 20:26:06.557088: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-05 20:26:18.070744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1175] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-05 20:26:18.070942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] 0
2020-08-05 20:26:18.070992: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1194] 0: N
2020-08-05 20:26:18.071574: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-08-05 20:26:18.071962: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-08-05 20:26:18.072254: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1320] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 544 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2020-08-05 20:26:29.042393: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:486] There are 3652 ops of 49 different types in the graph that are not converted to TensorRT: TopKV2, NonMaxSuppressionV2, Split, Gather, Where, Size, Greater, Equal, Fill, Transpose, TensorArrayWriteV3, Exit, NoOp, Pack, LoopCond, Merge, Switch, TensorArraySizeV3, TensorArrayV3, Placeholder, TensorArrayScatterV3, Reshape, Cast, Const, Sub, Maximum, StridedSlice, Shape, Minimum, Assert, TensorArrayReadV3, Identity, ExpandDims, ResizeBilinear, Enter, Squeeze, Add, NextIteration, TensorArrayGatherV3, DataFormatVecPermute, Less, Range, ZerosLike, Slice, Mul, RealDiv, Tile, Unpack, ConcatV2, (For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops).
2020-08-05 20:26:29.247800: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:647] Number of TensorRT candidate segments: 5
2020-08-05 20:26:29.500017: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:748] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 275 nodes succeeded.
2020-08-05 20:26:29.503128: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:748] TensorRT node TRTEngineOp_1 added for segment 1 consisting of 18 nodes succeeded.
2020-08-05 20:26:29.503549: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:748] TensorRT node TRTEngineOp_2 added for segment 2 consisting of 18 nodes succeeded.
2020-08-05 20:26:29.503967: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:748] TensorRT node TRTEngineOp_3 added for segment 3 consisting of 7 nodes succeeded.
2020-08-05 20:26:29.504265: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:748] TensorRT node TRTEngineOp_4 added for segment 4 consisting of 3 nodes succeeded.
2020-08-05 20:26:30.820642: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2020-08-05 20:26:30.937670: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2020-08-05 20:26:30.952500: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2020-08-05 20:26:30.967235: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2020-08-05 20:26:30.982791: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2020-08-05 20:26:31.022218: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:822] Optimization results for grappler item: tf_graph
2020-08-05 20:26:31.022418: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] constant_folding: Graph size after: 5961 nodes (1), 10025 edges (2), time = 1564.90198ms.
2020-08-05 20:26:31.022504: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] layout: Graph size after: 6089 nodes (128), 10153 edges (128), time = 1118.76294ms.
2020-08-05 20:26:31.022542: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] constant_folding: Graph size after: 5993 nodes (-96), 10057 edges (-96), time = 1056.1ms.
2020-08-05 20:26:31.022659: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] TensorRTOptimizer: Graph size after: 5677 nodes (-316), 9732 edges (-325), time = 1064.46ms.
2020-08-05 20:26:31.022716: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] constant_folding: Graph size after: 5676 nodes (-1), 9732 edges (0), time = 856.888ms.
2020-08-05 20:26:31.022763: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:822] Optimization results for grappler item: TRTEngineOp_0_native_segment
2020-08-05 20:26:31.022799: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] constant_folding: Graph size after: 294 nodes (0), 293 edges (0), time = 63.057ms.
2020-08-05 20:26:31.022834: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] layout: Graph size after: 294 nodes (0), 293 edges (0), time = 74.041ms.
2020-08-05 20:26:31.022867: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] constant_folding: Graph size after: 294 nodes (0), 293 edges (0), time = 66.681ms.
2020-08-05 20:26:31.022899: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] TensorRTOptimizer: Graph size after: 294 nodes (0), 293 edges (0), time = 9.538ms.
2020-08-05 20:26:31.022932: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] constant_folding: Graph size after: 294 nodes (0), 293 edges (0), time = 68.235ms.
2020-08-05 20:26:31.022968: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:822] Optimization results for grappler item: TRTEngineOp_2_native_segment
2020-08-05 20:26:31.023010: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] constant_folding: Graph size after: 24 nodes (0), 27 edges (0), time = 2.701ms.
2020-08-05 20:26:31.023050: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] layout: Graph size after: 24 nodes (0), 27 edges (0), time = 2.043ms.
2020-08-05 20:26:31.023083: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] constant_folding: Graph size after: 24 nodes (0), 27 edges (0), time = 2.279ms.
2020-08-05 20:26:31.023115: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] TensorRTOptimizer: Graph size after: 24 nodes (0), 27 edges (0), time = 0.255ms.
2020-08-05 20:26:31.023182: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] constant_folding: Graph size after: 24 nodes (0), 27 edges (0), time = 2.281ms.
2020-08-05 20:26:31.023243: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:822] Optimization results for grappler item: TRTEngineOp_4_native_segment
2020-08-05 20:26:31.023303: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] constant_folding: Graph size after: 10 nodes (0), 9 edges (0), time = 1.545ms.
2020-08-05 20:26:31.023380: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] layout: Graph size after: 10 nodes (0), 9 edges (0), time = 1.333ms.
2020-08-05 20:26:31.023418: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] constant_folding: Graph size after: 10 nodes (0), 9 edges (0), time = 1.541ms.
2020-08-05 20:26:31.023451: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] TensorRTOptimizer: Graph size after: 10 nodes (0), 9 edges (0), time = 0.204ms.
2020-08-05 20:26:31.023484: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] constant_folding: Graph size after: 10 nodes (0), 9 edges (0), time = 1.846ms.
2020-08-05 20:26:31.023517: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:822] Optimization results for grappler item: TRTEngineOp_3_native_segment
2020-08-05 20:26:31.023565: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] constant_folding: Graph size after: 14 nodes (0), 13 edges (0), time = 2.295ms.
2020-08-05 20:26:31.023637: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] layout: Graph size after: 14 nodes (0), 13 edges (0), time = 1.499ms.
2020-08-05 20:26:31.023672: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] constant_folding: Graph size after: 14 nodes (0), 13 edges (0), time = 1.913ms.
2020-08-05 20:26:31.023704: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] TensorRTOptimizer: Graph size after: 14 nodes (0), 13 edges (0), time = 0.208ms.
2020-08-05 20:26:31.023736: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] constant_folding: Graph size after: 14 nodes (0), 13 edges (0), time = 1.923ms.
2020-08-05 20:26:31.023768: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:822] Optimization results for grappler item: TRTEngineOp_1_native_segment
2020-08-05 20:26:31.023807: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] constant_folding: Graph size after: 24 nodes (0), 27 edges (0), time = 2.8ms.
2020-08-05 20:26:31.023848: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] layout: Graph size after: 24 nodes (0), 27 edges (0), time = 1.919ms.
2020-08-05 20:26:31.023880: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] constant_folding: Graph size after: 24 nodes (0), 27 edges (0), time = 2.356ms.
2020-08-05 20:26:31.023911: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] TensorRTOptimizer: Graph size after: 24 nodes (0), 27 edges (0), time = 0.271ms.
2020-08-05 20:26:31.023942: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:824] constant_folding: Graph size after: 24 nodes (0), 27 edges (0), time = 2.254ms.
======================= numbs trt_engine_ops in graph 5
=======================numb. of all_ops in graph: 5676
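
For a quick sense of how much of the graph actually ended up inside TensorRT engines, the segment lines in the log above can be summed. The following is only a rough sketch: the function name `summarize_tftrt_log` and the regular expressions are my own, written against the message wording quoted in this thread, and the excerpt in `SAMPLE_LOG` is abbreviated. The two counts come from different log lines and may not add up to the whole graph, so treat the resulting share as an indicator, not an exact figure.

```python
import re

# Patterns written against the TF 1.15 TF-TRT messages quoted in this thread
# (assumption: other TensorFlow versions may word these lines differently).
SEGMENT_RE = re.compile(
    r"TensorRT node (TRTEngineOp_\d+) added for segment \d+ "
    r"consisting of (\d+) nodes succeeded"
)
UNCONVERTED_RE = re.compile(
    r"There are (\d+) ops of \d+ different types in the graph "
    r"that are not converted to TensorRT"
)

# Abbreviated excerpt of the conversion log posted above.
SAMPLE_LOG = """
segment.cc:486] There are 3652 ops of 49 different types in the graph that are not converted to TensorRT: TopKV2, NonMaxSuppressionV2, Split
convert_graph.cc:748] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 275 nodes succeeded.
convert_graph.cc:748] TensorRT node TRTEngineOp_1 added for segment 1 consisting of 18 nodes succeeded.
convert_graph.cc:748] TensorRT node TRTEngineOp_2 added for segment 2 consisting of 18 nodes succeeded.
convert_graph.cc:748] TensorRT node TRTEngineOp_3 added for segment 3 consisting of 7 nodes succeeded.
convert_graph.cc:748] TensorRT node TRTEngineOp_4 added for segment 4 consisting of 3 nodes succeeded.
"""


def summarize_tftrt_log(log_text):
    """Collect converted and unconverted node counts from a TF-TRT log."""
    # Map each engine op name to the number of nodes folded into it.
    segments = {name: int(count) for name, count in SEGMENT_RE.findall(log_text)}
    converted = sum(segments.values())

    match = UNCONVERTED_RE.search(log_text)
    unconverted = int(match.group(1)) if match else None

    # Share of converted nodes among the nodes these log lines mention.
    share = None
    if unconverted is not None and converted + unconverted > 0:
        share = converted / (converted + unconverted)

    return {
        "segments": segments,
        "converted": converted,
        "unconverted": unconverted,
        "converted_share": share,
    }


if __name__ == "__main__":
    summary = summarize_tftrt_log(SAMPLE_LOG)
    print("nodes inside TensorRT engines:", summary["converted"])
    print("ops left in TensorFlow:", summary["unconverted"])
    print("converted share: {:.1%}".format(summary["converted_share"]))
```

Running it on a full log file only needs `summarize_tftrt_log(open(path).read())`, where `path` points to the saved conversion output.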

But the authors of https://github.com/NVIDIA-AI-IOT/tf_trt_models report:

|Model|Input Size|TF-TRT TX2|TF TX2|
|---|---|---|---|
|ssd_mobilenet_v1_coco|300x300|50.5ms|72.9ms|

That’s strange.
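
To put the reference numbers in perspective, the speedup implied by the table above can be computed directly from the two latencies. This is a small sketch; the helper name `speedup` is hypothetical, and the only inputs are the two TX2 figures quoted from the repository. A result above 1 means the TF-TRT run is faster, whereas a run whose speed drops by half would correspond to a factor of about 0.5.

```python
def speedup(baseline_ms, optimized_ms):
    """Speedup of the optimized run over the baseline (>1 means faster)."""
    if baseline_ms <= 0 or optimized_ms <= 0:
        raise ValueError("latencies must be positive")
    return baseline_ms / optimized_ms


# ssd_mobilenet_v1_coco on TX2, as quoted in the table: TF 72.9 ms, TF-TRT 50.5 ms.
reference = speedup(baseline_ms=72.9, optimized_ms=50.5)

if __name__ == "__main__":
    print("reference TF-TRT speedup on TX2: {:.2f}x".format(reference))
```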

Hi,

This depends on the layers used in your model.
Are you also using ssd_mobilenet_v1, as the authors do?

If so, it’s recommended to convert the model into a pure TensorRT engine for better performance.

Thanks.