Hi everyone!
Our team has a problem with the outcome of optimizing TensorFlow object detection models with TensorRT.
Setup:
Jetson Nano with system installed via SDK Manager, JetPack 4.2.2 (rev.1)
tensorflow-gpu 1.14.0+nv19.7
tensorrt 5.1.6.1
nvcc (CUDA compiler driver) release 10.0, V10.0.326
NV Power Mode: MAXN
Issue:
We trained an ssd_mobilenet_v2 model from the object detection model zoo on custom data, using a Docker image with TensorFlow 1.12. The trained model was converted from checkpoint format to a frozen graph with “export_inference_graph.py” from:
The frozen graph was loaded on the Jetson Nano and optimized using “create_inference_graph”. When we try to use the TrtGraphConverter class instead, as described in the Accelerating Inference In TF-TRT User Guide (NVIDIA Deep Learning Frameworks Documentation), an error occurs: Failed to import metagraph. In any case, calling create_inference_graph with precision=FP32, minimum_segment_size=3, max_batch_size=1 returns a different result from one invocation to the next. Memory seems to be in use.
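For reference, the conversion call described above looks roughly like the sketch below. It is not our exact script: the output tensor names are the usual ones written by the Object Detection API exporter and are an assumption here, the function name is made up for illustration, and the keyword is spelled precision_mode in the TF 1.14 Python API.

```python
# Settings quoted in the post; "precision" is passed as precision_mode.
CONVERSION_SETTINGS = {
    "precision_mode": "FP32",
    "minimum_segment_size": 3,
    "max_batch_size": 1,
}

# Assumed output tensor names (typical for Object Detection API exports).
OUTPUT_NAMES = [
    "num_detections",
    "detection_boxes",
    "detection_scores",
    "detection_classes",
]


def convert_frozen_graph(frozen_path, output_path):
    """Load a frozen graph, run TF-TRT conversion, and write the result."""
    # Imported lazily so the settings above can be inspected without TensorFlow.
    import tensorflow as tf
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    graph_def = tf.GraphDef()
    with tf.gfile.GFile(frozen_path, "rb") as f:
        graph_def.ParseFromString(f.read())

    trt_graph = trt.create_inference_graph(
        input_graph_def=graph_def,
        outputs=OUTPUT_NAMES,
        **CONVERSION_SETTINGS
    )

    with tf.gfile.GFile(output_path, "wb") as f:
        f.write(trt_graph.SerializeToString())
    return trt_graph
```

On the Nano this would be called as convert_frozen_graph("frozen_inference_graph.pb", "trt_optimized_inference_graph_FP32.pb"), matching the file names printed in the logs.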
Sometimes the node counts after this operation are:
all nodes pre-optimization: 2671
TRT Engine opts: 12
all nodes post-optimization: 326
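These counts can be obtained by walking the nodes of the graph before and after conversion. A minimal sketch, assuming the engines show up as nodes whose op type is TRTEngineOp (the function name is illustrative, and a small stand-in object replaces a real GraphDef so the snippet runs without TensorFlow):

```python
from types import SimpleNamespace


def summarize(graph_def):
    """Return (total node count, number of TRTEngineOp nodes)."""
    total = len(graph_def.node)
    engines = sum(1 for node in graph_def.node if node.op == "TRTEngineOp")
    return total, engines


# Stand-in for a tf.GraphDef: anything with a .node list of objects with .op works.
demo_graph = SimpleNamespace(
    node=[
        SimpleNamespace(op="Placeholder"),
        SimpleNamespace(op="TRTEngineOp"),
        SimpleNamespace(op="TRTEngineOp"),
    ]
)

print(summarize(demo_graph))  # (3, 2)
```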
LOG:
2019-10-31 12:39:16.782509: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-10-31 12:39:28.397544: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
WARNING:tensorflow:From frozen2trt2.py:24: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.
WARNING:tensorflow:From frozen2trt2.py:25: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.
2019-10-31 12:39:41.337093: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-10-31 12:39:41.353294: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 12:39:41.353445: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2019-10-31 12:39:41.353750: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2019-10-31 12:39:41.373688: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2019-10-31 12:39:41.374369: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2f4ddba0 executing computations on platform Host. Devices:
2019-10-31 12:39:41.374438: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 12:39:41.489729: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 12:39:41.490119: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x30abc9c0 executing computations on platform CUDA. Devices:
2019-10-31 12:39:41.490187: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2019-10-31 12:39:41.490910: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 12:39:41.491070: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
2019-10-31 12:39:41.491164: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-10-31 12:39:41.491562: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-10-31 12:39:41.491756: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-10-31 12:39:41.491947: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-10-31 12:39:41.522212: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-10-31 12:39:41.542052: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-10-31 12:39:41.542352: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-10-31 12:39:41.542801: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 12:39:41.543344: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 12:39:41.543502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-10-31 12:39:45.637233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 12:39:45.637309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-10-31 12:39:45.637333: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-10-31 12:39:45.637813: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 12:39:45.638164: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 12:39:45.638334: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 664 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2019-10-31 12:39:54.975125: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:460] There are 154 ops of 34 different types in the graph that are not converted to TensorRT: Range, Sum, GreaterEqual, Where, Equal, Select, Size, Less, ConcatV2, Fill, Mul, ExpandDims, Unpack, GatherV2, NoOp, TopKV2, Cast, Slice, Transpose, Pad, Placeholder, Greater, Sub, Const, Pack, Identity, NonMaxSuppressionV3, Assert, Reshape, Squeeze, Add, Shape, Minimum, StridedSlice, (For more information see https://docs.nvidia.com/deeplearning/dgx/tf-trt-user-guide/index.html#supported-ops).
2019-10-31 12:39:55.346258: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:733] Number of TensorRT candidate segments: 12
2019-10-31 12:39:55.949408: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-10-31 12:39:56.143756: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-10-31 12:40:00.993647: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-10-31 12:41:43.617383: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 789 nodes succeeded.
2019-10-31 12:41:43.745074: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_1 added for segment 1 consisting of 22 nodes succeeded.
2019-10-31 12:41:43.772492: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_2 added for segment 2 consisting of 3 nodes succeeded.
2019-10-31 12:41:43.804842: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_3 added for segment 3 consisting of 3 nodes succeeded.
2019-10-31 12:41:43.833656: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_4 added for segment 4 consisting of 3 nodes succeeded.
2019-10-31 12:41:43.863932: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_5 added for segment 5 consisting of 3 nodes succeeded.
2019-10-31 12:41:43.892892: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_6 added for segment 6 consisting of 3 nodes succeeded.
2019-10-31 12:41:43.922113: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_7 added for segment 7 consisting of 3 nodes succeeded.
2019-10-31 12:41:43.970877: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/Area/TRTEngineOp_8 added for segment 8 consisting of 6 nodes succeeded.
2019-10-31 12:41:44.015790: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node Postprocessor/BatchMultiClassNonMaxSuppression/TRTEngineOp_9 added for segment 9 consisting of 14 nodes succeeded.
2019-10-31 12:41:44.046634: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node Postprocessor/BatchMultiClassNonMaxSuppression/TRTEngineOp_10 added for segment 10 consisting of 3 nodes succeeded.
2019-10-31 12:41:44.071502: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node Postprocessor/TRTEngineOp_11 added for segment 11 consisting of 3 nodes succeeded.
2019-10-31 12:41:44.167981: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:739] Optimization results for grappler item: tf_graph
2019-10-31 12:41:44.168079: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:741] constant folding: Graph size after: 1154 nodes (-1517), 1265 edges (-1691), time = 3652.29395ms.
2019-10-31 12:41:44.168104: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:741] layout: Graph size after: 1169 nodes (15), 1291 edges (26), time = 173.861ms.
2019-10-31 12:41:44.168125: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:741] constant folding: Graph size after: 1169 nodes (0), 1291 edges (0), time = 156.13ms.
2019-10-31 12:41:44.168144: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:741] TensorRTOptimizer: Graph size after: 326 nodes (-843), 404 edges (-887), time = 109313.375ms.
WARNING:tensorflow:From frozen2trt2.py:30: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.
WARNING:tensorflow:From frozen2trt2.py:32: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2019-10-31 12:42:34.608055: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 12:42:34.608273: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
2019-10-31 12:42:34.609062: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-10-31 12:42:34.609540: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-10-31 12:42:34.609703: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-10-31 12:42:34.609867: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-10-31 12:42:34.611511: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-10-31 12:42:34.611795: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-10-31 12:42:34.611971: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-10-31 12:42:34.612711: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 12:42:34.613299: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 12:42:34.613521: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-10-31 12:42:34.654486: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 12:42:34.654655: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
2019-10-31 12:42:34.654801: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-10-31 12:42:34.654907: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-10-31 12:42:34.654983: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-10-31 12:42:34.655043: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-10-31 12:42:34.655218: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-10-31 12:42:34.655503: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-10-31 12:42:34.655599: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-10-31 12:42:34.656073: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 12:42:34.657022: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 12:42:34.657276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-10-31 12:42:34.657713: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 12:42:34.657801: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-10-31 12:42:34.657983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-10-31 12:42:34.658647: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 12:42:34.659268: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 12:42:34.659432: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 664 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
{'input_filename': 'frozen_inference_graph.pb', 'output_filename': 'trt_optimized_inference_graph_FP32.pb', 'input_path': './', 'output_path': './'}
OPTIMIZING MODEL...
All nodes pre-optimization: 2671
TRT Engine opts: 12
All nodes post-optimization: 326
This is the “better” output, with which we can perform inference in around 100-105 ms.
Sometimes it creates fewer TRT Engine opts, or none, and the output is:
all nodes pre-optimization: 2671
TRT Engine opts: 11
all nodes post-optimization: 1114
and the inference time is then about 210 ms.
LOG:
2019-10-31 14:35:29.179125: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-10-31 14:35:36.724575: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
WARNING:tensorflow:From frozen2trt2.py:25: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.
WARNING:tensorflow:From frozen2trt2.py:26: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.
2019-10-31 14:35:49.326881: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-10-31 14:35:49.340585: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 14:35:49.340736: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2019-10-31 14:35:49.341066: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2019-10-31 14:35:49.360550: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2019-10-31 14:35:49.361694: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x84a1d70 executing computations on platform Host. Devices:
2019-10-31 14:35:49.361765: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 14:35:49.426557: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 14:35:49.426854: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x9a6c9a0 executing computations on platform CUDA. Devices:
2019-10-31 14:35:49.426905: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2019-10-31 14:35:49.427454: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 14:35:49.427565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
2019-10-31 14:35:49.427636: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-10-31 14:35:49.427757: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-10-31 14:35:49.427848: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-10-31 14:35:49.427933: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-10-31 14:35:49.431163: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-10-31 14:35:49.433830: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-10-31 14:35:49.434029: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-10-31 14:35:49.434350: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 14:35:49.434659: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 14:35:49.434742: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-10-31 14:35:51.165608: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 14:35:51.165679: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-10-31 14:35:51.165703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-10-31 14:35:51.166147: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 14:35:51.166505: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 14:35:51.166790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 473 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2019-10-31 14:36:00.568729: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:460] There are 154 ops of 34 different types in the graph that are not converted to TensorRT: Range, Sum, GreaterEqual, Where, Equal, Select, Size, Less, ConcatV2, Fill, Mul, ExpandDims, Unpack, GatherV2, NoOp, TopKV2, Cast, Slice, Transpose, Pad, Placeholder, Greater, Sub, Const, Pack, Identity, NonMaxSuppressionV3, Assert, Reshape, Squeeze, Add, Shape, Minimum, StridedSlice, (For more information see https://docs.nvidia.com/deeplearning/dgx/tf-trt-user-guide/index.html#supported-ops).
2019-10-31 14:36:00.940995: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:733] Number of TensorRT candidate segments: 12
2019-10-31 14:36:01.563045: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-10-31 14:36:01.774501: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-10-31 14:36:04.819149: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-10-31 14:36:18.661221: W tensorflow/core/common_runtime/bfc_allocator.cc:314] Allocator (GPU_0_bfc) ran out of memory trying to allocate 472.48MiB (rounded to 495428864). Current allocation summary follows.
2019-10-31 14:36:18.661887: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (256): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-31 14:36:18.662077: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (512): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-31 14:36:18.662229: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (1024): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-31 14:36:18.662386: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (2048): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-31 14:36:18.662532: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (4096): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-31 14:36:18.662674: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (8192): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-31 14:36:18.662819: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (16384): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-31 14:36:18.662962: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (32768): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-31 14:36:18.663111: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (65536): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-31 14:36:18.663259: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (131072): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-31 14:36:18.663409: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (262144): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-31 14:36:18.663561: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (524288): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-31 14:36:18.663707: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (1048576): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-31 14:36:18.663854: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (2097152): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-31 14:36:18.664029: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (4194304): Total Chunks: 2, Chunks in use: 2. 8.24MiB allocated for chunks. 8.24MiB in use in bin. 8.24MiB client-requested in use in bin.
2019-10-31 14:36:18.664188: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (8388608): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-31 14:36:18.664329: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (16777216): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-31 14:36:18.664729: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-31 14:36:18.664881: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-31 14:36:18.665022: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (134217728): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-31 14:36:18.665191: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (268435456): Total Chunks: 1, Chunks in use: 0. 464.77MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-31 14:36:18.665377: I tensorflow/core/common_runtime/bfc_allocator.cc:780] Bin for 472.48MiB was 256.00MiB, Chunk State:
2019-10-31 14:36:18.665653: I tensorflow/core/common_runtime/bfc_allocator.cc:786] Size: 464.77MiB | Requested Size: 0B | in_use: 0 | bin_num: 20, prev: Size: 4.12MiB | Requested Size: 4.12MiB | in_use: 1 | bin_num: -1
2019-10-31 14:36:18.665803: I tensorflow/core/common_runtime/bfc_allocator.cc:793] Next region of size 495988736
2019-10-31 14:36:18.665974: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0xf00e50000 next 1 of size 4320768
2019-10-31 14:36:18.666117: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0xf0126ee00 next 2 of size 4320768
2019-10-31 14:36:18.666244: I tensorflow/core/common_runtime/bfc_allocator.cc:800] Free at 0xf0168dc00 next 18446744073709551615 of size 487347200
2019-10-31 14:36:18.666360: I tensorflow/core/common_runtime/bfc_allocator.cc:809] Summary of in-use Chunks by size:
2019-10-31 14:36:18.666516: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 2 Chunks of size 4320768 totalling 8.24MiB
2019-10-31 14:36:18.666602: I tensorflow/core/common_runtime/bfc_allocator.cc:816] Sum Total of in-use chunks: 8.24MiB
2019-10-31 14:36:18.666665: I tensorflow/core/common_runtime/bfc_allocator.cc:818] total_region_allocated_bytes_: 495988736 memory_limit_: 495988736 available bytes: 0 curr_region_allocation_bytes_: 991977472
2019-10-31 14:36:18.666758: I tensorflow/core/common_runtime/bfc_allocator.cc:824] Stats:
Limit: 495988736
InUse: 8641536
MaxInUse: 8641536
NumAllocs: 2
MaxAllocSize: 4320768
2019-10-31 14:36:18.666833: W tensorflow/core/common_runtime/bfc_allocator.cc:319] **__________________________________________________________________________________________________
2019-10-31 14:36:18.667081: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger resources.h (154) - OutOfMemory Error in GpuMemory: 0
2019-10-31 14:36:18.699826: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger GPU memory allocation failed during tactic selection for layer: (Unnamed Layer* 0) [Scale] + (Unnamed Layer* 1) [Scale]
2019-10-31 14:36:18.701891: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger resources.h (154) - OutOfMemory Error in GpuMemory: 0
2019-10-31 14:36:18.705741: W tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:838] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 789 nodes failed: Internal: Failed to build TensorRT engine. Fallback to TF...
2019-10-31 14:36:18.973018: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_1 added for segment 1 consisting of 22 nodes succeeded.
2019-10-31 14:36:18.996189: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_2 added for segment 2 consisting of 3 nodes succeeded.
2019-10-31 14:36:19.018723: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_3 added for segment 3 consisting of 3 nodes succeeded.
2019-10-31 14:36:19.042173: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_4 added for segment 4 consisting of 3 nodes succeeded.
2019-10-31 14:36:19.061488: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_5 added for segment 5 consisting of 3 nodes succeeded.
2019-10-31 14:36:19.082478: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_6 added for segment 6 consisting of 3 nodes succeeded.
2019-10-31 14:36:19.105822: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_7 added for segment 7 consisting of 3 nodes succeeded.
2019-10-31 14:36:19.161703: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/Area/TRTEngineOp_8 added for segment 8 consisting of 6 nodes succeeded.
2019-10-31 14:36:19.210589: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node Postprocessor/BatchMultiClassNonMaxSuppression/TRTEngineOp_9 added for segment 9 consisting of 14 nodes succeeded.
2019-10-31 14:36:19.239720: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node Postprocessor/BatchMultiClassNonMaxSuppression/TRTEngineOp_10 added for segment 10 consisting of 3 nodes succeeded.
2019-10-31 14:36:19.253105: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node Postprocessor/TRTEngineOp_11 added for segment 11 consisting of 3 nodes succeeded.
2019-10-31 14:36:19.352321: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:739] Optimization results for grappler item: tf_graph
2019-10-31 14:36:19.352451: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:741] constant folding: Graph size after: 1154 nodes (-1517), 1265 edges (-1691), time = 3664.3291ms.
2019-10-31 14:36:19.352479: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:741] layout: Graph size after: 1169 nodes (15), 1291 edges (26), time = 176.661ms.
2019-10-31 14:36:19.352501: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:741] constant folding: Graph size after: 1169 nodes (0), 1291 edges (0), time = 155.807ms.
2019-10-31 14:36:19.352520: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:741] TensorRTOptimizer: Graph size after: 1114 nodes (-55), 1223 edges (-68), time = 18881.2871ms.
WARNING:tensorflow:From frozen2trt2.py:31: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.
WARNING:tensorflow:From frozen2trt2.py:33: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2019-10-31 14:37:59.829028: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 14:37:59.829238: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
2019-10-31 14:37:59.860096: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-10-31 14:37:59.860362: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-10-31 14:37:59.860547: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-10-31 14:37:59.860628: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-10-31 14:37:59.861505: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-10-31 14:37:59.861673: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-10-31 14:37:59.887050: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-10-31 14:37:59.887439: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 14:37:59.887804: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 14:37:59.887915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-10-31 14:37:59.937706: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 14:37:59.938017: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
2019-10-31 14:37:59.938167: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-10-31 14:37:59.938335: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-10-31 14:37:59.938454: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-10-31 14:37:59.938563: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-10-31 14:37:59.938821: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-10-31 14:37:59.939007: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-10-31 14:37:59.939129: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-10-31 14:37:59.939702: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 14:37:59.940466: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 14:37:59.940628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-10-31 14:37:59.940791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 14:37:59.940838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-10-31 14:37:59.940876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-10-31 14:37:59.941419: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 14:37:59.942080: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-31 14:37:59.942348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 473 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
{'input_filename': 'frozen_inference_graph.pb', 'output_filename': 'trt_optimized_inference_graph_FP32.pb', 'input_path': './', 'output_path': './'}
OPTIMIZING MODEL...
All nodes pre-optimization: 2671
TRT Engine opts: 11
All nodes post-optimization: 1114
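The three counts above can be reproduced for any graph by counting the nodes whose op type is TRTEngineOp. This is a minimal sketch, not the exact code from our script. The helper name `summarize_graph` is hypothetical, and it only assumes the graph object exposes a `node` sequence whose items have an `op` string, as a TensorFlow GraphDef does.

```python
from collections import Counter


def summarize_graph(graph_def):
    """Return (total node count, TRTEngineOp count) for a GraphDef-like object.

    Assumption: graph_def.node is iterable and every item has an `op` string.
    """
    ops = Counter(node.op for node in graph_def.node)
    # Counter returns 0 for a missing key, so graphs without engines are fine.
    return sum(ops.values()), ops["TRTEngineOp"]
```

Calling it on the frozen graph and again on the converted graph gives the pre- and post-optimization numbers, so runs can be compared programmatically.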
Inference is performed on 600x600x3 images.
Questions:
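What causes create_inference_graph to return a different graph (a different number of TRT engine ops and of total nodes) on each run?
How should a model trained in TensorFlow be optimized with TensorRT for inference on Jetson Nano?
What is the minimum inference time we can expect for ssd_mobilenet_v2 with images of this size?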
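To help narrow down the first question, the conversion logs of two runs can be compared segment by segment. The sketch below is only an illustration: it parses the "TensorRT node ... added for segment ..." lines in the format shown in the log above, and the function names `parse_trt_segments` and `diff_segments` are hypothetical.

```python
import re

# Matches lines such as:
#   "... TensorRT node Postprocessor/TRTEngineOp_11 added for segment 11
#    consisting of 3 nodes succeeded."
_SEGMENT_RE = re.compile(
    r"TensorRT node (\S+) added for segment (\d+) consisting of (\d+) nodes succeeded"
)


def parse_trt_segments(log_text):
    """Map each converted engine name to the number of TF nodes it replaced."""
    return {m.group(1): int(m.group(3)) for m in _SEGMENT_RE.finditer(log_text)}


def diff_segments(run_a, run_b):
    """Return engines that are missing in one run or differ in size."""
    names = sorted(set(run_a) | set(run_b))
    return {
        name: (run_a.get(name), run_b.get(name))
        for name in names
        if run_a.get(name) != run_b.get(name)
    }
```

Engine names may be numbered differently between runs, so a non-empty diff is a starting point for inspection, not proof that the converted graphs differ in behaviour.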