Jetson TX2 TensorRT l4t-tensorflow NGC: segmentation fault in TrtGraphConverterV2 build

Description

Hi, I have a Jetson TX2 that I would like to use for online inference. To speed up the inference, since the GPU turned out to be slower than the CPU, I decided to use the TensorRT package. I am able to convert and save the TRT model, but when I try to build it I get a segmentation fault.
I have changed the amount of memory available to the GPU, increased and decreased max_workspace_size_bytes for the TRT graph, and much more, without any success.

I was initially using nvcr.io/nvidia/l4t-tensorflow:r32.7.1-tf2.7-py3 but had to switch to nvcr.io/nvidia/l4t-tensorflow:r32.6.1-tf2.5-py3 to avoid an issue similar to this one: TF2.0: Translation model: Error when restoring the saved model: Unresolved object in checkpoint (root).optimizer.iter: attributes · Issue #33150 · tensorflow/tensorflow · GitHub

Do you have any idea how I could fix this segmentation fault, or suggestions for another way to convert my TensorFlow model to TensorRT? According to faulthandler, the error comes from the TensorRT package during the build process. Perhaps worth noting: setting allow_build_at_runtime=False does not produce a segmentation fault, but then I am not able to use the model for inference later on because of the missing input_fn. I will be happy to provide more information if needed.
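For context, here is a hedged sketch of the conversion/build flow described above, using the TF-TRT API shipped with TF 2.5. The SavedModel path, output path, and the 1x224x224x3 float32 input shape are assumptions for a ResNet50-style model; adapt them to your own model.

```python
import numpy as np

def representative_input_fn():
    # One representative batch per input shape the engines should support.
    # Assumption: ResNet50-style 1x224x224x3 float32 input.
    yield (np.zeros((1, 224, 224, 3), dtype=np.float32),)

def convert_and_build(saved_model_dir, output_dir, workspace_bytes=1 << 28):
    # Imported lazily so the sketch can be read without TensorFlow installed;
    # on the Jetson this runs inside the l4t-tensorflow container.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
        precision_mode=trt.TrtPrecisionMode.FP16,
        # Keep the workspace small; the TX2 shares its 8 GB between CPU and GPU.
        max_workspace_size_bytes=workspace_bytes,
    )
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=saved_model_dir,
        conversion_params=params,
    )
    converter.convert()
    # build() pre-builds the TRT engines offline; this is the call that
    # segfaulted on JetPack 4.6.3 and later succeeded on 4.6 (rev 3).
    converter.build(input_fn=representative_input_fn)
    converter.save(output_dir)
```

Skipping `converter.build(...)` and relying on allow_build_at_runtime instead defers engine construction to the first inference call, which is why the missing input_fn only bites later.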

Relevant files

nvidia.py (1.7 KB)

Setup

Jetson TX2 flashed with JetPack 4.6.3 via SDK Manager from Ubuntu 18.04 (versions 4.5 and 4.5.1 did not work because CUDA failed to install, and the other 4.6 versions result in the same error)

Steps to reproduce on the Jetson TX2:

In the code I used faulthandler to provide some extra information.
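The faulthandler setup itself is tiny; a minimal sketch of what nvidia.py presumably does near the top (assumption: the module is enabled at import time so that the native SIGSEGV still prints the Python traceback shown below):

```python
import faulthandler

# Install handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL so a
# crash inside a native library (here: TensorRT) dumps the Python stack.
faulthandler.enable()

assert faulthandler.is_enabled()
```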

$ sudo docker run -it --rm --runtime nvidia --network host -v /path-to-nvidia.py:/code -w /code nvcr.io/nvidia/l4t-tensorflow:r32.6.1-tf2.5-py3

root@ubuntu:/code# python3 nvidia.py
2023-04-20 09:36:13.407824: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
Tensorflow version: 2.5.0
Protobuf version: 3.17.3
TensorRT version:
2.5.0
GPU available:
2023-04-20 09:36:17.903609: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2023-04-20 09:36:17.910265: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2023-04-20 09:36:17.910481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X2 computeCapability: 6.2
coreClock: 1.3GHz coreCount: 2 deviceMemorySize: 7.67GiB deviceMemoryBandwidth: 38.74GiB/s
2023-04-20 09:36:17.910576: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
2023-04-20 09:36:17.931954: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.10
2023-04-20 09:36:17.932164: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.10
2023-04-20 09:36:17.948369: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2023-04-20 09:36:17.961329: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2023-04-20 09:36:17.994643: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.10
2023-04-20 09:36:18.010569: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.10
2023-04-20 09:36:18.012532: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2023-04-20 09:36:18.013268: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2023-04-20 09:36:18.013851: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2023-04-20 09:36:18.014166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
2023-04-20 09:36:18.112677: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2023-04-20 09:36:18.112858: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X2 computeCapability: 6.2
coreClock: 1.3GHz coreCount: 2 deviceMemorySize: 7.67GiB deviceMemoryBandwidth: 38.74GiB/s
2023-04-20 09:36:18.113117: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2023-04-20 09:36:18.113326: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2023-04-20 09:36:18.113409: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
2023-04-20 09:36:18.113531: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
2023-04-20 09:36:20.751485: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-04-20 09:36:20.751565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2023-04-20 09:36:20.751595: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2023-04-20 09:36:20.751898: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2023-04-20 09:36:20.752180: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2023-04-20 09:36:20.752402: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2023-04-20 09:36:20.752582: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1343 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels.h5
102973440/102967424 [==============================] - 40s 0us/step
WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. model.compile_metrics will be empty until you train or evaluate the model.
2023-04-20 09:37:55.540769: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/generic_utils.py:497: CustomMaskWarning: Custom mask layers require a config and must override get_config. When loading, the custom mask layer must be passed to the custom_objects argument.
category=CustomMaskWarning)
2023-04-20 09:39:47.635907: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libnvinfer.so.8
2023-04-20 09:41:23.880338: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2023-04-20 09:41:23.880515: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2023-04-20 09:41:23.880787: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2023-04-20 09:41:23.881711: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2023-04-20 09:41:23.881875: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X2 computeCapability: 6.2
coreClock: 1.3GHz coreCount: 2 deviceMemorySize: 7.67GiB deviceMemoryBandwidth: 38.74GiB/s
2023-04-20 09:41:23.882052: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2023-04-20 09:41:23.882244: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2023-04-20 09:41:23.882329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
2023-04-20 09:41:23.882410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-04-20 09:41:23.882446: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2023-04-20 09:41:23.882470: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2023-04-20 09:41:23.882783: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2023-04-20 09:41:23.883326: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2023-04-20 09:41:23.883621: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1343 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2023-04-20 09:41:23.884823: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 31250000 Hz
2023-04-20 09:41:24.703952: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1171] Optimization results for grappler item: graph_to_optimize
function_optimizer: Graph size after: 1253 nodes (930), 1908 edges (1585), time = 109.917ms.
function_optimizer: function_optimizer did nothing. time = 2.39ms.

2023-04-20 09:41:38.279937: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2023-04-20 09:41:38.280255: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2023-04-20 09:41:38.280505: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2023-04-20 09:41:38.281231: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2023-04-20 09:41:38.281385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X2 computeCapability: 6.2
coreClock: 1.3GHz coreCount: 2 deviceMemorySize: 7.67GiB deviceMemoryBandwidth: 38.74GiB/s
2023-04-20 09:41:38.281566: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2023-04-20 09:41:38.281742: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2023-04-20 09:41:38.281832: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
2023-04-20 09:41:38.281935: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-04-20 09:41:38.281969: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2023-04-20 09:41:38.281999: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2023-04-20 09:41:38.282271: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2023-04-20 09:41:38.282737: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2023-04-20 09:41:38.283015: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1343 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2023-04-20 09:41:41.818529: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:790] There are 5 ops of 3 different types in the graph that are not converted to TensorRT: Identity, NoOp, Placeholder, (For more information see Accelerating Inference in TensorFlow with TensorRT User Guide - NVIDIA Docs).
2023-04-20 09:41:42.019348: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:759] Number of TensorRT candidate segments: 1
2023-04-20 09:41:42.415148: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:853] Replaced segment 0 consisting of 507 nodes by TRTEngineOp_0_0.
2023-04-20 09:41:43.850847: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1171] Optimization results for grappler item: tf_graph
constant_folding: Graph size after: 560 nodes (-640), 1215 edges (-640), time = 1253.25195ms.
layout: Graph size after: 564 nodes (4), 1219 edges (4), time = 379.036ms.
constant_folding: Graph size after: 562 nodes (-2), 1217 edges (-2), time = 269.599ms.
TensorRTOptimizer: Graph size after: 56 nodes (-506), 55 edges (-1162), time = 1036.61597ms.
constant_folding: Graph size after: 56 nodes (0), 55 edges (0), time = 9.426ms.
Optimization results for grappler item: TRTEngineOp_0_0_native_segment
constant_folding: Graph size after: 509 nodes (0), 792 edges (0), time = 165.089ms.
layout: Graph size after: 509 nodes (0), 792 edges (0), time = 302.259ms.
constant_folding: Graph size after: 509 nodes (0), 792 edges (0), time = 193.194ms.
TensorRTOptimizer: Graph size after: 509 nodes (0), 792 edges (0), time = 35.701ms.
constant_folding: Graph size after: 509 nodes (0), 792 edges (0), time = 170.495ms.

2023-04-20 10:02:36.380173: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2023-04-20 10:02:37.868305: I tensorflow/compiler/tf2tensorrt/common/utils.cc:58] Linked TensorRT version: 8.0.1
2023-04-20 10:02:37.978132: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libnvinfer.so.8
2023-04-20 10:02:37.993360: I tensorflow/compiler/tf2tensorrt/common/utils.cc:60] Loaded TensorRT version: 8.2.1
2023-04-20 10:02:38.094440: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libnvinfer_plugin.so.8
Fatal Python error: Segmentation fault

Thread 0x0000007f7d3cd010 (most recent call first):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60 in quick_execute
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 596 in __call__
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1961 in _call_flat
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1778 in _call_with_flat_signature
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1729 in _call_impl
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/wrap_function.py", line 247 in _call_impl
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1711 in __call__
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/compiler/tensorrt/trt_convert.py", line 1200 in build
File "nvidia.py", line 48 in <module>
Segmentation fault (core dumped)

Hi,

The error seems related to a library compatibility issue.

...
 Successfully opened dynamic library libnvinfer_plugin.so.8
Fatal Python error: Segmentation fault
...

There are dependencies between the library and the driver.
If l4t-tensorflow:r32.6.1-tf2.5-py3 is required, could you set up your environment with r32.6.1 to see if that helps?

Thanks.

Dear AastaLLL,

I have tried that but got the same error. I also tried installing JetPack 4.5 and 4.5.1, but both returned "Installation failed" because CUDA could not be installed. I can also verify that ResNet50 is not the problem: I have tried different models and still get the segmentation fault at the end.

Do you have any other ideas? Thanks for the quick reply!

Solution
Use JetPack version 4.6 (rev 3)

More information
Apparently the JetPack version was the main issue. I had tried 4.6.3, 4.6.2, 4.6.1, 4.5.1 (rev 1), and 4.5, but I had not tried 4.6 (rev 3). JetPack 4.6 (rev 3) solved the segmentation fault, and I am now able to build the TF-TRT model and perform inference using the converted and built model.
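Once the engines are built and saved, inference uses the ordinary SavedModel API; a hedged sketch (assumptions: the model was exported with the default "serving_default" signature, and the input is float32 image data):

```python
import numpy as np

def preprocess(images):
    # Cast image data to the float32 dtype the sketch assumes the model wants;
    # substitute your model's real preprocessing here.
    return np.asarray(images, dtype=np.float32)

def load_and_infer(trt_saved_model_dir, batch):
    # Imported lazily; requires the l4t-tensorflow container on the Jetson.
    import tensorflow as tf

    model = tf.saved_model.load(trt_saved_model_dir)
    # Assumption: the default signature key; list model.signatures to check.
    infer = model.signatures["serving_default"]
    return infer(tf.constant(preprocess(batch)))
```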

One difference I found was this:
with e.g. 4.6.3, the linked TensorRT version is 8.0.1 but the loaded TensorRT version is 8.2.1, as seen in the output above:
tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2023-04-20 10:02:37.868305: I tensorflow/compiler/tf2tensorrt/common/utils.cc:58] Linked TensorRT version: 8.0.1
2023-04-20 10:02:37.978132: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libnvinfer.so.8
2023-04-20 10:02:37.993360: I tensorflow/compiler/tf2tensorrt/common/utils.cc:60] Loaded TensorRT version: 8.2.1

However, with 4.6 (rev 3) the loaded TensorRT version is also 8.0.1.
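This linked-vs-loaded mismatch is easy to check for programmatically by scanning the TensorFlow startup log; a small sketch (the log excerpt below is from this thread):

```python
import re

def trt_versions_from_log(log_text):
    # Extract the "Linked"/"Loaded" TensorRT versions printed by
    # tensorflow/compiler/tf2tensorrt at startup.
    linked = re.search(r"Linked TensorRT version: ([\d.]+)", log_text)
    loaded = re.search(r"Loaded TensorRT version: ([\d.]+)", log_text)
    return (linked and linked.group(1), loaded and loaded.group(1))

log = """\
... Linked TensorRT version: 8.0.1
... Loaded TensorRT version: 8.2.1
"""
linked, loaded = trt_versions_from_log(log)
if linked != loaded:
    print(f"TensorRT version mismatch: linked {linked}, loaded {loaded}")
```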

You may close this thread, but I think this should be investigated further. Thanks for your help!
