Segfault when invoking inference on a TFLite model on Jetson Nano

Hi, I’ve installed TensorFlow v2.5.0+nv21.6 on my Jetson Nano following the guide "Installing TensorFlow For Jetson Platform" from the NVIDIA Deep Learning Frameworks Documentation (replacing v46 with v45). The problem appears when I try to invoke inference after loading the TFLite Interpreter on the Jetson Nano:

Predicting with TensorFlowLite model
INFO: Created TensorFlow Lite delegate for select TF ops.
2022-01-31 20:33:10.112306: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2022-01-31 20:33:10.112463: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties: pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s
2022-01-31 20:33:10.112695: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2022-01-31 20:33:10.112906: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2022-01-31 20:33:10.112978: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
2022-01-31 20:33:10.113055: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-01-31 20:33:10.113094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2022-01-31 20:33:10.113125: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2022-01-31 20:33:10.113333: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2022-01-31 20:33:10.113560: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2022-01-31 20:33:10.113666: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 242 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
INFO: TfLiteFlexDelegate delegate: 4 nodes delegated out of 12 nodes with 3 partitions.
INFO: TfLiteFlexDelegate delegate: 0 nodes delegated out of 1 nodes with 0 partitions.
INFO: TfLiteFlexDelegate delegate: 2 nodes delegated out of 8 nodes with 2 partitions.
Segmentation fault (core dumped)

Using:

Python 3.6.9
Numpy v1.19.5
JetPack 4.5.1
CUDA 10.2.89
CUDNN: 8.0.0.180
TensorFlow v2.5.0+nv21.6
Model architectures: LSTM and Echo State Network

Here’s the code used to invoke inference:

import tensorflow as tf

# Load the TFLite interpreter (self.model_path, self.config, channel and
# num_batches come from the surrounding class/method)
interpreter = tf.lite.Interpreter(model_path=self.model_path)
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Simulate data arriving in batches and predict each batch
for i in range(0, num_batches + 1):
    prior_idx = i * self.config.batch_size
    idx = (i + 1) * self.config.batch_size

    # Resize the input tensor to [batch, timesteps, features] for this batch
    interpreter.resize_tensor_input(input_details[0]['index'], [idx - prior_idx, channel.X_test.shape[1], channel.X_test.shape[2]])

    # Re-allocate after every resize, then feed the batch and run inference
    interpreter.allocate_tensors()
    X_test_batch = channel.X_test[prior_idx:idx]

    interpreter.set_tensor(interpreter.get_input_details()[0]['index'], X_test_batch)
    interpreter.invoke()
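One thing worth checking in the loop above, independent of the segfault: iterating over `range(0, num_batches + 1)` while always resizing to `idx - prior_idx` rows assumes every slice holds exactly `batch_size` rows. Depending on how `num_batches` is computed, the last slice can be shorter or even empty, and `set_tensor` would then see a shape that does not match the resized tensor. A minimal sketch of a hypothetical helper, `batch_bounds`, that yields non-empty `(start, stop)` pairs covering every sample:

```python
def batch_bounds(n_samples, batch_size):
    """Yield (start, stop) index pairs that cover range(n_samples) in order.

    Every batch is non-empty; the final one may hold fewer than batch_size rows.
    (Hypothetical helper, not part of TensorFlow.)
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_samples, batch_size):
        # Clamp the stop index so the last batch never runs past the data
        yield start, min(start + batch_size, n_samples)


print(list(batch_bounds(150, 70)))
```

In the loop, one would then slice `X_test_batch = channel.X_test[start:stop]` first and pass `X_test_batch.shape` to `resize_tensor_input`, so the tensor always matches the data actually fed in.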

I added some print statements: the model loads correctly (input_details and output_details return the expected values), and the crash appears to happen inside the invoke() call.
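As an alternative to print statements, Python's standard-library `faulthandler` module can dump the Python traceback when native code crashes, which confirms which call (here, presumably `interpreter.invoke()`) was running at the moment of the segfault. A minimal sketch, assuming it is placed at the top of the inference script before the interpreter is created:

```python
import faulthandler
import sys

# Install handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL that print
# the Python traceback of every thread to stderr before the process dies.
faulthandler.enable(file=sys.stderr, all_threads=True)

# ... then build the interpreter and run the batch loop as before. If
# interpreter.invoke() crashes in native code, the dump names the file and
# line of that call.
```

Equivalently, the handler can be enabled without code changes by running the script as `python3 -X faulthandler script.py` or by setting `PYTHONFAULTHANDLER=1`. It shows Python frames only, so locating the failing native TFLite op would still need a debugger such as gdb.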

I also tried to replicate this issue on my Windows 11 laptop using TF v2.5.0 (the same version used on the Jetson) with Python 3.9, and there inference ran without any problems, so the problem does not seem to be in the code but specific to the Jetson Nano. How can I fix this?

Thanks!

Sorry for the late response. Is this still an issue you need support with? Thanks

Hi,

Segmentation fault (core dumped) is usually caused by some invalid memory access.
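Because a segfault kills the whole Python process, one way to narrow it down (for example, trying different batch sizes or inputs) is to run each attempt in a child process and inspect how it ended. On POSIX systems such as the Jetson's Linux, a child terminated by a signal reports a negative return code equal to minus the signal number, so a segfault shows up as -11 (SIGSEGV). A minimal sketch with a hypothetical helper, `run_isolated`:

```python
import signal
import subprocess
import sys


def run_isolated(code):
    """Run a Python snippet in a fresh interpreter and report how it ended.

    Hypothetical helper. Returns "ok", "crashed with <SIGNAL>" or "exit <n>".
    On POSIX, a negative returncode is minus the number of the killing signal.
    """
    # -X faulthandler makes the child print its Python traceback on a crash
    proc = subprocess.run([sys.executable, "-X", "faulthandler", "-c", code])
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        return "crashed with " + signal.Signals(-proc.returncode).name
    return "exit {}".format(proc.returncode)


print(run_isolated("print('hello from the child')"))
```

The snippet passed in would be a short script that loads P-1.tflite and invokes a single batch; the parent keeps running and can log which configurations crash.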

To give a further suggestion, we need to reproduce this internally.
Would you mind also sharing the TFLite model with us?

Thanks.

Model:
P-1.tflite (37.9 KB)

[EDIT]
Here’s the reshaped input. The segfault already occurs on the first batch, so you can assume self.config.batch_size = 70 and num_batches = 114.
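Plugging these values into the loop from the question gives the slice bounds below. The first invoke() receives rows 0 to 69, so the input tensor is resized to [70, X_test.shape[1], X_test.shape[2]]. Whether the final slice (7980 to 8050) is full depends on len(X_test), which the thread does not state.

```python
batch_size = 70    # self.config.batch_size, as given in the thread
num_batches = 114  # as given in the thread

# Same index arithmetic as the question's loop: for i in range(0, num_batches + 1)
bounds = [(i * batch_size, (i + 1) * batch_size) for i in range(0, num_batches + 1)]

print(len(bounds), bounds[0], bounds[-1])  # prints: 115 (0, 70) (7980, 8050)
```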

Input data:

Thanks

Hi,

Thanks for the model and input data.
We can confirm that this error reproduces in our environment as well.

We are checking this internally.
We will share more information with you later.

Thanks.

Hi,

Any update on this issue, please?