I am hitting a TensorFlow XLA crash during model inference on an NVIDIA Jetson AGX Xavier (aarch64) system.
- Have I written custom code: Yes, I have some custom ops that compute on the CPU; they appear at several points in the middle of the network
- OS Platform and Distribution: Linux Ubuntu 18.04
- Device: NVIDIA Jetson AGX Xavier, JetPack 4.4
- TensorFlow installed from (source or binary): source (TensorFlow 2.4.1 compiled on the Xavier)
- TensorFlow version: tensorflow-2.4.1
- Python version: 3.6.8
- Bazel version (if compiling from source): 3.1.0
- GCC/Compiler version (if compiling from source): 7.5.0
- CUDA/cuDNN version: CUDA 10.2, cuDNN 8.0
1. When running model inference with XLA enabled, the crash reproduces almost every time (Xavier aarch64 system)
2. When I disable some of the custom ops (CPU compute ops), the crash occurs in about 7 out of 10 runs (Xavier aarch64 system)
3. When I run the same code with the same TensorFlow 2.4.1 version on an x86 system with a V100 GPU, it runs successfully without any crash (x86 + V100 GPU system)
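For reference, in case it helps with reproduction: XLA can be turned on either globally through auto-clustering or per-function. The sketch below shows the standard TF 2.x mechanisms; it is a generic illustration, not the reporter's actual launch setup.

```python
import os

# Enable XLA auto-clustering for the whole process. The flag must be
# set in the environment before TensorFlow is imported to take effect.
os.environ["TF_XLA_FLAGS"] = "--tf_xla_auto_jit=2"

# Alternatively, XLA can be requested per-function after importing TF:
#
#   import tensorflow as tf
#
#   @tf.function(experimental_compile=True)  # renamed to jit_compile=True
#   def infer(x):                            # in later TF releases
#       return model(x)
```

With auto-clustering, TensorFlow decides which subgraphs to compile; the per-function form compiles exactly the decorated function, which makes it easier to bisect which part of the network triggers the crash.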
Is it possible to disable all the custom ops?
If so, could you give that a try?
After disabling all the custom ops, the crash still occurs: 6 crashes in 20 runs.
After disabling XLA, the TensorFlow model runs successfully without crashing, but the latency is poor.
We would like to reproduce this in our environment to take a look.
Could you share the detailed steps and resources needed to reproduce it?
On JetPack 4.4 (CUDA 10.2, cuDNN 8.0, TensorRT 7.1.3, TensorFlow 2.4.1):
1. The code cannot be shared because it is a trade secret, sorry about that!
2. The custom op is a common pixel-level computation with some if/else logic in it, which is why we did not implement it as a CUDA kernel.
3. After disabling XLA compilation for all the custom ops, the crash still occurs: 6 crashes in 20 runs.
4. After enabling XLA compilation for the whole network: 20 crashes in 20 runs.
5. After disabling XLA for the whole network, the pure TensorFlow model runs successfully without crashing.
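For what it's worth, a pixel-level op with data-dependent if/else branching of the kind described might look roughly like the following pure-Python/NumPy sketch. This is an illustrative stand-in (the function name and thresholds are invented), not the actual proprietary op.

```python
import numpy as np

def threshold_stretch(image, low=64, high=192):
    """Illustrative pixel-level op with branching: clamp dark pixels to 0,
    bright pixels to 255, and linearly stretch everything in between."""
    out = np.empty_like(image)
    for idx, pix in np.ndenumerate(image):
        if pix < low:            # data-dependent branches like these are
            out[idx] = 0         # awkward to express as a branch-free
        elif pix > high:         # CUDA kernel, hence a CPU implementation
            out[idx] = 255
        else:
            out[idx] = (pix - low) * 255 // (high - low)
    return out

# Example: one dark, one mid-range, one bright pixel
result = threshold_stretch(np.array([[10, 100, 200]]))
```

When such a CPU op sits in the middle of an otherwise GPU/XLA-compiled graph, each invocation forces a device-to-host round trip at the cluster boundary, which is one reason mixing CPU custom ops with XLA clusters is a stress point.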
I have also tested JetPack 4.3 on the Jetson AGX Xavier and found there is no crash on JetPack 4.3 with TensorFlow 2.2 (JetPack 4.3 Archive | NVIDIA Developer).
On JetPack 4.3 (CUDA 10.0, cuDNN 7.6.3, TensorRT 6.0.1, TensorFlow 2.2):
1. After enabling XLA for the whole network, the TensorFlow XLA model runs successfully without crashing, and the latency is good.
Comparing JetPack 4.4 with TensorFlow 2.4 against JetPack 4.3 with TensorFlow 2.2, I also found a new latency regression for the tf.math.unsorted_segment_max op:
JetPack 4.3 with TensorFlow 2.2: latency of tf.math.unsorted_segment_max on Jetson AGX Xavier: 2–3 ms
JetPack 4.4 with TensorFlow 2.4: latency of tf.math.unsorted_segment_max on Jetson AGX Xavier: 120 ms
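For anyone unfamiliar with the op being benchmarked: tf.math.unsorted_segment_max reduces each segment of the input to its maximum, where segment membership is given by an ids tensor in arbitrary order. A pure-Python sketch of its semantics for 1-D float inputs (not the TF kernel being timed above):

```python
def unsorted_segment_max(data, segment_ids, num_segments):
    """Per-segment maximum, mirroring the semantics of
    tf.math.unsorted_segment_max for 1-D float inputs.
    -inf is used for empty segments here for simplicity; the TF op
    uses the smallest representable value of the dtype."""
    out = [float("-inf")] * num_segments
    for value, seg in zip(data, segment_ids):
        if value > out[seg]:
            out[seg] = value
    return out

# Elements 0 and 2 belong to segment 0, elements 1 and 3 to segment 1
result = unsorted_segment_max([1.0, 3.0, 2.0, 5.0], [0, 1, 0, 1], 2)
```

The op is essentially a scatter-reduce, so a 40–60x slowdown between releases on the same hardware points at a kernel or scheduling regression rather than anything inherent to the computation.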
You don't need to share the whole source.
Extracting a tiny reproducible example would help a lot.
Not sure if we understand your question correctly.
It seems that TensorFlow XLA compiles the source for CPU.
But the main difference between JetPack 4.3 and 4.4 is the GPU-related libraries, which are unrelated to the CPU.