TensorFlow model doesn't produce results on Jetson Xavier but does on local computer

Hello,

I am trying to run an image-processing model on a Jetson Xavier running JetPack 4.3. I am running this process inside a Docker container. When the code is run on my local computer using the tensorflow/tensorflow base image, it produces results from my test image. However, when it runs on the Jetson using the nvcr.io/nvidia/l4t-tensorflow:r32.4.3-tf2.2-py3 image, it fails to produce inference results and returns NaN where the predictions should be.

I cannot run tensorflow/tensorflow on the Jetson, or nvcr.io/nvidia/l4t-tensorflow:r32.4.3-tf2.2-py3 locally, because of the underlying hardware differences between my computer and the Jetson.

If anyone has any idea what could be causing this issue please let me know. I can provide more details if you think they are relevant. Thanks in advance!

In an attempt to diagnose the issue, I compared the two Docker environments using pip list.

Library version differences (computer / Jetson):

absl-py               0.10.0 / 0.9.0
google-auth           1.22.0 / 1.18.0
grpcio                1.32.0 / 1.30.0
idna                  2.6 / 2.10
importlib-metadata    2.0.0 / 1.7.0
opt-einsum            3.3.0 / 3.2.1
pip                   20.2.4 / 20.0.2
scipy                 1.5.3 / 1.4.1 *
setuptools            50.3.0 / 47.3.1
tensorboard           2.3.0 / 2.2.2 *
tensorflow            2.3.1 / 2.2.0+nv20.6 *
tensorflow-estimator  2.3.0 / 2.2.0 *
urllib3               1.25.10 / 1.25.9
zipp                  3.2.0 / 3.1.0

*cannot be changed on the Jetson as it is part of the base image

  • I used pip to update/downgrade all of the libraries on the Jetson, except those marked with * in the list above, to match the other Docker environment. This had no effect on my output.

  • To troubleshoot, I ran pip install tensorflow==2.2.0 in the Docker container on my computer to see if that would break the inference. It did not. I also retrained the model on my computer with tensorflow==2.2.0; the new model still worked on my laptop but not on the Jetson.

  • Both the Jetson and the docker instance on my laptop are using python==3.6.9.

  • The input image is the same for both platforms and I am performing the same test from inside the docker containers.

  • The model does load, because when I print it I get: <tensorflow.python.saved_model.load.Loader._recreate_base_user_object.<locals>._UserObject object at 0x7efb8597f0>

  • The images are opening and being read correctly. I printed out their NumPy arrays and they look reasonable (a quick sanity check along these lines is sketched after this list).

  • The model returns the output dictionary, but it contains NaN where the predictions should be.

  • The model is not optimized with TensorRT
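Roughly the sanity check I mean for the test image (IMAGE_PATH stands in for my actual test file, and PIL is just what I happen to use to load it; the exact loading code isn't shown here):

import numpy as np
from PIL import Image

IMAGE_PATH = "test.jpg"  # placeholder for my actual test image

# Decode the test image and print basic stats to confirm it looks sane.
image = np.asarray(Image.open(IMAGE_PATH))
print("dtype:", image.dtype)    # detection SavedModels often expect uint8 input
print("shape:", image.shape)    # e.g. (height, width, 3)
print("min/max:", image.min(), image.max())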

Model Inference Code Block:

image = np.asarray(image)
input_tensor = tf.convert_to_tensor(image)

# The model expects a batch of images, so add an axis with tf.newaxis.
input_tensor = input_tensor[tf.newaxis, ...]

# Run inference.
model_fn = model.signatures['serving_default']
output_dict = model_fn(input_tensor)
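Continuing from the block above, this is roughly how I confirm the NaNs and inspect what the serving signature expects, in case there is an input dtype/shape mismatch (the output key names depend on the exported model):

import numpy as np

# What dtype/shape does the signature expect, and what does it return?
print(model_fn.structured_input_signature)
print(model_fn.structured_outputs)

# Report which outputs actually contain NaN.
for name, tensor in output_dict.items():
    values = tensor.numpy()
    if np.issubdtype(values.dtype, np.floating):
        print(name, values.shape, "NaN count:", int(np.isnan(values).sum()))
    else:
        print(name, values.shape, values.dtype)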

Random things I have noticed that probably aren't relevant:
Loading the model takes forever, around 7 minutes:

tf.keras.backend.clear_session()
model = tf.saved_model.load(PATH)

It also produces a warning that I have read is a TF bug triggered by training my own model, so I don't think it is the issue.

WARNING:tensorflow:Importing a function (__inference_EfficientDet-D0_layer_call_and_return_conditional_losses_95612) with ops with custom gradients. Will likely fail if a gradient is requested.
WARNING:tensorflow:Importing a function (__inference_EfficientDet-D0_layer_call_and_return_conditional_losses_82391) with ops with custom gradients. Will likely fail if a gradient is requested.

Every time I use the nvcr.io/nvidia/l4t-tensorflow:r32.4.3-tf2.2-py3 image I get the TensorFlow "can't find CUDA" error. It is possible I just need to mount something into the container. I don't think this is my issue, because I previously had a pre-made TF model running just fine on this image despite the same error.
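A quick check from inside the container to see whether TensorFlow actually found the GPU (my understanding is that the l4t images usually need to be launched with the NVIDIA container runtime, e.g. docker run --runtime nvidia, for CUDA to be visible, but I may have something wrong in my setup):

import tensorflow as tf

# If no GPU is listed here, TensorFlow has silently fallen back to CPU inside the container.
print("TF version:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("Visible GPUs:", tf.config.list_physical_devices('GPU'))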

Sorry for the long post. Just wanted to get all the info out. :)

Hi,

Please note that JetPack 4.3 uses r32.3 rather than r32.4.3.

You will need a compatible OS and Docker image.
Please upgrade your device to JetPack 4.4 and use the r32.4.3-tf2.2-py3 image.

Thanks.

Thank you for the response!

Is there an older version of this image that I could use with JetPack 4.3? I already have some other programs running well on the Jetson, and I don't want to break three things trying to fix one if I can help it.

Thank you again for the quick feedback!

Hi,

Yes. Here are the TensorFlow packages for JetPack 4.3:

https://developer.download.nvidia.com/compute/redist/jp/v43/tensorflow/

You can try tensorflow-2.1.0+nv20.3, which should be the latest package for JetPack 4.3.

Thanks.