I am trying to run an image processing model on a Jetson Xavier running JetPack 4.3, inside a Docker container. For some reason, when the code is run on my local computer using the tensorflow/tensorflow base image, it produces results from my test image. However, when using the nvcr.io/nvidia/l4t-tensorflow:r32.4.3-tf2.2-py3 image on the Jetson, it fails to produce inference results and returns nan where the predictions should be.
I cannot run tensorflow/tensorflow on the Jetson, or nvcr.io/nvidia/l4t-tensorflow:r32.4.3-tf2.2-py3 locally, because of the underlying hardware differences between my computer and the Jetson.
If anyone has any idea what could be causing this issue, please let me know. I can provide more details if you think they are relevant. Thanks in advance!
In an attempt to diagnose the issue, I checked the library version differences between the two docker environments.

Differences (Computer / Jetson library versions):
tensorboard: 2.3.0 / 2.2.2 *

* cannot be changed on the Jetson as it is part of the base image
I used pip to update/downgrade all of the libraries on the Jetson, except those marked with a * in the list above, to match the other docker environment. This had no effect on my output.
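For reference, this is roughly how the installed versions can be dumped for comparison (a minimal sketch; run it inside each container and diff the two outputs):

import pkg_resources

# Print installed packages in pip-freeze style so the two
# containers' outputs can be diffed for version mismatches.
for dist in sorted(pkg_resources.working_set, key=lambda d: d.project_name.lower()):
    print("{}=={}".format(dist.project_name, dist.version))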
In order to troubleshoot, I ran pip install tensorflow==2.2.0 in the docker container on my computer to see if that would break the inference. It did not. I also retrained the model on my computer, this time using tensorflow==2.2.0, and it still worked on my laptop but not on the Jetson.
Both the Jetson and the docker instance on my laptop are now using tensorflow 2.2.
The input image is the same for both platforms and I am performing the same test from inside the docker containers.
The model is loading correctly, because when I print it I get:
<tensorflow.python.saved_model.load.Loader._recreate_base_user_object.<locals>._UserObject object at 0x7efb8597f0>
The images are being opened and read correctly. I printed out their numpy arrays and they look reasonable.
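Beyond eyeballing the arrays, here is the kind of quick check that can be run on both platforms to rule out a dtype or value-range mismatch (a sketch; image is the array loaded from the test image):

import numpy as np

# Compare these values between the two platforms; a uint8 vs float32
# or 0-255 vs 0-1 mismatch can silently turn into nans downstream.
arr = np.asarray(image)
print("dtype:", arr.dtype, "shape:", arr.shape)
print("min:", arr.min(), "max:", arr.max())
print("nans in input:", np.isnan(arr.astype(np.float64)).sum())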
The model is returning the output dictionary, but it has nan where the predictions should be.
The model is not optimized with TensorRT.
Model Inference Code Block:
import numpy as np
import tensorflow as tf

image = np.asarray(image)
input_tensor = tf.convert_to_tensor(image)
# The model expects a batch of images, so add an axis with tf.newaxis.
input_tensor = input_tensor[tf.newaxis, ...]
model_fn = model.signatures['serving_default']
output_dict = model_fn(input_tensor)
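To narrow down where the nans first appear on the Jetson, something like this could be run (a sketch, reusing model_fn and input_tensor from the block above; tf.debugging.enable_check_numerics requires TF >= 2.1):

import numpy as np
import tensorflow as tf

# Make TF raise at the first op that produces nan/inf instead of
# letting it propagate silently through the graph.
tf.debugging.enable_check_numerics()

output_dict = model_fn(input_tensor)
for key, value in output_dict.items():
    print(key, "contains nan:", bool(np.isnan(value.numpy()).any()))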
Random things I have noticed but probably aren’t relevant:
Running this takes forever (~7 min):
model = tf.saved_model.load(PATH)
It also produces a warning that, from what I have read, is a tf bug caused by training my own model, so I don’t think it is the issue:
WARNING:tensorflow:Importing a function (__inference_EfficientDet-D0_layer_call_and_return_conditional_losses_95612) with ops with custom gradients. Will likely fail if a gradient is requested.
WARNING:tensorflow:Importing a function (__inference_EfficientDet-D0_layer_call_and_return_conditional_losses_82391) with ops with custom gradients. Will likely fail if a gradient is requested.
Every time I use the nvcr.io/nvidia/l4t-tensorflow:r32.4.3-tf2.2-py3 image I get the TensorFlow can't-find-CUDA error. It is possible I just need to mount something into my docker container. I don’t think this is my issue because I previously had a premade tf model running just fine on this image despite the error.
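A quick sanity check for this from inside the container (a minimal sketch):

import tensorflow as tf

# If this prints an empty list, TF is running CPU-only, meaning the
# container is not seeing the Jetson's GPU/CUDA libraries.
print("TF version:", tf.__version__)
print("GPUs visible:", tf.config.list_physical_devices('GPU'))

On the Jetson this generally depends on starting the container with the NVIDIA runtime (e.g. docker run --runtime nvidia ...).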
Sorry for the long post. Just wanted to get all the info out. :)