I am trying to run an image processing model on a Jetson Xavier running JetPack 4.3, inside a Docker container. For some reason, when the code is run on my local computer using the tensorflow/tensorflow base image, it produces results from my test image. However, when using the nvcr.io/nvidia/l4t-tensorflow:r32.4.3-tf2.2-py3 image on the Jetson, it fails to produce inference results and returns nan where the predictions should be.
I cannot run tensorflow/tensorflow on the Jetson, or nvcr.io/nvidia/l4t-tensorflow:r32.4.3-tf2.2-py3 locally, because of the underlying hardware differences between my computer and the Jetson.
If anyone has any idea what could be causing this issue, please let me know. I can provide more details if you think they are relevant. Thanks in advance!
In an attempt to diagnose the issue, I checked the library version differences between the two docker environments.

Differences (Computer / Jetson library versions):
tensorboard: 2.3.0 / 2.2.2 *

* cannot be changed on the Jetson as it is part of the base image
I used pip to update/downgrade all of the libraries on the Jetson, except those marked with a * in the list above, to match the other docker environment. This had no effect on my output.
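For reference, this is roughly how the installed versions can be dumped for comparison (a minimal sketch; run it inside each container and diff the two outputs):

import pkg_resources

# Print installed packages in pip-freeze style so the two
# containers' outputs can be diffed for version mismatches.
for dist in sorted(pkg_resources.working_set, key=lambda d: d.project_name.lower()):
    print("{}=={}".format(dist.project_name, dist.version))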
In order to troubleshoot, I ran pip install tensorflow==2.2.0 in the docker container on my computer to see if that would break the inference. It did not. I also retrained the model on my computer, this time using tensorflow==2.2.0, and it still worked on my laptop but not on the Jetson.
Both the Jetson and the docker instance on my laptop are now using tensorflow 2.2.
The input image is the same for both platforms and I am performing the same test from inside the docker containers.
The model is loading correctly, because when I print it I get:
<tensorflow.python.saved_model.load.Loader._recreate_base_user_object.<locals>._UserObject object at 0x7efb8597f0>
The images are being opened and read correctly. I printed out their numpy arrays and they look reasonable.
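Beyond eyeballing the arrays, here is the kind of quick check that can be run on both platforms to rule out a dtype or value-range mismatch (a sketch; image is the array loaded from the test image):

import numpy as np

# Compare these values between the two platforms; a uint8 vs float32
# or 0-255 vs 0-1 mismatch can silently turn into nans downstream.
arr = np.asarray(image)
print("dtype:", arr.dtype, "shape:", arr.shape)
print("min:", arr.min(), "max:", arr.max())
print("nans in input:", np.isnan(arr.astype(np.float64)).sum())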
The model is returning the output dictionary, but it has nan where the predictions should be.
The model is not optimized with TensorRT.
Model Inference Code Block:
import numpy as np
import tensorflow as tf

image = np.asarray(image)
input_tensor = tf.convert_to_tensor(image)
# The model expects a batch of images, so add an axis with tf.newaxis.
input_tensor = input_tensor[tf.newaxis, ...]
model_fn = model.signatures['serving_default']
output_dict = model_fn(input_tensor)
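To narrow down where the nans first appear on the Jetson, something like this could be run (a sketch, reusing model_fn and input_tensor from the block above; tf.debugging.enable_check_numerics requires TF >= 2.1):

import numpy as np
import tensorflow as tf

# Make TF raise at the first op that produces nan/inf instead of
# letting it propagate silently through the graph.
tf.debugging.enable_check_numerics()

output_dict = model_fn(input_tensor)
for key, value in output_dict.items():
    print(key, "contains nan:", bool(np.isnan(value.numpy()).any()))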
Random things I have noticed but probably aren’t relevant:
Running this takes forever (~7 min):
model = tf.saved_model.load(PATH)
It also produces a warning that, from what I have read, is a tf bug caused by training my own model, so I don’t think it is the issue:
WARNING:tensorflow:Importing a function (__inference_EfficientDet-D0_layer_call_and_return_conditional_losses_95612) with ops with custom gradients. Will likely fail if a gradient is requested.
WARNING:tensorflow:Importing a function (__inference_EfficientDet-D0_layer_call_and_return_conditional_losses_82391) with ops with custom gradients. Will likely fail if a gradient is requested.
Every time I use the nvcr.io/nvidia/l4t-tensorflow:r32.4.3-tf2.2-py3 image I get the TensorFlow can't-find-CUDA error. It is possible I just need to mount something into my docker container. I don’t think this is my issue because I previously had a premade tf model running just fine on this image despite the error.
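A quick sanity check for this from inside the container (a minimal sketch):

import tensorflow as tf

# If this prints an empty list, TF is running CPU-only, meaning the
# container is not seeing the Jetson's GPU/CUDA libraries.
print("TF version:", tf.__version__)
print("GPUs visible:", tf.config.list_physical_devices('GPU'))

On the Jetson this generally depends on starting the container with the NVIDIA runtime (e.g. docker run --runtime nvidia ...).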
Sorry for the long post. Just wanted to get all the info out. :)