Inconsistency of NVIDIA TensorFlow 2.15.0+nv24.03 vs. Colab vs. TensorFlow Documentation

We have found that running the same code from the TensorFlow documentation on a Jetson Orin Nano gives quite unbelievable results, and we don’t know why.

We have filed a TensorFlow issue: "Doc(Transfer learning and fine-tuning) is quite different from real executive result" #66696.

Here is the comparison of NVIDIA 2.15.0+nv24.03 vs. Colab vs. the TensorFlow documentation.

I DO think much more attention should be paid to those warnings. So we want to know how NVIDIA cross-compiled this build. Are those warnings expected? How do we compile TensorFlow for Jetson Orin Nano?

Hi,

We will try to reproduce this and update later.
We suppose the issue can be reproduced with learnopencv/Keras-Fine-Tuning-Pre-Trained-Models/Keras-Fine-Tune-Pre-Trained-Models-GTSRB.ipynb (master branch of spmallick/learnopencv on GitHub). Is that correct?

Issue 1 looks like a compatibility issue, but if you are using our prebuilt package built for the same JetPack, it should be compatible.

Issue 2 is harmless since NUMA is not available on Jetson.

Issue 3 is an OOM (out of memory) error, which is a hardware limitation of the Orin Nano.

Thanks.
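If the OOM comes from TensorFlow's GPU allocator, one commonly tried mitigation is to let TensorFlow grow its GPU memory on demand instead of reserving a large block up front, together with lowering the training batch size. This is a minimal sketch, assuming the training script controls TensorFlow start-up; `enable_gpu_memory_growth` is a hypothetical helper name, and when the board's total memory is the real limit this may reduce, but not remove, the OOM.

```python
import importlib.util


def enable_gpu_memory_growth():
    """Hypothetical helper: ask TensorFlow to allocate GPU memory on demand.

    Call it once, before any tensor or model touches the GPU; TensorFlow
    raises RuntimeError if the GPU has already been initialized. Returns the
    list of GPUs that were configured (empty if TensorFlow or a GPU is absent).
    """
    if importlib.util.find_spec("tensorflow") is None:
        return []
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Switch from up-front reservation to on-demand allocation.
        tf.config.experimental.set_memory_growth(gpu, True)
    return gpus


if __name__ == "__main__":
    configured = enable_gpu_memory_growth()
    print(f"memory growth enabled on {len(configured)} GPU(s)")
```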

No. It’s a TensorFlow demo; see the issue "Doc(Transfer learning and fine-tuning) is quite different from real executive result" #66696.

I think this is the real problem that bothers me.

Yes, all binaries (JetPack 6.0 and NVIDIA 2.15.0+nv24.03) are from NVIDIA.

"Unable to register cuDNN/cuFFT/cuBLAS factory"??? I thought Jetson Orin has cuDNN, so registering the cuDNN factory should work.

OK

As you previously mentioned, it runs out of memory when running the Keras-Fine-Tune-Pre-Trained-Models-GTSRB demo.

Hi,

Sorry that the comment is not clear.

OOM means out of memory, which indicates the use case exceeds the capacity of the Orin Nano.
This is a hardware limit.

We will let you know our findings shortly.
Thanks.

Hi,

Thanks for your patience.
We tested the transfer learning tutorial on JetPack 6 GA with TensorFlow 2.15.0+nv24.04.

Training works normally, as shown below:

However, the predictions do look strange.
Every output label seems to be 1 (dog).

...
Predictions:
 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
Labels:
 [1 1 0 0 1 0 0 1 0 1 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 1 1 0 0 1 0 0]
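To put this batch into numbers: when every prediction is 1, accuracy is just the share of 1s (dogs) among the labels. The plain-Python check below uses the two arrays copied from the output above; it gives 13/32 = 0.40625, which is below the 19/32 = 0.59375 obtained by always predicting 0, so on this batch the model does worse than a constant guess. The helper names are illustrative only.

```python
from collections import Counter

# Copied from the "Labels:" line of the output above (one batch of 32 images).
labels = [1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0,
          1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0]
# Every entry on the "Predictions:" line was 1 (dog).
predictions = [1] * len(labels)


def accuracy(preds, targets):
    """Fraction of positions where the prediction equals the label."""
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


def majority_baseline(targets):
    """Accuracy of always predicting the most frequent label, plus that label."""
    label, count = Counter(targets).most_common(1)[0]
    return count / len(targets), label


if __name__ == "__main__":
    print("accuracy of printed predictions:", accuracy(predictions, labels))
    print("always-predict-majority baseline:", majority_baseline(labels))
```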

We are now checking the prediction issue.
We will keep you updated.

Thanks.

jetson_tf2.15.0_nv24.3__transfer_learning.zip (2.6 MB)

I was running the nv24.3 build. Did you try tf2.15.0+nv24.3? Could it be an environment issue?

Maybe I should upgrade to 2.15.0+nv24.04, but then the current environment would be lost.

Hope we can locate the issue!

$ pip3 show tensorflow
Name: tensorflow
Version: 2.15.0+nv24.3
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: absl-py, astunparse, flatbuffers, gast, google-pasta, grpcio, h5py, keras, libclang, ml-dtypes, numpy, opt-einsum, packaging, protobuf, setuptools, six, tensorboard, tensorflow-estimator, tensorflow-io-gcs-filesystem, termcolor, typing-extensions, wrapt
Required-by:
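Since results are being compared across builds, it helps to record the exact wheel in each report. pip normalizes the numeric segments of a local version label (PEP 440 drops leading zeros), which is why the nv24.03 build is displayed here as 2.15.0+nv24.3, and nv24.04 would appear as 2.15.0+nv24.4. A minimal standard-library sketch for capturing the version from inside Python; the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def split_local_version(v):
    """Split '2.15.0+nv24.3' into ('2.15.0', 'nv24.3'); local part is '' if absent."""
    public, _, local = v.partition("+")
    return public, local


def installed_tensorflow():
    """Return (upstream_version, nvidia_tag) for the installed wheel, or None."""
    try:
        return split_local_version(version("tensorflow"))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("tensorflow:", installed_tensorflow())
```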

Hi,

You can give it a try.
Based on the link, this error comes from TensorFlow and doesn’t affect functionality.

Thanks.
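Because the registration messages by themselves don't show whether the GPU path is working, it can be more informative to ask TensorFlow directly. The sketch below, meant to run in the same Python environment as the tutorial, reports the CUDA and cuDNN versions the installed wheel was built against (via `tf.sysconfig.get_build_info()`) and the GPUs TensorFlow can see. It does not explain the warnings, but it at least shows whether the CUDA build is active and the GPU is visible. `tf_gpu_report` is a hypothetical helper name.

```python
import importlib.util


def tf_gpu_report():
    """Summarize how the installed TensorFlow build sees CUDA, cuDNN and the GPU."""
    if importlib.util.find_spec("tensorflow") is None:
        return {"installed": False}
    import tensorflow as tf

    build = tf.sysconfig.get_build_info()  # dict-like; CUDA keys exist only on CUDA builds
    return {
        "installed": True,
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "cuda_version": build.get("cuda_version"),
        "cudnn_version": build.get("cudnn_version"),
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    for key, value in tf_gpu_report().items():
        print(f"{key}: {value}")
```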

Hi,

Are you familiar with the model used in the tutorial below?

When checking the prediction output, we found that it uses only a single value to represent a two-class classification problem.
Is this expected? Usually we get two confidence values, one for each class.

Thanks.
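For a two-class problem a single output is a common design: the head emits one logit z, sigmoid(z) is the probability of class 1, and the "two confidence values" are simply 1 - sigmoid(z) and sigmoid(z), exactly what a two-way softmax over the logits [0, z] would give. Assuming the tutorial turns probabilities into labels by thresholding the sigmoid at 0.5, an all-1 batch means every logit in it was at least 0. A small pure-Python sketch of that equivalence (function names are illustrative):

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(logits):
    """Softmax over a list of logits (max subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def two_class_probs(z):
    """Expand one binary logit into [P(class 0), P(class 1)]."""
    p1 = sigmoid(z)
    return [1.0 - p1, p1]


def predict_label(z, threshold=0.5):
    """Label 1 when sigmoid(z) >= threshold; with 0.5 this is the same as z >= 0."""
    return 0 if sigmoid(z) < threshold else 1


if __name__ == "__main__":
    for z in (-2.0, 0.0, 3.5):
        probs = two_class_probs(z)
        # A 2-way softmax over [0, z] yields the same pair of probabilities.
        assert all(math.isclose(a, b) for a, b in zip(probs, softmax([0.0, z])))
        print(z, probs, predict_label(z))
```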

Yes, I also found a link about those warnings: "cuDNN, cuFFT, and cuBLAS Errors", Issue #62075 in tensorflow/tensorflow on GitHub.
That issue is still open with no conclusion yet. Maybe the experts are busy and don’t have time to fix it.

I tried nv24.04 and had no luck.

jetson_tf2.15.0_nv24.04_transfer_learning.zip (2.6 MB)

Installing collected packages: tensorflow
  Attempting uninstall: tensorflow
    Found existing installation: tensorflow 2.15.0+nv24.3
    Uninstalling tensorflow-2.15.0+nv24.3:
      Successfully uninstalled tensorflow-2.15.0+nv24.3
Successfully installed tensorflow-2.15.0+nv24.4
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

Hi,

Which JetPack version do you use?
We tested it on JetPack 6 GA, and it works normally.

Thanks.
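One way to answer "which JetPack" unambiguously is to read the L4T (Jetson Linux) release recorded on the device. This is a minimal sketch, assuming the first line of /etc/nv_tegra_release has the form "# R36 (release), REVISION: 2.0, ..." and assuming the mapping L4T 36.2 → JetPack 6.0 DP and L4T 36.3 → JetPack 6.0 GA (please verify against NVIDIA's JetPack archive). All helper names are hypothetical.

```python
import re
from pathlib import Path

# Assumed L4T-to-JetPack mapping; verify against NVIDIA's release notes.
JETPACK_BY_L4T = {
    (36, 2): "JetPack 6.0 DP",
    (36, 3): "JetPack 6.0 GA",
}

# Assumed format of the release line, e.g. "# R36 (release), REVISION: 2.0, ..."
_RELEASE_RE = re.compile(r"#\s*R(\d+)\s*\(release\),\s*REVISION:\s*(\d+)\.(\d+)")


def parse_l4t_release(text):
    """Return (major, minor, patch), e.g. (36, 2, 0), or None if not recognized."""
    m = _RELEASE_RE.search(text)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())


def jetpack_name(text):
    """Map the release string to a JetPack name using the assumed table."""
    rel = parse_l4t_release(text)
    if rel is None:
        return "unknown (could not parse release string)"
    return JETPACK_BY_L4T.get(rel[:2], f"unknown L4T R{rel[0]}.{rel[1]}")


def read_release_file(path="/etc/nv_tegra_release"):
    """Return the file contents, or None when not running on a Jetson."""
    p = Path(path)
    return p.read_text() if p.exists() else None


if __name__ == "__main__":
    text = read_release_file()
    print(jetpack_name(text) if text is not None else "release file not found")
```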

JetPack 6.0 DP

Hi,

Would you mind upgrading the environment to JetPack 6.0 GA?
We have confirmed that it works.

Thanks.

It’s good to know that JetPack 6.0 GA works with tf2.15.0+nv24.04. In the worst case, we will upgrade the Jetson Orin Nano from JetPack 6.0 DP to 6.0 GA.

But I believe JetPack 6.0 DP should work with tf2.15.0+nv24.04; there might be some configuration I’m not aware of that makes things go wrong.

Can you confirm that JetPack 6.0 DP with tf2.15.0+nv24.04 gives the same incorrect result I got? Or should we say that this combination has problems?

EDIT: BTW, I can’t find JetPack 6.0 GA here; it might not be released yet.

https://developer.download.nvidia.cn/compute/redist/jp/

Hi,

Do you have any dependencies on JetPack 6 DP?
Usually we recommend users move to the GA release since it is the production release.

JetPack 6 GA can be found in SDK Manager.
After reflashing and installing the components, please install the tf2.15.0+nv24.04 package for testing again.

Thanks.

What’s the difference between JetPack 6 DP and JetPack 6 GA? All the information about the released binaries refers to v60dp/v512/v511 etc., so I didn’t know anything about GA (and we didn’t use the SDK Manager UI to install the system).

The question remains: can you confirm that JetPack 6.0 DP with tf2.15.0+nv24.04 gives the same incorrect result I got? Or should we say that this combination has problems?

Hi,

DP is a developer preview: in short, an early release for anyone interested in trying the new features first.
JetPack 6 GA was released just weeks ago, so information about it is still limited.

Since a GA version is available, we won’t go back to check for bugs or other issues in the DP.
Instead, we recommend you try the production release; if the issue persists, we can debug further on the stable BSP.

Thanks.

I didn’t think so. BTW, does v60 stand for the GA version?

OK. In that case, I consider this a not-recommended combination: JetPack 6.0 DP with tf2.15.0+nv24.04 may have issues, and JetPack 6.0 DP lacks maintenance and fixes, which makes it unsuitable for developers.