We ran the same code from the TensorFlow documentation on a Jetson Orin Nano, and the results are hard to believe. We don’t know why.
We have filed a TensorFlow issue: Doc(Transfer learning and fine-tuning) is quite different from real executive result. #66696
Here is a comparison of NVIDIA 2.15.0+nv24.03 vs. Colab vs. the TensorFlow documentation.
I do think those warnings deserve much more attention, so we would like to know how NVIDIA cross-compiled its build. Are those warnings expected? How do we compile TensorFlow for the Jetson Orin Nano?
Hi,
We will try to reproduce this and update later.
We assume the issue can be reproduced with learnopencv/Keras-Fine-Tuning-Pre-Trained-Models/Keras-Fine-Tune-Pre-Trained-Models-GTSRB.ipynb at master · spmallick/learnopencv · GitHub. Is that correct?
Issue 1 looks like a compatibility issue, but if you are using our prebuilt package that was built for the same JetPack, it should be compatible.
Issue 2 is harmless since NUMA is not available on Jetson.
Issue 3 is OOM which is a hardware limitation on Orin Nano.
Thanks.
No. It’s a TensorFlow demo; see Doc(Transfer learning and fine-tuning) is quite different from real executive result. #66696
And I think this is the real problem that bothers me.
Yes, all binaries (JetPack 6.0 and NVIDIA 2.15.0+nv24.03) are from NVIDIA.
"Unable to register cuDNN/cuFFT/cuBLAS factory"? I thought the Jetson Orin has cuDNN, so it should be able to register cuDNN.
OK
As you previously mentioned, it runs out of memory when running the Keras-Fine-Tune-Pre-Trained-Models-GTSRB demo.
Hi,
Sorry that the comment was not clear.
OOM means out of memory, which indicates that the use case exceeds the Orin Nano's capacity.
This is a hardware limit.
We will let you know our findings shortly.
Thanks.
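On Jetson boards the CPU and GPU share the same physical memory, so one rough way to see how much headroom is left before training is to read `/proc/meminfo`. The following is a minimal sketch under that assumption; `parse_meminfo`, `available_gib`, and `SAMPLE` are illustrative names, and the sample numbers are made up rather than measured on an Orin Nano.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def available_gib(text):
    """Return MemAvailable converted from kB to GiB, or None if absent."""
    kib = parse_meminfo(text).get("MemAvailable")
    return None if kib is None else kib / (1024 * 1024)


# Illustrative values only, not measured on a real Orin Nano.
SAMPLE = "MemTotal:        7802384 kB\nMemAvailable:    2048000 kB\n"

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # not on Linux: fall back to the sample text
    gib = available_gib(text)
    print("MemAvailable:", "unknown" if gib is None else f"{gib:.2f} GiB")
```

If the reported figure is far below what the model and batch need, a smaller batch size or input resolution is the usual first thing to try.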
Hi,
Thanks for your patience.
We tested the transfer learning tutorial on JetPack 6 GA with TensorFlow 2.15.0+nv24.04.
Training works normally, as shown below:
However, the prediction does look strange.
All the output labels seem to be set to 1 (dog).
...
Predictions:
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
Labels:
[1 1 0 0 1 0 0 1 0 1 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 1 1 0 0 1 0 0]
We are now checking the prediction issue.
Will keep you updated.
Thanks.
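The batch printed above can be checked numerically. This is a small sketch, using only the two arrays quoted in the post, that computes the batch accuracy and flags a "collapsed" output where every prediction is the same class; `batch_accuracy` and `is_collapsed` are illustrative helper names, not part of the tutorial.

```python
# Values copied from the Predictions/Labels output quoted in the post.
predictions = [int(v) for v in ("1 " * 32).split()]
labels = [int(v) for v in
          "1 1 0 0 1 0 0 1 0 1 0 1 0 0 1 0 "
          "1 0 1 0 0 0 0 1 0 1 1 0 0 1 0 0".split()]


def batch_accuracy(preds, truth):
    """Fraction of positions where prediction and label agree."""
    if len(preds) != len(truth):
        raise ValueError("predictions and labels differ in length")
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)


def is_collapsed(preds):
    """True when every prediction is the same class."""
    return len(set(preds)) == 1


if __name__ == "__main__":
    print("accuracy :", batch_accuracy(predictions, labels))
    print("collapsed:", is_collapsed(predictions))
```

An accuracy equal to the share of one class in the batch, combined with a collapsed output, suggests the model output is saturated rather than merely inaccurate.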
jetson_tf2.15.0_nv24.3__transfer_learning.zip (2.6 MB)
I was running the nv24.3 build. Did you try tf2.15.0_nv24.3? So it is not an environment issue?
Maybe I should upgrade to 2.15.0+nv24.04, but then the current environment will be lost.
Hope to locate the issue!
$ pip3 show tensorflow
Name: tensorflow
Version: 2.15.0+nv24.3
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: absl-py, astunparse, flatbuffers, gast, google-pasta, grpcio, h5py, keras, libclang, ml-dtypes, numpy, opt-einsum, packaging, protobuf, setuptools, six, tensorboard, tensorflow-estimator, tensorflow-io-gcs-filesystem, termcolor, typing-extensions, wrapt
Required-by:
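The same version information can be read from Python, which is convenient when comparing environments in a notebook. This is a minimal sketch using the standard-library `importlib.metadata`; `installed_version` and `nv_build_tag` are illustrative helper names.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def nv_build_tag(version):
    """Return the local-version suffix after '+', e.g. 'nv24.3', or None."""
    if version and "+" in version:
        return version.split("+", 1)[1]
    return None


if __name__ == "__main__":
    version = installed_version("tensorflow")
    print("tensorflow version:", version)
    print("NVIDIA build tag  :", nv_build_tag(version))
```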
Hi,
You can give it a try.
Based on the link, this error comes from TensorFlow and doesn’t affect the functionality.
opened 11:21AM - 28 Sep 23 UTC
stat:awaiting tensorflower
type:build/install
comp:gpu
subtype: ubuntu/linux
TF2.14
### Issue type
Bug
### Have you reproduced the bug with TensorFlow Nightly?
No
### Source
binary
### TensorFlow version
2.14.0
### Custom code
No
### OS platform and distribution
Ubuntu 23.04
### Mobile device
_No response_
### Python version
3.11.5
### Bazel version
_No response_
### GCC/compiler version
_No response_
### CUDA/cuDNN version
CUDA 11.8 CUDNN 8.9.4
### GPU model and memory
Nvidia RTX 3080ti
### Current behavior?
in the shell terminal
install tensorflow via pip
```sh
pip install tensorflow==2.14.0
```
In the python terminal
input
```python
import tensorflow as tf
```
then the output
```sh
2023-09-28 19:19:50.298229: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-09-28 19:19:50.298259: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-09-28 19:19:50.298302: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-09-28 19:19:50.303578: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-28 19:19:50.982905: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
```
### Standalone code to reproduce the issue
```shell
no
```
### Relevant log output
_No response_
Thanks.
Hi,
Are you familiar with the model used in the transfer learning tutorial?
When checking the prediction output, we found that it uses only a single value to represent the two-class classification problem.
Is this expected? Usually, we get two confidence values, one for each class.
Thanks.
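A single output value is a common way to express binary classification: the model emits one logit, a sigmoid turns it into the probability of class 1, and the probability of class 0 is its complement. The sketch below assumes the tutorial's model ends in a single-unit layer that emits a raw logit and that 0.5 is the decision threshold; it is written in plain Python for illustration, and `sigmoid`, `logits_to_classes`, and `two_class_confidences` are illustrative names rather than TensorFlow APIs.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logits_to_classes(logits, threshold=0.5):
    """Map raw logits to class ids: 0 below the threshold, 1 otherwise."""
    return [0 if sigmoid(v) < threshold else 1 for v in logits]


def two_class_confidences(logit):
    """Expand one logit into (P(class 0), P(class 1))."""
    p1 = sigmoid(logit)
    return (1.0 - p1, p1)


if __name__ == "__main__":
    example_logits = [-2.0, 0.0, 3.0]  # made-up values for illustration
    print(logits_to_classes(example_logits))
    print(two_class_confidences(3.0))
```

If every logit coming out of the model is large and positive, all predictions become 1, which matches the pattern reported earlier in the thread.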
Yes, I also found a link about those warnings: cuDNN, cuFFT, and cuBLAS Errors · Issue #62075 · tensorflow/tensorflow · GitHub
That issue is still open, and there is no conclusion yet. Maybe the experts are busy and don’t have time to fix it.
I tried 24.04, with no luck.
jetson_tf2.15.0_nv24.04_transfer_learning.zip (2.6 MB)
Installing collected packages: tensorflow
Attempting uninstall: tensorflow
Found existing installation: tensorflow 2.15.0+nv24.3
Uninstalling tensorflow-2.15.0+nv24.3:
Successfully uninstalled tensorflow-2.15.0+nv24.3
Successfully installed tensorflow-2.15.0+nv24.4
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Hi,
Which JetPack version are you using?
We tested it on JetPack 6 GA, and it works normally.
Thanks.
Hi,
Would you mind upgrading the environment to JetPack 6.0 GA?
We have confirmed that it can work.
Thanks.
It’s good to know that JetPack 6.0 GA works with tf2.15.0+nv24.04. In the worst case, we will upgrade from JetPack 6.0 DP to 6.0 GA on the Jetson Orin Nano.
But I believe JetPack 6.0 DP should work with tf2.15.0+nv24.04; there might be some configuration I am unaware of that makes the results incorrect.
Can you confirm that JetPack 6.0 DP with tf2.15.0+nv24.04 produces the same incorrect result that I got? Or do we have to conclude that JetPack 6.0 DP has trouble with tf2.15.0+nv24.04?
EDIT: BTW, I can’t find JetPack 6.0 GA; it might not be released yet.
https://developer.download.nvidia.cn/compute/redist/jp/
Hi,
Do you have any dependencies on JetPack 6 DP?
Usually, we recommend that users move to the GA release since it is a production release.
JetPack 6 GA can be found in the SDK manager.
After reflashing and installing the components, please install the tf2.15.0+nv24.04 package for testing again.
Thanks.
What’s the difference between JetPack 6 DP and JetPack 6 GA? All the information from the binary release directories is for v60dp/v512/v511, etc. I didn’t know anything about GA (and we didn’t use the SDK Manager UI to install the system).
The question remains: Can you confirm that JetPack 6.0 DP with tf2.15.0+nv24.04 produces the same incorrect result that I got? Or do we have to conclude that JetPack 6.0 DP has trouble with tf2.15.0+nv24.04?
Hi,
DP is a developer preview: in short, an early release for anyone interested in trying the new features first.
JetPack 6 GA was released just weeks ago so the info is expected to be limited.
Since there is a GA version available, we won’t go back to check if there is a bug or any issue in the DP.
Instead, we recommend you try the production release, and if the issue persists, we can debug further based on the stable BSP.
Thanks.
I don’t think so. BTW, does v60 stand for the GA version?
OK. In that case, I think this version is not recommended. JetPack 6.0 DP and tf2.15.0+nv24.04 may have potential issues, and JetPack 6.0 DP lacks maintenance and fixes, making it unsuitable for developers to use.
EDIT:
Hi,
JetPack is the system software (OS) for Orin, not the TensorFlow package.
Please find it in the SDK manager.
Here is the SDK manager tutorial for your reference:
https://docs.nvidia.com/sdk-manager/install-with-sdkm-jetson/index.html
Thanks.
Yes, we know that.
Right now, v60 returns a 404 error. Do you have a TensorFlow binary release for v60?
EDIT: Freshly installed v60 (Linux36.03) + v60dp (2.15.0+nv24.04)
~$ sudo pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v60dp tensorflow==2.15.0+nv24.04
[sudo] password for daniel:
Looking in indexes: https://pypi.org/simple, https://developer.download.nvidia.com/compute/redist/jp/v60dp
Collecting tensorflow==2.15.0+nv24.04
Downloading https://developer.download.nvidia.cn/compute/redist/jp/v60dp/tensorflow/tensorflow-2.15.0%2Bnv24.04-cp310-cp310-linux_aarch64.whl (465.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 465.4/465.4 MB 3.1 MB/s eta 0:00:00
Collecting google-pasta>=0.1.1
Downloading google_pasta-0.2.0-py3-none-any.whl (57 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.5/57.5 KB 575.2 kB/s eta 0:00:00
Collecting keras<2.16,>=2.15.0
Downloading keras-2.15.0-py3-none-any.whl (1.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 1.3 MB/s eta 0:00:00
Collecting ml-dtypes~=0.2.0
Downloading ml_dtypes-0.2.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 MB 896.1 kB/s eta 0:00:00
Collecting tensorflow-io-gcs-filesystem>=0.23.1
Downloading tensorflow_io_gcs_filesystem-0.37.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (4.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.8/4.8 MB 2.0 MB/s eta 0:00:00
Collecting gast!=0.5.0,!=0.5.1,!=0.5.2,>=0.2.1
Downloading gast-0.5.4-py3-none-any.whl (19 kB)
Requirement already satisfied: packaging in /usr/lib/python3/dist-packages (from tensorflow==2.15.0+nv24.04) (21.3)
Requirement already satisfied: six>=1.12.0 in /usr/lib/python3/dist-packages (from tensorflow==2.15.0+nv24.04) (1.16.0)
Collecting termcolor>=1.1.0
Downloading termcolor-2.4.0-py3-none-any.whl (7.7 kB)
Collecting opt-einsum>=2.3.2
Downloading opt_einsum-3.3.0-py3-none-any.whl (65 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 65.5/65.5 KB 1.9 MB/s eta 0:00:00
Collecting numpy<2.0.0,>=1.23.5
Downloading numpy-1.26.4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (14.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.2/14.2 MB 1.3 MB/s eta 0:00:00
Collecting absl-py>=1.0.0
Downloading absl_py-2.1.0-py3-none-any.whl (133 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.7/133.7 KB 1.5 MB/s eta 0:00:00
Requirement already satisfied: setuptools in /usr/lib/python3/dist-packages (from tensorflow==2.15.0+nv24.04) (59.6.0)
Collecting tensorflow-estimator<2.16,>=2.15.0
Downloading tensorflow_estimator-2.15.0-py2.py3-none-any.whl (441 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 442.0/442.0 KB 1.6 MB/s eta 0:00:00
Collecting astunparse>=1.6.0
Downloading astunparse-1.6.3-py2.py3-none-any.whl (12 kB)
Collecting flatbuffers>=23.5.26
Downloading flatbuffers-24.3.25-py2.py3-none-any.whl (26 kB)
Collecting typing-extensions>=3.6.6
Downloading typing_extensions-4.12.0-py3-none-any.whl (37 kB)
Collecting libclang>=13.0.0
Downloading libclang-18.1.1-py2.py3-none-manylinux2014_aarch64.whl (23.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.8/23.8 MB 1.7 MB/s eta 0:00:00
Collecting wrapt<1.15,>=1.11.0
Downloading wrapt-1.14.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (78 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.0/78.0 KB 3.6 MB/s eta 0:00:00
Collecting h5py>=2.9.0
Downloading h5py-3.11.0.tar.gz (406 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 406.5/406.5 KB 2.5 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Collecting tensorboard<2.16,>=2.15
Downloading tensorboard-2.15.2-py3-none-any.whl (5.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.5/5.5 MB 805.1 kB/s eta 0:00:00
Collecting protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3
Downloading protobuf-4.25.3-cp37-abi3-manylinux2014_aarch64.whl (293 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 293.7/293.7 KB 494.4 kB/s eta 0:00:00
Collecting grpcio<2.0,>=1.24.3
Downloading grpcio-1.64.0-cp310-cp310-manylinux_2_17_aarch64.whl (5.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.4/5.4 MB 1.0 MB/s eta 0:00:00
Requirement already satisfied: wheel<1.0,>=0.23.0 in /usr/lib/python3/dist-packages (from astunparse>=1.6.0->tensorflow==2.15.0+nv24.04) (0.37.1)
Requirement already satisfied: requests<3,>=2.21.0 in /usr/lib/python3/dist-packages (from tensorboard<2.16,>=2.15->tensorflow==2.15.0+nv24.04) (2.25.1)
Collecting google-auth<3,>=1.6.3
Downloading google_auth-2.29.0-py2.py3-none-any.whl (189 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 189.2/189.2 KB 1.5 MB/s eta 0:00:00
Collecting google-auth-oauthlib<2,>=0.5
Downloading google_auth_oauthlib-1.2.0-py2.py3-none-any.whl (24 kB)
Collecting tensorboard-data-server<0.8.0,>=0.7.0
Downloading tensorboard_data_server-0.7.2-py3-none-any.whl (2.4 kB)
Collecting werkzeug>=1.0.1
Downloading werkzeug-3.0.3-py3-none-any.whl (227 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 227.3/227.3 KB 1.5 MB/s eta 0:00:00
Requirement already satisfied: markdown>=2.6.8 in /usr/lib/python3/dist-packages (from tensorboard<2.16,>=2.15->tensorflow==2.15.0+nv24.04) (3.3.6)
Collecting cachetools<6.0,>=2.0.0
Downloading cachetools-5.3.3-py3-none-any.whl (9.3 kB)
Collecting pyasn1-modules>=0.2.1
Downloading pyasn1_modules-0.4.0-py3-none-any.whl (181 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 181.2/181.2 KB 1.8 MB/s eta 0:00:00
Collecting rsa<5,>=3.1.4
Downloading rsa-4.9-py3-none-any.whl (34 kB)
Collecting requests-oauthlib>=0.7.0
Downloading requests_oauthlib-2.0.0-py2.py3-none-any.whl (24 kB)
Collecting MarkupSafe>=2.1.1
Downloading MarkupSafe-2.1.5-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (26 kB)
Collecting pyasn1<0.7.0,>=0.4.6
Downloading pyasn1-0.6.0-py2.py3-none-any.whl (85 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 85.3/85.3 KB 1.9 MB/s eta 0:00:00
Requirement already satisfied: oauthlib>=3.0.0 in /usr/lib/python3/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<2,>=0.5->tensorboard<2.16,>=2.15->tensorflow==2.15.0+nv24.04) (3.2.0)
Building wheels for collected packages: h5py
Building wheel for h5py (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for h5py (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [7 lines of output]
running bdist_wheel
running build
running build_ext
Loading library to get build settings and version: libhdf5.so
error: Unable to load dependency HDF5, make sure HDF5 is installed properly
Library dirs checked: []
error: libhdf5.so: cannot open shared object file: No such file or directory
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for h5py
Failed to build h5py
ERROR: Could not build wheels for h5py, which is required to install pyproject.toml-based projects
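The failure above comes from building h5py from source: the build cannot load `libhdf5.so`, so the HDF5 shared library is either missing or not on the loader path. A quick way to check before retrying the install is sketched below. It uses the standard-library `ctypes.util.find_library`; the second candidate name `hdf5_serial` is an assumption about how some Debian/Ubuntu packages name the library, and the thread itself does not confirm which package resolves the error.

```python
import ctypes.util

# Candidate library names; "hdf5_serial" is an assumption about
# Debian/Ubuntu packaging, not something confirmed in this thread.
CANDIDATES = ("hdf5", "hdf5_serial")


def find_hdf5(candidates=CANDIDATES):
    """Return the first HDF5 library the system loader can locate, or None."""
    for name in candidates:
        found = ctypes.util.find_library(name)
        if found:
            return found
    return None


if __name__ == "__main__":
    lib = find_hdf5()
    if lib is None:
        print("HDF5 shared library not found; the h5py source build will likely fail")
    else:
        print("HDF5 shared library found:", lib)
```

A result of `None` is consistent with the `Library dirs checked: []` line in the log; installing the distribution's HDF5 development package and rerunning pip is the usual next step.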