How did you build the provided TensorFlow 2.16.1 on the download site?

There is a pre-compiled TensorFlow 2.16.1 available at:

https://developer.download.nvidia.com/compute/redist/jp/v60/tensorflow/

This was compiled for Python 3.10, and it works perfectly on my Jetson Orin Developer Kit.

As I need Python 3.11, I am compiling TensorFlow from source inside a venv environment. Both TensorFlow 2.15.1 and 2.16.1 compile successfully into installable .whl files, but when running a simple MNIST classifier using Keras, 2.15.1 works while 2.16.1 fails:

$ ./mnist-keras.py
2.16.1
2024-06-29 02:03:30.515968: F ./tensorflow/core/kernels/random_op_gpu.h:247] Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch, num_blocks, block_size, 0, d.stream(), key, counter, gen, data, size, dist) status: INTERNAL: no kernel image is available for execution on the device
Aborted

Running the same script with Python 3.10 and the pre-compiled TF 2.16.1 works.
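The mnist-keras.py script itself is not shown in the thread; a minimal script along these lines exercises the same code path (a sketch only — the filename comes from the thread, while the model, batch size, and epoch count are assumptions inferred from the 938-steps-per-epoch, 5-epoch log further below):

```python
#!/usr/bin/env python3
# mnist-keras.py -- minimal MNIST smoke test (sketch; the thread's actual
# script is not shown, so the model and settings here are assumptions).

BATCH_SIZE = 64  # 60000 train samples / 64 -> 938 steps/epoch, matching the log
EPOCHS = 5

def batches_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of steps Keras reports per epoch (ceiling division)."""
    return -(-n_samples // batch_size)

def main():
    # TF imported here so the helper above stays importable without TensorFlow.
    import tensorflow as tf
    print(tf.__version__)
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # Weight initialization launches the FillPhiloxRandom GPU kernel, which is
    # exactly where the broken 2.16.1 build aborts with "no kernel image".
    model.fit(x_train, y_train, batch_size=BATCH_SIZE, epochs=EPOCHS,
              validation_data=(x_test, y_test))

if __name__ == "__main__":
    main()
```

Any script that initializes weights or calls tf.random on the GPU should reproduce the abort on an affected build.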

Now, since compiling from source works in principle, as demonstrated by building TF 2.15.1 for Python 3.11, I’d like to ask whether I need to pass special compile flags or apply patches to get a working TF 2.16.1 for Python 3.11 on the Jetson Orin. (I have installed the latest OS with JetPack 6.)

Sorry for the late response.
Is this still an issue? Is there any result you can share?

Hi,

INTERNAL: no kernel image is available for execution on the device

Usually, this error is caused by building for the wrong GPU architecture.
Have you compiled TensorFlow for the Orin GPU’s architecture, which is 8.7?

Thanks.

Yes, this was the first thing I re-checked. I definitely enter ‘8.7’ as the compute capability when ./configure asks for it.
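One way to double-check at runtime which architecture the installed wheel actually sees is to query the device details via tf.config (the small formatting helper is illustrative; the TF import is kept inside the function so the snippet loads even without TensorFlow installed):

```python
def format_cc(cc) -> str:
    """Format the (major, minor) tuple TensorFlow returns as e.g. '8.7'."""
    return f"{cc[0]}.{cc[1]}"

def report_gpu_cc():
    # Imported here so format_cc stays usable without TensorFlow.
    import tensorflow as tf
    for gpu in tf.config.list_physical_devices("GPU"):
        details = tf.config.experimental.get_device_details(gpu)
        # On a Jetson Orin this should print compute capability 8.7.
        print(gpu.name, format_cc(details.get("compute_capability")))

if __name__ == "__main__":
    report_gpu_cc()
```

This reports what the driver sees, not what kernels the wheel was built with, so it can still print 8.7 on a build that later aborts — but it rules out the GPU itself being misdetected.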

I basically build TF 2.15 and TF 2.16 in exactly the same way, but only TF 2.15 works; TF 2.16 throws the above error. I follow the guide on the TensorFlow website: Build from source  |  TensorFlow

The only difference is the bazel and clang versions used for 2.15 and 2.16; I use those recommended on the TensorFlow build-from-source page.

So my question is whether you could share how exactly you build TensorFlow for the provided pre-compiled binary packages, so I can check whether I need additional compile flags, patches, or anything else to get 2.16 to work.

Is this still an issue? Is there any result you can share?

Unfortunately yes, it is still an issue. No success so far compiling TF 2.16 for Python 3.11. In the meantime I am using the pre-compiled TF 2.16 for Python 3.10.

Update: TensorFlow 2.17.0 was released last week, and I compiled this version today on the Jetson Orin.

This version works!

I0000 00:00:1721246807.523362   81563 service.cc:146] XLA service 0xfffe30008fb0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1721246807.523537   81563 service.cc:154]   StreamExecutor device (0): Orin, Compute Capability 8.7
I0000 00:00:1721246812.449436   81563 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
Epoch 1/5
938/938 ━━━━━━━━━━━━━━━━━━━━ 20s 14ms/step - accuracy: 0.7929 - loss: 0.6396 - val_accuracy: 0.9694 - val_loss: 0.0984
Epoch 2/5
938/938 ━━━━━━━━━━━━━━━━━━━━ 7s 7ms/step - accuracy: 0.9719 - loss: 0.0848 - val_accuracy: 0.9833 - val_loss: 0.0545
Epoch 3/5
938/938 ━━━━━━━━━━━━━━━━━━━━ 6s 7ms/step - accuracy: 0.9821 - loss: 0.0583 - val_accuracy: 0.9832 - val_loss: 0.0522
Epoch 4/5
938/938 ━━━━━━━━━━━━━━━━━━━━ 6s 7ms/step - accuracy: 0.9877 - loss: 0.0412 - val_accuracy: 0.9837 - val_loss: 0.0469
Epoch 5/5
938/938 ━━━━━━━━━━━━━━━━━━━━ 6s 7ms/step - accuracy: 0.9886 - loss: 0.0370 - val_accuracy: 0.9878 - val_loss: 0.0381

No clue what prevented TensorFlow 2.16 from launching kernels on the GPU, but TF 2.17 seems to have solved it.

I think the issue is therefore closed.

Hi,

Thanks a lot for the update.
This will help other users who also want to build it from source.

Thanks.

Additional information: I re-compiled TF 2.17 to test whether my workflow is reproducible, and this time got the “no kernel” error again at runtime.

I have now managed to get a working TF 2.17 by explicitly passing the compute capability via an additional environment variable to the bazel build command.

The bazel build command that finally worked on my Jetson Orin looks like this; it should be useful to people who want to compile it themselves:

bazel build //tensorflow/tools/pip_package:wheel \
    --repo_env=WHEEL_NAME=tensorflow \
    --repo_env=TF_CUDA_COMPUTE_CAPABILITIES=compute_87 \
    --config=cuda --config=opt --config=nogcp --config=nonccl \
    --local_cpu_resources=10

I used bazel 6.5.0 and clang 17.0.6.

I unpacked clang 17.0.6 to a separate directory at /opt/clang/ and gave the path /opt/clang/clang+llvm-17.0.6-aarch64-linux-gnu/bin/clang when ./configure asked for the path to the clang compiler.

So the workflow is (I hope I didn’t forget anything):

a) Get clang 17.0.6 and unpack it to /opt/clang/
b) Get bazel 6.5.0 for aarch64 and put it to /usr/local/bin/bazel
c) Get the TensorFlow sources and check out tag v2.17.0
d) export TF_CUDA_COMPUTE_CAPABILITIES=compute_87
e) Run ./configure
f) Answer: No to ROCm, Yes to CUDA, Yes to TensorRT, 8.7 as compute capability if asked, Yes to clang, /opt/clang/clang+llvm-17.0.6-aarch64-linux-gnu/bin/clang as the clang path, -Wno-everything as compile flags, and No to Android builds
g) Run the above bazel build command
h) Compilation takes around 6 hours
i) The compiled wheel file is found at bazel-bin/tensorflow/tools/pip_package/wheel_house/
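On the notation: ./configure takes the capability as “8.7”, while TF_CUDA_COMPUTE_CAPABILITIES in step d) wants the “compute_87” token. A tiny helper (hypothetical, purely for illustration) showing the conversion, and how step d) would look if driving the build from Python:

```python
import os

def cc_token(capability: str) -> str:
    """Convert '8.7' (./configure style) to 'compute_87'
    (TF_CUDA_COMPUTE_CAPABILITIES style)."""
    major, minor = capability.split(".")
    return f"compute_{major}{minor}"

# Equivalent of step d) above, if invoking bazel from a Python wrapper:
os.environ["TF_CUDA_COMPUTE_CAPABILITIES"] = cc_token("8.7")
```

If I recall the CUDA convention correctly, compute_87 embeds PTX that is JIT-compiled for the device at load time, whereas sm_87 would embed pre-built machine code; the working command above uses compute_87.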
