TensorFlow container for use on H200?

Description

I’m trying to use the TensorFlow container from NGC with H200 GPUs. It works, but prints messages like this:

'+ptx85' is not a recognized feature for this target (ignoring feature)

The container I’m using, tensorflow:25.02-tf2-py3, is over a year old. It uses CUDA 12, but my drivers are on CUDA 13.

How can I get a container that works with CUDA 13 and the H200, without the +ptx85 messages and the slowness due to JIT compilation?

Thanks,

Environment

GPU Type: H200
Nvidia Driver Version: 13.2
CUDA Version: 13.2
Baremetal or Container (if container which image + tag): Container: tensorflow:25.02-tf2-py3

Hi @scousins, thanks for posting and for narrowing it down to the JIT path - that’s the right read.

The message “'+ptx85' is not a recognized feature for this target” comes from the LLVM/PTX assembler bundled inside tensorflow:25.02-tf2-py3, which was built against the CUDA 12.x toolchain. Your newer host driver can still launch the kernels, but XLA is JIT-recompiling them at runtime through the container’s older PTX assembler, which doesn’t know about the newer feature flags the kernels are tagged with. That’s where the slowdown is coming from, not the driver mismatch itself.

Two things to do:

  1. Pull a newer TensorFlow container. The 25.02 tag is over a year old at this point. Newer tags on the NGC TensorFlow page are built against CUDA 13 toolchains and ship a matching PTX assembler, which makes the warning go away and lets H200 (sm_90) kernels run as AOT-compiled rather than JIT.

  2. Cross-check against the NVIDIA Frameworks Container Support Matrix before you upgrade, so you pick a tag whose declared driver/CUDA range covers your 13.2 driver. The matrix lists the bundled CUDA, cuDNN, and TensorRT versions per release tag.

A small heads-up: TensorRT itself only enters the picture here if you’re using TF-TRT (tf.experimental.tensorrt.Converter). For pure TF + XLA on H200, the slowdown will fully resolve once the container itself moves to a CUDA 13 base. Happy to keep going if anything in the TF-TRT path breaks after the upgrade.

Thanks, Atharva

Thanks very much @Athkumar!

In bullet point 1, you recommend pulling a new TensorFlow container. This really was the core of my question: the latest one seems to be 25.02. I don’t see anything newer. Can you point me to the URL of a newer container that was built with CUDA 13? You mention the “NGC TensorFlow page”. What is the URL for that?

Thanks!

Steve

Hi @scousins, you’re right, and I owe you a correction on my first bullet. There isn’t a newer TensorFlow container to point you at: 25.02 is the final one, and my “pull a newer tag” note was wrong for TensorFlow specifically. Apologies for sending you hunting for something that doesn’t exist.

Here’s what actually happened. NVIDIA stopped publishing the Optimized TensorFlow containers after 25.02. It’s in the 25.02 release notes:

“After the 25.02 release, NVIDIA Optimized TensorFlow containers will no longer be released.”

NVIDIA staff have said the same on the forums, on the “ptxas too old” report for this exact image: “Compile error with ptxas too old on latest version (nvcr.io/nvidia/tensorflow:25.02-tf2-py3)”

So 25.02 (which is CUDA 12.8, not 13) is the end of the line for TF. The rest of the NGC framework line did move to CUDA 13 and is on the 26.x tags now, which is where my mix-up came from. TensorFlow just didn’t make that jump.

The URLs you asked for:

Good news for your setup: H200 is Hopper (sm_90), which CUDA 12.8 supports fully, so 25.02 is the right and final container for your hardware. The hard “ptxas too old” failure in that thread is on DGX Spark (GB10 / sm_121), a newer architecture than your Hopper card, so it doesn’t apply to you. What you have is the milder case: the +ptx85 line is the container’s older toolchain ignoring a newer PTX feature flag, which on sm_90 isn’t a correctness problem by itself.

Realistic options:

1. Stay on 25.02 (lowest effort, and correct for H200)

It’s the final, validated TF image and it fully supports sm_90; your CUDA 13.2 driver runs it because NVIDIA drivers are backward compatible with older CUDA runtimes. Before you treat the +ptx85 warning as a blocker, confirm it’s actually costing you step time. Some XLA JIT compilation at startup is normal on any container version, so the warning by itself may just be log noise.

2. Build your own CUDA 13 + TF image (only if you specifically need the CUDA 13 toolchain)

No NVIDIA-optimized path here, but you can layer the upstream tensorflow pip wheels (or build from source) on an nvcr.io/nvidia/cuda:13.x base. You give up the NGC tuning and own the XLA/cuDNN compatibility yourself.

3. Switch frameworks for an NVIDIA-optimized CUDA 13 image

Only if your workload isn’t TF-locked: the PyTorch NGC container, for example, is still on the monthly cadence and on CUDA 13 (the 26.x tags).
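If you do go the option 2 route, a quick sanity check inside the custom image can show which CUDA and cuDNN versions the TensorFlow wheel was built against and whether the H200 is visible. This is only a minimal sketch, assuming a TensorFlow 2.x wheel is installed. `describe_tf_build` is a hypothetical helper name, and the keys returned by `tf.sysconfig.get_build_info()` can differ between builds, so they are read with `.get()`.

```python
def describe_tf_build():
    """Return TensorFlow build and GPU details, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None

    # Versions the wheel was compiled against (keys may be absent on CPU-only builds).
    build = tf.sysconfig.get_build_info()

    # GPUs that TensorFlow can actually see from inside the container.
    gpus = tf.config.list_physical_devices("GPU")
    details = [tf.config.experimental.get_device_details(gpu) for gpu in gpus]

    return {
        "tf_version": tf.__version__,
        "cuda_version": build.get("cuda_version"),
        "cudnn_version": build.get("cudnn_version"),
        "gpus": [
            {
                "name": d.get("device_name"),
                "compute_capability": d.get("compute_capability"),
            }
            for d in details
        ],
    }


if __name__ == "__main__":
    print(describe_tf_build())
```

An empty `gpus` list would suggest the wheel and the CUDA base image do not line up, which is the compatibility work you take on with this option.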

If you can share whether you’re seeing an actual step-time regression (vs just the warning in the log), I can help you decide whether option 1 is good enough or whether you actually need the rebuild in option 2.
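One way to check for a step-time regression is to time the first few calls separately from the steady state, since the first call is where any JIT compilation would land. The sketch below is an assumption-laden illustration, not a benchmark harness: `time_steps` and `step_fn` are hypothetical names, and `step_fn` is assumed to block until the GPU work has finished (for example by converting the loss to a Python float), otherwise only the dispatch time is measured.

```python
import statistics
import time


def time_steps(step_fn, warmup=3, iters=20):
    """Time a step function, keeping warm-up calls apart from steady-state calls."""
    # Warm-up calls: the first one typically absorbs any JIT compilation cost.
    warmup_s = []
    for _ in range(warmup):
        start = time.perf_counter()
        step_fn()
        warmup_s.append(time.perf_counter() - start)

    # Steady-state calls: these are the ones that matter for training throughput.
    steady_s = []
    for _ in range(iters):
        start = time.perf_counter()
        step_fn()
        steady_s.append(time.perf_counter() - start)

    return {
        "first_call_s": warmup_s[0] if warmup_s else None,
        "steady_median_s": statistics.median(steady_s),
        "steady_max_s": max(steady_s),
    }
```

If the first call is slow but the steady-state median is stable, the cost is a one-time compile at startup and the warning is most likely just log noise.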

Thanks,
Atharva

Thanks @athkumar. I believe you are correct that it isn’t affecting performance. It is likely just log noise, although it is quite a bit of noise in our case. Do you know of a way to silence those messages?

I have talked it over with the instructor for the class that this is being set up for and he has decided to transition it to PyTorch. I have tried the latest NGC PyTorch container and it is working just fine.

Thanks very much for your help with this.
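On silencing the messages, for anyone who stays on 25.02: one possible approach is to filter the process's stderr at the file-descriptor level. The sketch below assumes the +ptx85 line is written straight to stderr by native code, in which case Python-level logging settings may not reach it. `drop_stderr_lines` is a hypothetical helper, not part of TensorFlow, and the filter also hides any other stderr line containing the same text.

```python
import contextlib
import os
import sys
import threading


@contextlib.contextmanager
def drop_stderr_lines(needle, sink_fd=None):
    """Within the block, drop stderr lines containing `needle` (bytes); pass the rest on."""
    sys.stderr.flush()
    saved_fd = os.dup(2)  # keep a handle on the real stderr
    out_fd = saved_fd if sink_fd is None else sink_fd

    # Point fd 2 at a pipe so native (C/C++) writes are captured as well.
    read_fd, write_fd = os.pipe()
    os.dup2(write_fd, 2)
    os.close(write_fd)

    def pump():
        # Forward every line except the ones that contain the needle.
        with os.fdopen(read_fd, "rb") as reader:
            for line in reader:
                if needle not in line:
                    os.write(out_fd, line)

    worker = threading.Thread(target=pump, daemon=True)
    worker.start()
    try:
        yield
    finally:
        sys.stderr.flush()
        os.dup2(saved_fd, 2)  # restore stderr; this closes the pipe's write end
        worker.join()         # the pump thread exits once it reads EOF
        os.close(saved_fd)


# Usage sketch (train() is a hypothetical entry point):
# with drop_stderr_lines(b"+ptx85"):
#     train()
```

The filter works on whole lines, so a message that is split across several writes is still matched once its newline arrives.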