TensorFlow Container for use on H200?

scousins · May 13, 2026, 1:18am

Description

I’m trying to use the TensorFlow container from NGC with H200 GPUs. It works, but it prints messages like this:

'+ptx85' is not a recognized feature for this target (ignoring feature)

The container I’m using, tensorflow:25.02-tf2-py3, is over a year old. It uses CUDA 12, but my driver is on CUDA 13.

How can I get a working container that will work with CUDA 13 and the H200 without the +ptx85 messages and slowness due to JIT?

Thanks,

Environment

TensorRT Version:
GPU Type: H200
Nvidia Driver Version: 13.2
CUDA Version: 13.2
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): Container: tensorflow:25.02-tf2-py3

athkumar · May 30, 2026, 5:56pm

Hi @scousins, thanks for posting and for narrowing it down to the JIT path - that’s the right read.

The message “'+ptx85' is not a recognized feature for this target” comes from the LLVM/PTX assembler bundled inside tensorflow:25.02-tf2-py3, which was built against the CUDA 12.x toolchain. Your newer host driver is backward compatible with the container’s CUDA 12.x runtime, so the kernels launch, but XLA JIT-recompiles them at runtime through the container’s older PTX assembler, which doesn’t recognize the newer feature flags the kernels are tagged with. The slowdown comes from that recompilation, not from the driver mismatch itself.

Two things to do:

1. Pull a newer TensorFlow container. The 25.02 tag is over a year old at this point. Newer tags on the NGC TensorFlow page are built against CUDA 13 toolchains and ship a matching PTX assembler, which makes the warning go away and lets H200 (sm_90) kernels run as AOT-compiled rather than JIT.

2. Cross-check against the NVIDIA Frameworks Container Support Matrix before you upgrade, so you pick a tag whose declared driver/CUDA range covers your 13.2 driver. The matrix lists the bundled CUDA, cuDNN, and TRT versions per release tag.

A small heads-up: TensorRT itself only enters the picture here if you’re using TF-TRT (tf.experimental.tensorrt.Converter). For pure TF + XLA on H200, the slowdown will fully resolve once the container itself moves to a CUDA 13 base. Happy to keep going if anything in the TF-TRT path breaks after the upgrade.

Thanks, Atharva

scousins · June 1, 2026, 2:58pm

Thanks very much, @athkumar!

In bullet point 1, you recommend pulling a newer TensorFlow container. This really was the core of my question: the latest one seems to be 25.02, and I don’t see anything newer. Can you point me to the URL of a newer container that was built with CUDA 13? You also mention the “NGC TensorFlow page”. What is the URL for that?

Thanks!

Steve

athkumar · June 1, 2026, 7:42pm

Hi @scousins, you’re right, and I owe you a correction on my first bullet. There isn’t a newer TensorFlow container to point you at: 25.02 is the final one, and my “pull a newer tag” note was wrong for TensorFlow specifically. Apologies for sending you hunting for something that doesn’t exist.

Here’s what actually happened. NVIDIA stopped publishing the Optimized TensorFlow containers after 25.02. It’s in the 25.02 release notes:

“After the 25.02 release, NVIDIA Optimized TensorFlow containers will no longer be released.”

NVIDIA staff have said the same on the forums, in the “ptxas too old” report for this exact image: “Compile error with ptxas too old on latest version (nvcr.io/nvidia/tensorflow:25.02-tf2-py3)”

So 25.02 (which is CUDA 12.8, not 13) is the end of the line for TF. The rest of the NGC framework line did move to CUDA 13 and is on the 26.x tags now, which is where my mix-up came from. TensorFlow just didn’t make that jump.

The URLs you asked for:

NGC TensorFlow catalog page (you’ll see 25.02 is the newest tag): TensorFlow | NVIDIA NGC
Frameworks Support Matrix (CUDA / cuDNN / TRT per tag): Frameworks Support Matrix - NVIDIA Docs

Good news for your setup: H200 is Hopper (sm_90), which CUDA 12.8 supports fully, so 25.02 is the right and final container for your hardware. The hard “ptxas too old” failure in that thread is on DGX Spark (GB10 / sm_121), a newer architecture than your Hopper card, so it doesn’t apply to you. What you have is the milder case: the +ptx85 line is the container’s older toolchain ignoring a newer PTX feature flag, which on sm_90 isn’t a correctness problem by itself.
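If you want to double-check what the container itself reports, here is a minimal sketch. It assumes TensorFlow is importable inside the container; the function name describe_tf_gpu_build is just illustrative, and I haven't run it on 25.02, so treat the exact values as something to verify rather than a guarantee.

```python
def describe_tf_gpu_build():
    """Return TF's CUDA build info and visible GPUs, or None if TF is absent."""
    try:
        import tensorflow as tf
    except ImportError:
        # Not inside a TensorFlow environment; nothing to report.
        return None

    # Versions this TensorFlow binary was compiled against.
    build = tf.sysconfig.get_build_info()

    gpus = []
    for dev in tf.config.list_physical_devices("GPU"):
        details = tf.config.experimental.get_device_details(dev)
        gpus.append({
            "name": details.get("device_name"),
            # A (major, minor) tuple; Hopper (sm_90) should show up as (9, 0).
            "compute_capability": details.get("compute_capability"),
        })

    return {
        "cuda_version": build.get("cuda_version"),
        "cudnn_version": build.get("cudnn_version"),
        "gpus": gpus,
    }
```

Calling print(describe_tf_gpu_build()) inside the container should show a CUDA 12.x build version and one entry per visible GPU, which is enough to confirm the container sees the H200s as Hopper parts.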

Realistic options:

1. Stay on 25.02 (lowest effort, and correct for H200)

It’s the final, validated TF image and it fully supports sm_90; your CUDA 13.2 driver runs it because newer drivers stay backward compatible with applications built on older CUDA toolkits. Before you treat the +ptx85 warning as a blocker, confirm it’s actually costing you step time. Some XLA JIT compilation at startup is normal on any container version, so the warning by itself may just be log noise.
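One way to check this is to time the first few calls separately from the steady state, since JIT cost should show up only in the early calls. This is a rough sketch, not a benchmark harness: time_steps and summarize are hypothetical helper names, and step_fn stands in for whatever runs one training step in your code.

```python
import statistics
import time


def time_steps(step_fn, warmup=3, measured=20):
    """Return (warmup_times, steady_times): per-call wall-clock seconds."""
    def timed():
        start = time.perf_counter()
        step_fn()
        return time.perf_counter() - start

    # Early calls absorb one-time costs such as XLA JIT compilation.
    warm = [timed() for _ in range(warmup)]
    # Later calls approximate the steady-state step time.
    steady = [timed() for _ in range(measured)]
    return warm, steady


def summarize(warm, steady):
    """Compare the very first call against the steady-state median."""
    return {
        "first_call_s": warm[0],
        "steady_median_s": statistics.median(steady),
    }


# Assumption: with TensorFlow, step_fn should pull a value back to the host
# (for example by calling .numpy() on the loss) so that asynchronous GPU work
# is included in the measured time.
```

If the first call is slow but the steady-state median looks normal for your model, the compilation is a startup cost and the warning is most likely just noise. A steady-state median that stays high would be the sign of a real regression.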

2. Build your own CUDA 13 + TF image (only if you specifically need the CUDA 13 toolchain)

No NVIDIA-optimized path here, but you can layer the upstream tensorflow pip wheels (or build from source) on an nvcr.io/nvidia/cuda:13.x base. You give up the NGC tuning and own the XLA/cuDNN compatibility yourself.

3. Switch frameworks for an NVIDIA-optimized CUDA 13 image

Only if your workload isn’t TF-locked: the PyTorch NGC container, for example, is still on the monthly cadence and on CUDA 13 (the 26.x tags).

If you can share whether you’re seeing an actual step-time regression (vs just the warning in the log), I can help you decide whether option 1 is good enough or whether you actually need the rebuild in option 2.

Thanks,
Atharva

scousins · June 2, 2026, 7:48pm

Thanks, @athkumar. I believe you are correct that it isn’t affecting performance. It is likely just log noise, although there is quite a lot of it in our case. Do you know of a way to silence those messages?

I have talked it over with the instructor for the class that this is being set up for and he has decided to transition it to PyTorch. I have tried the latest NGC PyTorch container and it is working just fine.

Thanks very much for your help with this.
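
On silencing the messages for anyone staying on 25.02: nothing in this thread confirms a supported switch for it, so the following is only a sketch. It assumes the line is written straight to the process's stderr by native code (so Python-level logging settings would not catch it) and works around that by running the job as a child process and dropping matching lines. The names drop_noise and run_filtered are hypothetical.

```python
import subprocess
import sys

# Text taken from the log line quoted at the top of this thread.
NOISE = "is not a recognized feature for this target (ignoring feature)"


def drop_noise(lines, needle=NOISE):
    """Yield every line that does not contain the needle."""
    for line in lines:
        if needle not in line:
            yield line


def run_filtered(cmd, needle=NOISE, out=None):
    """Run cmd, forward its stderr minus the noisy lines, return its exit code."""
    out = sys.stderr if out is None else out
    # Only stderr is piped; stdout passes through to the terminal untouched.
    proc = subprocess.Popen(cmd, stderr=subprocess.PIPE, text=True)
    for line in drop_noise(proc.stderr, needle):
        out.write(line)
        out.flush()
    return proc.wait()
```

For example, run_filtered([sys.executable, "train.py"]) would run a (hypothetical) train.py and keep every other stderr line, including real errors, intact.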
