Support for the TF32 type and mixed precision

Dear OptiX Development Team,

I am writing to inquire about planned support for the TF32 (TensorFloat-32) data type in the neural rendering routines of the next OptiX release (specifically, optixCoopVecMatMul and related functions). Although TF32 hardware support (Tensor Cores) is available and the format is well suited to deep learning - offering FP32 range with roughly FP16-equivalent precision - the following input/output combination is not natively supported in OptiX:

| inputType | inputInterpretation | matrixElementType | biasElementType | outputType |
|-----------|---------------------|-------------------|-----------------|------------|
| FP32      | TF32                | FP32              | FP32            | FP32       |

The current reliance on FLOAT16 is often insufficient for differentiable neural rendering (DNR) and inverse problems, where the narrow dynamic range of FP16 frequently leads to gradient underflow or unstable numerical results.

Are there plans to introduce native TF32 support for the input vector, bias, and matrix elements in the next release of OptiX? This feature would significantly enhance the numerical stability and performance of high-fidelity neural rendering applications.

Thank you for your consideration.

Hi @LowLevelKB,

That’s a good question. I can’t comment on future support, but I can tell you that we will take your comment as a vote in favor of TF32 support in the cooperative vectors API. I can also tell you that the very next release of OptiX will not have TF32 support yet (which has no bearing on whether we support TF32 in the future).

Bear in mind that the cooperative vectors APIs are cross-platform and are undergoing standards review, which means it could take some time before support for new formats or types appears.


David.

Thank you for your answer. I will look forward to it. By the way, as a temporary workaround for now, one can split the rendering into a sequence of multiple optixLaunch calls (each producing a new batch of sorted hit primitives in the hit buffer), followed by separate pure CUDA kernel invocations responsible for “shading”, where nvcuda::wmma is used to compute the outputs of consecutive neural network layers loaded into shared memory.
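
For anyone attempting this, a minimal sketch of such a wmma-based shading kernel, assuming an sm_80+ GPU and a single 16x16x8 TF32 tile computed by one warp (the kernel name, layout, and fixed dimensions are illustrative, not from the post above):

```cuda
// Sketch: one warp computes a 16x16x8 TF32 matrix-multiply-accumulate
// for a small fully connected layer, as a stand-in for optixCoopVecMatMul.
// Compile with nvcc -arch=sm_80 or newer.
#include <mma.h>
using namespace nvcuda;

// Dimensions are fixed by the TF32 wmma tile shape (m16n16k8).
__global__ void shade_layer(const float* __restrict__ A,   // 16x8 activations
                            const float* __restrict__ B,   // 8x16 weights
                            float* __restrict__ C) {       // 16x16 outputs
    wmma::fragment<wmma::matrix_a, 16, 16, 8,
                   wmma::precision::tf32, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 8,
                   wmma::precision::tf32, wmma::row_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 8, float> acc;

    wmma::fill_fragment(acc, 0.0f);
    wmma::load_matrix_sync(a, A, 8);    // leading dimension of A is 8
    wmma::load_matrix_sync(b, B, 16);   // leading dimension of B is 16

    // FP32 inputs must be explicitly rounded to TF32 before the MMA.
    for (int i = 0; i < a.num_elements; ++i)
        a.x[i] = wmma::__float_to_tf32(a.x[i]);
    for (int i = 0; i < b.num_elements; ++i)
        b.x[i] = wmma::__float_to_tf32(b.x[i]);

    wmma::mma_sync(acc, a, b, acc);
    wmma::store_matrix_sync(C, acc, 16, wmma::mem_row_major);
}
```

In a real renderer the weights would typically be staged into shared memory once per block and the tile loop would walk the hit buffer produced by the preceding optixLaunch.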