FP8/FP16 accumulation on Ada RTX 4090

This old post has been resolved by the updated whitepaper, which confirms that FP32 accumulation runs at half rate on GeForce. But does cuBLAS (cublasLtMatmul()) now support FP8 inputs with FP16 accumulation? The FP8 sample breaks when I change CUBLAS_COMPUTE_32F to CUBLAS_COMPUTE_16F (Line 71), returning “cuBLAS API failed with status 15” (CUBLAS_STATUS_NOT_SUPPORTED), so I assume it isn’t supported, but I want to make sure. A trimmed-down version of the change is shown below.

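For reference, this is roughly what I’m testing (a simplified sketch of the sample’s descriptor setup, not the exact code; error handling abbreviated):

```cpp
#include <cublasLt.h>
#include <cstdio>

// Trimmed-down sketch of the FP8 sample's operation-descriptor setup.
// The only change from the working configuration is the compute type.
void buildFp8OpDesc(cublasLtMatmulDesc_t *opDesc) {
    // Working: CUBLAS_COMPUTE_32F with CUDA_R_32F scale type.
    // Swapping in CUBLAS_COMPUTE_16F is the change that makes the sample
    // fail with status 15 (CUBLAS_STATUS_NOT_SUPPORTED) for me.
    cublasStatus_t st = cublasLtMatmulDescCreate(
        opDesc, CUBLAS_COMPUTE_16F /* was CUBLAS_COMPUTE_32F */, CUDA_R_32F);
    if (st != CUBLAS_STATUS_SUCCESS)
        std::printf("cuBLAS API failed with status %d\n", static_cast<int>(st));

    // FP8 kernels require the "TN" layout: A transposed, B non-transposed.
    cublasOperation_t transA = CUBLAS_OP_T, transB = CUBLAS_OP_N;
    cublasLtMatmulDescSetAttribute(*opDesc, CUBLASLT_MATMUL_DESC_TRANSA,
                                   &transA, sizeof(transA));
    cublasLtMatmulDescSetAttribute(*opDesc, CUBLASLT_MATMUL_DESC_TRANSB,
                                   &transB, sizeof(transB));
}
```
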
From the cublasLtMatmul() documentation:

To use FP8 kernels, the following set of requirements must be satisfied:

  • All matrix pointers must be 16-byte aligned.
  • A must be transposed and B non-transposed (The “TN” format).
  • The compute type must be CUBLAS_COMPUTE_32F.
  • The scale type must be CUDA_R_32F.

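Given the requirements quoted above, one way to check this programmatically (rather than relying on the runtime error) would be to ask cublasLtMatmulAlgoGetHeuristic() whether any kernel exists for FP8 inputs with a given compute type. This is only a sketch I haven’t fully validated; countFp8Algos and the choice of FP16 for the C/D types are my own illustration, not taken from the sample:

```cpp
#include <cublasLt.h>
#include <cstdint>
#include <cstdio>

// Sketch: query the cuBLASLt heuristic for FP8 inputs with a chosen compute
// type. Zero results (or a CUBLAS_STATUS_NOT_SUPPORTED return) suggests the
// input/accumulate combination is not supported on this GPU / cuBLAS build.
static int countFp8Algos(cublasLtHandle_t handle, cublasComputeType_t computeType,
                         int64_t m, int64_t n, int64_t k) {
    cublasLtMatmulDesc_t opDesc = nullptr;
    cublasLtMatrixLayout_t aDesc = nullptr, bDesc = nullptr, cDesc = nullptr, dDesc = nullptr;
    cublasLtMatmulPreference_t pref = nullptr;
    int returned = 0;

    // FP8 requires the TN layout; the docs also require the CUDA_R_32F scale type.
    if (cublasLtMatmulDescCreate(&opDesc, computeType, CUDA_R_32F) == CUBLAS_STATUS_SUCCESS) {
        cublasOperation_t transA = CUBLAS_OP_T, transB = CUBLAS_OP_N;
        cublasLtMatmulDescSetAttribute(opDesc, CUBLASLT_MATMUL_DESC_TRANSA, &transA, sizeof(transA));
        cublasLtMatmulDescSetAttribute(opDesc, CUBLASLT_MATMUL_DESC_TRANSB, &transB, sizeof(transB));

        // Column-major layouts: A is k x m (transposed), B is k x n; E4M3 inputs, FP16 C/D.
        cublasLtMatrixLayoutCreate(&aDesc, CUDA_R_8F_E4M3, k, m, k);
        cublasLtMatrixLayoutCreate(&bDesc, CUDA_R_8F_E4M3, k, n, k);
        cublasLtMatrixLayoutCreate(&cDesc, CUDA_R_16F, m, n, m);
        cublasLtMatrixLayoutCreate(&dDesc, CUDA_R_16F, m, n, m);

        cublasLtMatmulPreferenceCreate(&pref);
        uint64_t workspaceSize = 32ull * 1024 * 1024;
        cublasLtMatmulPreferenceSetAttribute(pref, CUBLASLT_MATMUL_PREF_MAX_WORKSPACE_BYTES,
                                             &workspaceSize, sizeof(workspaceSize));

        cublasLtMatmulHeuristicResult_t results[8];
        cublasStatus_t st = cublasLtMatmulAlgoGetHeuristic(
            handle, opDesc, aDesc, bDesc, cDesc, dDesc, pref, 8, results, &returned);
        if (st != CUBLAS_STATUS_SUCCESS)  // e.g. status 15 = CUBLAS_STATUS_NOT_SUPPORTED
            returned = 0;
    }

    if (pref)   cublasLtMatmulPreferenceDestroy(pref);
    if (dDesc)  cublasLtMatrixLayoutDestroy(dDesc);
    if (cDesc)  cublasLtMatrixLayoutDestroy(cDesc);
    if (bDesc)  cublasLtMatrixLayoutDestroy(bDesc);
    if (aDesc)  cublasLtMatrixLayoutDestroy(aDesc);
    if (opDesc) cublasLtMatmulDescDestroy(opDesc);
    return returned;
}

int main() {
    cublasLtHandle_t handle;
    cublasLtCreate(&handle);
    std::printf("FP8 + FP32 accumulate: %d algo(s)\n",
                countFp8Algos(handle, CUBLAS_COMPUTE_32F, 1024, 1024, 1024));
    std::printf("FP8 + FP16 accumulate: %d algo(s)\n",
                countFp8Algos(handle, CUBLAS_COMPUTE_16F, 1024, 1024, 1024));
    cublasLtDestroy(handle);
    return 0;
}
```

(Built with something like `nvcc check_fp8.cu -lcublasLt`; if the FP16-accumulate query returns zero algorithms, that would match the status-15 failure I’m seeing.)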