cublasLT FP8

I am running LtFp8Matmul example of cublasLT

The example uses the e4m3 FP8 format. However, when I change it to e5m2, I hit this error:
cuBLAS API failed with status 15
terminate called after throwing an instance of 'std::logic_error'
  what():  cuBLAS API failed
Aborted (core dumped)

I only changed the data type: I replaced all __nv_fp8_e4m3 with __nv_fp8_e5m2, and in the cublasLtMatrixLayoutCreate calls I replaced CUDA_R_8F_E4M3 with CUDA_R_8F_E5M2.
I am not sure why e5m2 is not working at all; in theory, the flow should be the same as for e4m3.
Any inputs here? Thanks!

Status 15 is CUBLAS_STATUS_NOT_SUPPORTED, and I don't think A * B where both A and B are e5m2 is supported: that combination is not in the table of supported FP8 data types in the docs: cuBLAS
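If you do need e5m2 data, the supported combinations pair e5m2 with e4m3 rather than using it for both operands. A minimal sketch of what the layout setup could look like for that mixed case, assuming the m/n/k and leading-dimension variables from the LtFp8Matmul sample (this is an illustration of the type combination, not the full sample):

```cpp
#include <cublasLt.h>
#include <cstdint>

// Sketch only: one FP8 operand per format, as the cublasLtMatmul
// data-type table allows, instead of e5m2 for both A and B.
// m, n, k, lda, ldb, ldc are assumed to come from the surrounding sample.
void createMixedFp8Layouts(uint64_t m, uint64_t n, uint64_t k,
                           int64_t lda, int64_t ldb, int64_t ldc,
                           cublasLtMatrixLayout_t &Adesc,
                           cublasLtMatrixLayout_t &Bdesc,
                           cublasLtMatrixLayout_t &Cdesc) {
    // A stays e4m3; B switches to e5m2.
    cublasLtMatrixLayoutCreate(&Adesc, CUDA_R_8F_E4M3, m, k, lda);
    cublasLtMatrixLayoutCreate(&Bdesc, CUDA_R_8F_E5M2, k, n, ldb);
    // FP8 matmul accumulates in higher precision; the output type
    // here (FP16) is one of the supported choices.
    cublasLtMatrixLayoutCreate(&Cdesc, CUDA_R_16F, m, n, ldc);
}
```

Note the sample also sets transpose attributes on the matmul descriptor (FP8 requires a TN layout), which is unchanged by the data-type swap.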

You want weights and activations in e4m3 (more mantissa bits, so more precision) and gradients in e5m2 (more exponent bits, so more dynamic range):
https://www.reddit.com/r/MachineLearning/comments/7bi5yd/comment/dpk2ldm/