This old post was resolved by the updated whitepaper, which confirms that fp32 accumulate is half-rate on GeForce. But does cuBLAS (cublasLtMatmul()) support fp8 with fp16 accumulate now? The fp8 example breaks when I change CUBLAS_COMPUTE_32F to CUBLAS_COMPUTE_16F (line 71), returning "cuBLAS API failed with status 15", so I assume it isn't supported, but I want to make sure.
From the cublasLtMatmul() documentation:
To use FP8 kernels, the following set of requirements must be satisfied:
- All matrix pointers must be 16-byte aligned.
- A must be transposed and B non-transposed (The “TN” format).
- The compute type must be CUBLAS_COMPUTE_32F.
- The scale type must be CUDA_R_32F.
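For reference, status 15 is CUBLAS_STATUS_NOT_SUPPORTED, which is consistent with CUBLAS_COMPUTE_16F not being accepted for the FP8 path. Below is a minimal, untested sketch of a descriptor setup that follows the documented requirements (TN layout, CUBLAS_COMPUTE_32F compute type, CUDA_R_32F scale type); the sizes, leading dimensions, and the FP16 output type are placeholder assumptions, and the actual cublasLtMatmul() call is only outlined in a comment.

```cpp
// Sketch only: FP8 (E4M3) inputs with the documented TN / FP32-compute setup.
#include <cublasLt.h>
#include <cstdint>

int main() {
    cublasLtHandle_t ltHandle;
    cublasLtCreate(&ltHandle);

    const int64_t m = 64, n = 64, k = 64;  // placeholder problem size

    // Compute type must be CUBLAS_COMPUTE_32F for FP8 kernels; swapping in
    // CUBLAS_COMPUTE_16F is what triggers status 15 (CUBLAS_STATUS_NOT_SUPPORTED).
    cublasLtMatmulDesc_t matmulDesc;
    cublasLtMatmulDescCreate(&matmulDesc, CUBLAS_COMPUTE_32F, CUDA_R_32F);

    // "TN" format: A transposed, B non-transposed.
    cublasOperation_t transA = CUBLAS_OP_T, transB = CUBLAS_OP_N;
    cublasLtMatmulDescSetAttribute(matmulDesc, CUBLASLT_MATMUL_DESC_TRANSA,
                                   &transA, sizeof(transA));
    cublasLtMatmulDescSetAttribute(matmulDesc, CUBLASLT_MATMUL_DESC_TRANSB,
                                   &transB, sizeof(transB));

    // FP8 (E4M3) inputs; FP16 C/D chosen here as an example output type.
    // With transA = CUBLAS_OP_T, A is stored k x m; B is stored k x n.
    cublasLtMatrixLayout_t aDesc, bDesc, cDesc, dDesc;
    cublasLtMatrixLayoutCreate(&aDesc, CUDA_R_8F_E4M3, k, m, k);
    cublasLtMatrixLayoutCreate(&bDesc, CUDA_R_8F_E4M3, k, n, k);
    cublasLtMatrixLayoutCreate(&cDesc, CUDA_R_16F, m, n, m);
    cublasLtMatrixLayoutCreate(&dDesc, CUDA_R_16F, m, n, m);

    // ... allocate 16-byte-aligned device buffers for A, B, C, D, then call
    // cublasLtMatmul(ltHandle, matmulDesc, &alpha, A, aDesc, B, bDesc,
    //                &beta, C, cDesc, D, dDesc, /*algo=*/nullptr,
    //                workspace, workspaceSize, stream);

    cublasLtMatrixLayoutDestroy(dDesc);
    cublasLtMatrixLayoutDestroy(cDesc);
    cublasLtMatrixLayoutDestroy(bDesc);
    cublasLtMatrixLayoutDestroy(aDesc);
    cublasLtMatmulDescDestroy(matmulDesc);
    cublasLtDestroy(ltHandle);
    return 0;
}
```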