Hi,
I have been running the CUDALibrarySamples/cuBLASLt/LtFp8matmul example on my RTX 4000 Ada laptop and on an RTX 4090, and I observed that the achieved TFLOPS are identical on both GPUs. I would like to test FP8 TFLOPS performance further to better understand the capabilities of these GPUs and the differences between them.
Could anyone point me to or share any specific benchmark programs or tools designed to evaluate FP8 performance on NVIDIA GPUs? Any guidance or resources would be greatly appreciated!
Thank you in advance for your help!