FP8 Benchmark Program for RTX 4090

Hi,

I have been running the CUDALibrarySamples/cuBLASLt/LtFp8matmul example on an RTX 4000 Ada laptop GPU and on an RTX 4090, and the achieved TFLOPS are identical on both. I would like to test FP8 throughput further so I can understand the capabilities of these GPUs, and the differences between them, more thoroughly.
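For reference, an "achieved TFLOPS" figure for a GEMM is conventionally computed as 2·m·n·k floating-point operations (one multiply and one add per inner-product term) divided by the measured kernel time. The sketch below assumes that convention. It ignores the alpha/beta scaling and any epilogue work, and the function name `gemm_tflops` is hypothetical, not something taken from the LtFp8matmul sample:

```python
def gemm_tflops(m: int, n: int, k: int, elapsed_ms: float, iterations: int = 1) -> float:
    """Achieved TFLOPS for `iterations` back-to-back m x n x k GEMMs (D = A @ B).

    Hypothetical helper: counts 2*m*n*k FLOPs per GEMM and divides by the
    total measured time. Alpha/beta scaling and epilogue ops are ignored.
    """
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    total_flops = 2.0 * m * n * k * iterations
    seconds = elapsed_ms * 1e-3
    return total_flops / seconds / 1e12


# Illustrative numbers only: a 4096 x 4096 x 4096 GEMM timed at 0.5 ms.
print(gemm_tflops(4096, 4096, 4096, 0.5))
```

If the timing covers a loop of several calls, pass the loop count as `iterations` so the result reflects the per-call rate rather than a single call.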

Could anyone point me to, or share, benchmark programs or tools designed to evaluate FP8 performance on NVIDIA GPUs? Any guidance or resources would be greatly appreciated!

Thank you in advance for your help!