Is there any official benchmark tool to test a GPU's FLOPS?

You won’t be able to achieve that performance on an L4. There are a few reasons for this.

  1. No GPU delivers peak theoretical throughput.
  2. The L4 has a power limit (~70W) that constrains its ability to approach its theoretical peak. All GPUs show this behavior to some degree: they tend to enter a power-limited state when performing large, continuous, or repeated GEMM-type operations. You can confirm this by running nvidia-smi -a while the test is running and looking at the “clocks throttle reasons” section. There are many reports like this, including one for the T4 and a recent one for the A10.
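
If you would rather not read the full nvidia-smi dump by eye, here is a rough sketch of a script that pulls out only the throttle reasons currently marked Active. The function names are my own, and it assumes the section is headed either "Clocks Throttle Reasons" or "Clocks Event Reasons" (the heading seems to vary by driver version) with indented "Name : Active / Not Active" lines beneath it, so adjust it if your output looks different.

```python
import re
import subprocess

# Assumption: the section heading is one of these two, depending on driver version.
_SECTION = re.compile(r"^\s*Clocks (Throttle|Event) Reasons\s*$")


def active_throttle_reasons(report: str) -> list[str]:
    """Return the names of the reasons marked 'Active' in an nvidia-smi -q style report."""
    active = []
    in_section = False
    section_indent = 0
    for line in report.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        if _SECTION.match(line):
            in_section = True
            section_indent = indent
            continue
        if in_section:
            # A line indented no deeper than the heading ends the section.
            if indent <= section_indent:
                in_section = False
                continue
            name, sep, value = line.partition(":")
            if sep and value.strip() == "Active":
                active.append(name.strip())
    return active


def read_report() -> str:
    """Run nvidia-smi and return its performance report (needs an NVIDIA driver installed)."""
    result = subprocess.run(
        ["nvidia-smi", "-q", "-d", "PERFORMANCE"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Run print(active_throttle_reasons(read_report())) in a second terminal while the benchmark is running. If a power-related entry such as SW Power Cap shows up in the list, the card is most likely being held back by its power limit rather than by your code.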