Wrong gld_throughput result from nvprof

I’m using nvprof to measure the memory throughput of my program.

Here is the result:

Invocations                               Metric Name                        Metric Description         Min         Max         Avg
Device "Tesla V100-SXM2-32GB (0)"
    Kernel: void matmul_kernel<int>(int*, int*, int*, int, int, int)
          1                            gld_throughput                    Global Load Throughput  2896.8GB/s  2896.8GB/s  2896.8GB/s
          1                            gst_throughput                   Global Store Throughput  2.6625GB/s  2.6625GB/s  2.6625GB/s
          1                            gld_efficiency             Global Memory Load Efficiency      13.24%      13.24%      13.24%
          1                            gst_efficiency            Global Memory Store Efficiency      25.00%      25.00%      25.00%
          1                        achieved_occupancy                        Achieved Occupancy    0.958892    0.958892    0.958892

The reported global load throughput is 2896.8 GB/s, far higher than the V100's 900 GB/s peak memory bandwidth. Why is that?