What do the nvprof metrics gld_throughput and gld_efficiency mean?
I’m profiling a kernel on a V100, which has a theoretical peak memory bandwidth of 900 GB/s, but nvprof reports a gld_throughput of about 3x that peak. Some kernels also report a gld_efficiency above 100%. Why is that?