For the NVIDIA A100 PCIe 40GB GPU (I read the NVIDIA Ampere architecture whitepaper for GA100):
I have been using Nsight Compute to get the roofline and FLOP/s of a CUDA kernel. It reports Peak FP64 Performance (FLOP/s) = 5.27 TFLOP/s.
**Peak FP64 Performance in the whitepaper = 9.7 TFLOPS** (this is obtained at the GPU boost clock; I found that the A100 boost clock is 1410 MHz).
How does Nsight Compute obtain its Peak FP64 Performance (FLOP/s), i.e. the hardware peak rather than what the application/algorithm achieves?
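As a rough cross-check, the whitepaper peak follows from SM count × FP64 units per SM × 2 FLOPs per FMA × SM clock. The sketch below assumes the GA100 figures as they apply to the A100 (108 SMs, 32 FP64 units per SM, FP64 Tensor Cores excluded); the constants and function names are illustrative and are not Nsight Compute's internals. Running the formula backwards on the 5.27 TFLOP/s value gives a clock of roughly 762 MHz, well below the 1410 MHz boost clock. That would be consistent with Nsight Compute locking the GPU clocks while profiling (its `--clock-control` option defaults to `base`; `--clock-control none` leaves the clocks unlocked).

```cpp
#include <cassert>

// Assumed A100 figures (from the GA100 whitepaper): 108 SMs enabled,
// 32 FP64 units per SM, one FMA (= 2 FLOPs) per FP64 unit per clock.
// FP64 Tensor Core throughput is deliberately not included.
constexpr int kA100SmCount = 108;
constexpr int kFp64UnitsPerSm = 32;
constexpr int kFlopsPerFma = 2;

// FP64 FLOPs the whole GPU can issue per clock cycle.
double fp64_flops_per_clock(int sm_count) {
    assert(sm_count > 0);
    return static_cast<double>(sm_count) * kFp64UnitsPerSm * kFlopsPerFma;
}

// Peak (non-Tensor-Core) FP64 throughput in FLOP/s at a given SM clock in MHz.
double peak_fp64_flops(int sm_count, double sm_clock_mhz) {
    assert(sm_clock_mhz > 0.0);
    return fp64_flops_per_clock(sm_count) * sm_clock_mhz * 1e6;
}

// Inverse: the SM clock in MHz that a reported peak FP64 FLOP/s value implies.
double implied_clock_mhz(int sm_count, double peak_flops) {
    assert(peak_flops > 0.0);
    return peak_flops / (fp64_flops_per_clock(sm_count) * 1e6);
}
```

For example, `peak_fp64_flops(kA100SmCount, 1410.0)` is about 9.75e12 FLOP/s (the whitepaper's 9.7 TFLOPS), and `implied_clock_mhz(kA100SmCount, 5.27e12)` is about 762 MHz.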
Thanks a lot. Now the FLOP/s reported by Nsight Compute is almost the same as the whitepaper value.
I have found that the NVIDIA A100's GPU Boost Clock = 1410 MHz (Ampere architecture whitepaper, page 36). The boost clock can be obtained like this:
```cpp
cudaDeviceProp prop;
CHECK_ERROR(cudaGetDeviceProperties(&prop, 1));  // device index 1; CHECK_ERROR is my own error-checking macro
int clock_rate_khz = prop.clockRate;             // max (boost) SM clock in kHz, e.g. 1410000 on this A100
```
How can I get the base clock frequency of the A100 GPU, or of other GPUs?