I maintain commercial HPC software that uses NVIDIA GPUs with CUDA/cuBLAS/cuSOLVER. Is there an easy way to determine the peak double-precision FLOPS using information readily available from a call to cudaGetDeviceProperties? I am finding that a lot of the helpful elements of the cudaDeviceProp struct are deprecated, and I would like my determination to be “future-proof”.
So for example, I would like the software to print out a message like, “You are using an A100 GPU which has peak FP64 of 9.7 TFLOPS”.
You would need to keep a database/lookup table/dictionary of some sort. I don’t know if that fits your definition of “easy”. I don’t think it’s possible to future-proof it, if that means “works automatically and correctly on future architectures with no code or database updates”. To my knowledge, even the number of CUDA cores per SM (not directly related to your question, but tangential) cannot be retrieved from cudaGetDeviceProperties without a lookup table.
Since you mention A100, the peak FP64 would actually be 19.5 TFLOPS (via the FP64 Tensor Cores) if the operation you care about in CUBLAS (or possibly CUSOLVER) uses or depends on FP64 GEMM.
Thanks for the reply, Robert. I was thinking that a lookup table might be where I was headed. I suppose I could look at cudaDeviceProp.name and compare the result with the table?
I only have access to a handful of HPC GPUs, so I don’t know how I might generate the contents of a lookup table to account for, say, the RTX 6000 Ada, which is not a great GPU for FP64 performance. Is there a list of the “names” of all NVIDIA GPUs somewhere out there?
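A minimal sketch of that name-keyed lookup might look like the following. The table entries and TFLOPS values here are an illustrative subset taken from published datasheets (A100 9.7, V100 7.8, H100 SXM 34), and substring matching is an assumption on my part, since the exact cudaDeviceProp.name string varies by SKU (e.g. “NVIDIA A100-SXM4-80GB” vs. “NVIDIA A100-PCIE-40GB”):

```cpp
#include <cstring>

// One row per GPU family: a substring to match against
// cudaDeviceProp.name, and the datasheet peak FP64 TFLOPS.
struct GpuFp64Entry {
    const char* substring;
    double tflops;
};

// Illustrative subset; a real table would need one entry per
// supported SKU and maintenance as new architectures ship.
static const GpuFp64Entry kFp64Table[] = {
    {"V100", 7.8},
    {"A100", 9.7},   // 19.5 TFLOPS if the FP64 Tensor Cores are used
    {"H100", 34.0},  // SXM variant; the PCIe variant is lower
};

// Returns peak FP64 TFLOPS for a device name string,
// or -1.0 if the device is not in the table.
double peakFp64Tflops(const char* deviceName) {
    for (const GpuFp64Entry& e : kFp64Table) {
        if (std::strstr(deviceName, e.substring) != nullptr) {
            return e.tflops;
        }
    }
    return -1.0;
}

// In the real application (requires CUDA), the name would come from:
//   cudaDeviceProp prop;
//   cudaGetDeviceProperties(&prop, 0);
//   double tf = peakFp64Tflops(prop.name);
```

The fallback return value matters in practice: when the name isn’t in the table, it is probably better to print a generic message than a wrong TFLOPS number.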
Nsight Compute does not have such a database (yet). Its roofline maximum is based upon the measured clock frequency. The NCU metrics section files can show you which metrics provide the per-cycle rate, but the peak clock rate for each type of instruction is not known. On more modern chips there are sometimes different peak clock rates for tensor, FP64, and other instructions, making the calculation of maximum theoretical operations/second very difficult.