Are there any benchmarks that can be used to measure peak GFLOPS and peak HBM bandwidth on a GPU? I already know how to calculate the theoretical values.
bandwidthTest (from the CUDA samples) provides a reasonable estimate of peak bandwidth. For more careful testing, you can write your own copy, load-only, or store-only kernel.
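As a rough illustration of the "write your own copy kernel" suggestion, here is a minimal sketch of a device-memory copy benchmark. The kernel name `copyKernel`, the helper `gbPerSec`, the 256 MiB buffer size, the launch configuration, and the iteration count are all assumptions chosen for the example, not values from this thread. The buffers are sized well beyond typical L2 cache capacity so that the traffic should mostly reach device memory.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Grid-stride copy: every element is read once and written once.
__global__ void copyKernel(const float4* __restrict__ in,
                           float4* __restrict__ out, size_t n)
{
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        out[i] = in[i];
}

// Bytes moved in `ms` milliseconds, expressed in GB/s (1 GB = 1e9 bytes).
static double gbPerSec(double bytes, double ms)
{
    return bytes / (ms * 1e-3) / 1e9;
}

int main()
{
    const size_t n = (size_t)1 << 24;        // 16 Mi float4 = 256 MiB per buffer (assumed size)
    const size_t bytes = n * sizeof(float4);
    const int iters = 20;                    // assumed repetition count

    float4 *in = nullptr, *out = nullptr;
    check(cudaMalloc(&in, bytes), "cudaMalloc(in)");
    check(cudaMalloc(&out, bytes), "cudaMalloc(out)");
    check(cudaMemset(in, 0, bytes), "cudaMemset");

    // Launch enough blocks to keep every SM busy (assumed heuristic: 32 blocks per SM).
    int device = 0, numSMs = 0;
    check(cudaGetDevice(&device), "cudaGetDevice");
    check(cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, device),
          "cudaDeviceGetAttribute");
    const int block = 256, grid = numSMs * 32;

    cudaEvent_t start, stop;
    check(cudaEventCreate(&start), "cudaEventCreate");
    check(cudaEventCreate(&stop), "cudaEventCreate");

    copyKernel<<<grid, block>>>(in, out, n); // warm-up launch, not timed
    check(cudaGetLastError(), "warm-up launch");

    check(cudaEventRecord(start), "cudaEventRecord");
    for (int i = 0; i < iters; ++i)
        copyKernel<<<grid, block>>>(in, out, n);
    check(cudaEventRecord(stop), "cudaEventRecord");
    check(cudaEventSynchronize(stop), "cudaEventSynchronize");

    float ms = 0.0f;
    check(cudaEventElapsedTime(&ms, start, stop), "cudaEventElapsedTime");

    // Each iteration reads `bytes` and writes `bytes`.
    printf("copy bandwidth: %.1f GB/s\n", gbPerSec(2.0 * bytes * iters, ms));

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

This should build with something like `nvcc -O3 copy_bw.cu -o copy_bw`. A load-only or store-only variant would follow the same pattern, with the byte count adjusted to a single pass per iteration; a load-only kernel also needs to consume the loaded values somehow so the compiler does not remove the reads.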
Matrix multiply is usually what comes closest to peak FLOPS, typically via a cuBLAS call.
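A minimal sketch of the matrix-multiply approach, timing an FP32 `cublasSgemm` call with CUDA events. The matrix size of 8192, the iteration count, and the helper `gemmGflops` are assumptions for the example. It only probes the FP32 path; other precisions (for instance via `cublasGemmEx`) would need their own runs. Inputs are left zero-filled for brevity, which may not be representative of real data.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// An n x n x n GEMM performs 2*n^3 floating-point operations.
static double gemmGflops(double n, double ms)
{
    return 2.0 * n * n * n / (ms * 1e-3) / 1e9;
}

static void check(bool ok, const char* what)
{
    if (!ok) {
        fprintf(stderr, "%s failed\n", what);
        exit(EXIT_FAILURE);
    }
}

int main()
{
    const int n = 8192;                      // assumed square matrix size
    const int iters = 10;                    // assumed repetition count
    const size_t bytes = (size_t)n * n * sizeof(float);

    float *A = nullptr, *B = nullptr, *C = nullptr;
    check(cudaMalloc(&A, bytes) == cudaSuccess, "cudaMalloc(A)");
    check(cudaMalloc(&B, bytes) == cudaSuccess, "cudaMalloc(B)");
    check(cudaMalloc(&C, bytes) == cudaSuccess, "cudaMalloc(C)");
    check(cudaMemset(A, 0, bytes) == cudaSuccess, "cudaMemset(A)");
    check(cudaMemset(B, 0, bytes) == cudaSuccess, "cudaMemset(B)");

    cublasHandle_t handle;
    check(cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS, "cublasCreate");

    const float alpha = 1.0f, beta = 0.0f;

    // Warm-up call, not timed: C = alpha * A * B + beta * C (column-major).
    check(cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                      &alpha, A, n, B, n, &beta, C, n) == CUBLAS_STATUS_SUCCESS,
          "cublasSgemm (warm-up)");

    cudaEvent_t start, stop;
    check(cudaEventCreate(&start) == cudaSuccess, "cudaEventCreate");
    check(cudaEventCreate(&stop) == cudaSuccess, "cudaEventCreate");

    check(cudaEventRecord(start) == cudaSuccess, "cudaEventRecord");
    for (int i = 0; i < iters; ++i)
        check(cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                          &alpha, A, n, B, n, &beta, C, n) == CUBLAS_STATUS_SUCCESS,
              "cublasSgemm");
    check(cudaEventRecord(stop) == cudaSuccess, "cudaEventRecord");
    check(cudaEventSynchronize(stop) == cudaSuccess, "cudaEventSynchronize");

    float ms = 0.0f;
    check(cudaEventElapsedTime(&ms, start, stop) == cudaSuccess, "cudaEventElapsedTime");

    // Report the average rate per GEMM call.
    printf("SGEMM: %.1f GFLOPS\n", gemmGflops(n, ms / iters));

    cublasDestroy(handle);
    cudaFree(A);
    cudaFree(B);
    cudaFree(C);
    return 0;
}
```

This should build with something like `nvcc -O3 gemm_flops.cu -lcublas -o gemm_flops`. The reported number is an achieved rate, so expect it to sit somewhat below the theoretical peak.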
bandwidthTest measures host-to-device or device-to-device memory bandwidth. This is not what I am looking for. I am looking for the memory bandwidth between the CUDA cores and HBM. I assume I need to write my own tests to measure this bandwidth.