Hi,
I have written a program in CUDA, and I am currently achieving 6.3 TFLOPS on a V100 GPU (the theoretical peak performance of the V100 is 7 TFLOPS). How can I show that this is the maximum performance I can realistically get? Or, how can I be sure that more effort would not yield more performance?
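
For context, here is a rough sketch of where a figure like 7 TFLOPS can come from and what fraction of it 6.3 TFLOPS represents. It assumes the 7 TFLOPS refers to double precision on the PCIe V100 (80 SMs, 32 FP64 units per SM, 1.38 GHz boost clock, each FMA counted as 2 FLOPs); the precision and card variant are assumptions, and the function names `peak_tflops` and `efficiency` are made up for illustration.

```cpp
// Theoretical peak in TFLOPS:
//   SM count * floating-point units per SM * 2 FLOPs per FMA * clock in GHz / 1000
// Assumes every unit issues one FMA per cycle at the given clock.
double peak_tflops(int sm_count, int units_per_sm, double clock_ghz) {
    const double flops_per_fma = 2.0;  // one multiply plus one add
    return sm_count * units_per_sm * flops_per_fma * clock_ghz / 1000.0;
}

// Fraction of the theoretical peak that a measured throughput reaches.
double efficiency(double achieved_tflops, double peak) {
    return achieved_tflops / peak;
}
```

With the assumed numbers, `peak_tflops(80, 32, 1.38)` gives about 7.07 TFLOPS, and `efficiency(6.3, peak_tflops(80, 32, 1.38))` is roughly 0.89. If the GPU actually runs at a different sustained clock under load, the reachable peak changes accordingly, so the clock value is the input most worth checking.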