Hi, the link NERSC / Roofline-on-NVIDIA-GPUs · GitLab shows how to plot the Tensor Core TFLOPS for V100, which is `Tensor Core: 512 x sm__inst_executed_pipe_tensor.sum`. How does this equation change for T4 Tensor Core integer performance? Should I add a multiplying factor of 4 to get the INT8 TOPS? If the equation is completely different, please provide the same…
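
For context, this is a minimal sketch of how I understand the V100 formula is applied. The function name, the `elapsed_seconds` argument, and the `flops_per_inst` parameter are my own placeholders, not something from the NERSC repository. Only the factor 512 comes from the linked page, and the correct factor for T4 INT8 is exactly what I am asking about.

```python
def tensor_tflops(inst_executed_pipe_tensor_sum, elapsed_seconds, flops_per_inst=512):
    """Estimate Tensor Core throughput in tera-operations per second.

    inst_executed_pipe_tensor_sum: value of sm__inst_executed_pipe_tensor.sum
    elapsed_seconds: kernel run time in seconds (assumed to be measured separately)
    flops_per_inst: operations per tensor instruction; 512 is the V100 value
                    from the NERSC page, the T4 INT8 value is unknown to me
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    total_ops = flops_per_inst * inst_executed_pipe_tensor_sum
    return total_ops / elapsed_seconds / 1e12


# Hypothetical counter value and run time, for illustration only.
print(tensor_tflops(1_000_000_000, 0.01))
```

If the T4 INT8 case only differs by a constant, I would expect to change just `flops_per_inst`; if the metric itself is different, the whole expression would need to change.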