Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numeric Behaviors

A recently updated paper that may be of interest: [2206.02874] Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numeric Behaviors