We are running GEMM benchmarks on the Jetson AGX Xavier dev kit. We have obtained numbers for different matrix sizes and memory methods, but before drawing any conclusions I want to compare them with proven results.
Could you please provide links to official NVIDIA GEMM benchmarks for FP16/FP32?
Even better would be links to independent research papers that also report GEMM benchmarks.
Hi @m.o.zhegulin, I don’t believe we have published GEMM benchmarks. You might want to try looking in the CUDA Toolkit or cuBLAS samples, or searching for academic papers. You will get better performance from using the Tensor Cores in the Xavier GPU, which cuBLAS can use. Thanks.
Yes, we ran our benchmarks with the cuBLAS library. We also found a "magic number" for the matrix size (184), beyond which Tensor Cores are activated for the computation. Still, it’s always good to compare our own numbers with some reference.
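For anyone comparing their own timings against numbers from papers or other posts, here is a minimal sketch of how measured GEMM times could be put on a common footing. It assumes the usual convention of counting 2·M·N·K floating-point operations per GEMM. The function names (`gemm_flops`, `gemm_gflops`, `first_throughput_jump`) and the `min_ratio` threshold are hypothetical helpers made up for illustration; they are not part of cuBLAS or any NVIDIA tool.

```python
def gemm_flops(m, n, k):
    """Operation count for C = A @ B with A of shape (m, k) and B of shape (k, n).

    Uses the common 2*m*n*k convention (one multiply and one add per term).
    """
    return 2 * m * n * k


def gemm_gflops(m, n, k, seconds):
    """Convert one measured GEMM wall time (in seconds) to GFLOP/s."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return gemm_flops(m, n, k) / seconds / 1e9


def first_throughput_jump(sizes, gflops, min_ratio=1.5):
    """Return the first matrix size whose throughput is at least `min_ratio`
    times that of the next smaller size in the sweep, or None if no such
    jump exists.

    `min_ratio` is an arbitrary illustrative threshold (an assumption), not a
    value documented anywhere; tune it to the noise level of your runs.
    """
    pairs = sorted(zip(sizes, gflops))
    for (_, prev), (size, cur) in zip(pairs, pairs[1:]):
        if prev > 0 and cur >= min_ratio * prev:
            return size
    return None
```

Feeding a size sweep through `gemm_gflops` and then `first_throughput_jump` is one way to locate a threshold such as the 184 mentioned above from the data itself, rather than by eye. Reporting GFLOP/s instead of raw milliseconds also makes results comparable across different matrix shapes.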