Autotuning for GEMM kernel and combination with other kernels

Hi,

Is there a package or parameter for offline or online tuning of the code? For example, my GEMM kernel is far from peak performance (about 25% below peak).

If you’re using cuBLAS, its internal heuristics pick the best kernel for the given problem (matrix sizes, data types, transpose modes, and so on). How close a GEMM gets to peak performance depends on many factors.