Is it necessary to tune cuBLAS to get the best performance?

I noticed that cuBLAS supports multiple algorithms internally. Is it necessary to tune cuBLAS to get the best kernel?

If I just call cuBLAS directly, do I get the best kernel for my problem size?

Thank you!

This blog post should explain everything: https://developer.nvidia.com/blog/introducing-grouped-gemm-apis-in-cublas-and-more-performance-updates/
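In case it helps, here is a rough sketch of what manual tuning usually boils down to: collect several candidate algorithms, time each one on your actual problem size, and keep the fastest. This is not taken from the blog. With cuBLASLt, the candidates would typically come from `cublasLtMatmulAlgoGetHeuristic` and each one would be launched through `cublasLtMatmul`; the sketch below leaves those calls out so that it stays self-contained, and the helper names `time_ms` and `pick_fastest` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <limits>

// Hypothetical helper: average wall-clock milliseconds per call of `run`,
// measured after one untimed warm-up call.
// For GPU work, `run` must block until the kernel has finished (for example
// by synchronizing the stream); otherwise only the launch is timed.
inline double time_ms(const std::function<void()>& run, int iters = 10) {
    assert(iters > 0);
    run();  // warm-up, not timed
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        run();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count() / iters;
}

// Hypothetical helper: returns the index of the candidate with the lowest
// measured time, or -1 if there are no candidates.
// `measure(i)` is expected to return the time in milliseconds of candidate i,
// e.g. by wrapping a launch of the i-th algorithm in time_ms().
inline int pick_fastest(int count, const std::function<double(int)>& measure) {
    int best = -1;
    double best_ms = std::numeric_limits<double>::infinity();
    for (int i = 0; i < count; ++i) {
        const double ms = measure(i);
        if (ms < best_ms) {
            best_ms = ms;
            best = i;
        }
    }
    return best;
}
```

In a real program, `measure(i)` would launch the matmul with the i-th candidate and synchronize before returning. Whether this beats the algorithm the library picks by default depends on the problem size and GPU, so it is worth measuring rather than assuming.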

Thank you! That’s exactly what I want!
