I noticed that cuBLAS supports multiple algorithms internally. Is it necessary to tune cuBLAS to get the best kernel?
If I just call cuBLAS directly, do I get the best kernel for my problem size?
Thank you!
This blog post should explain everything: https://developer.nvidia.com/blog/introducing-grouped-gemm-apis-in-cublas-and-more-performance-updates/
Thank you! That’s exactly what I want!