By comparing the kernels launched before and after the change, I concluded that this kernel takes over the work of cublasSgemmBatched.
I found a 2020 post saying that if you want to use tensor cores in SGEMM, CUTLASS will actually be called. [Does CUBLAS SGEMM work with tensor cores yet?] I’m not sure whether this explains the above.
This leads to a further problem: I cannot find a more detailed description of the kernel, so short of running some tests on the data, I cannot directly determine whether tensor cores have been successfully enabled.
It’s not clear what your question is. You’ve already enabled tensor cores, and that appears to have changed the code behavior.
You can use the Nsight Compute profiler for this. There are numerous forum questions and even a blog article about how to use Nsight Compute to verify tensor core (TC) usage/activity.
Using the metrics is one method. Another method is simply to study the Compute Workload Analysis section in the default Nsight Compute report.
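As a rough illustration of the metrics route, here is a minimal sketch in Python that reads CSV output from Nsight Compute and totals a tensor-pipe instruction counter per kernel. Several things here are assumptions rather than anything stated in this thread: the metric name `sm__inst_executed_pipe_tensor.sum` (tensor metric names vary by GPU architecture and Nsight Compute version, so check what your setup reports), the CSV column names `Kernel Name`, `Metric Name` and `Metric Value`, and the hypothetical application name `./my_app`.

```python
import csv
import io

# Assumed metric name; tensor-pipe metric names differ between GPU
# architectures and Nsight Compute versions, so adjust as needed.
TENSOR_METRIC = "sm__inst_executed_pipe_tensor.sum"

# Assumed way of producing the input (./my_app is a hypothetical binary):
#   ncu --csv --metrics sm__inst_executed_pipe_tensor.sum ./my_app > report.csv


def tensor_instructions_per_kernel(csv_text, metric=TENSOR_METRIC):
    """Sum `metric` per kernel name from Nsight Compute CSV text."""
    lines = csv_text.splitlines()
    # Skip any log lines (such as "==PROF== ...") that precede the CSV header.
    start = next(
        (i for i, line in enumerate(lines)
         if "Kernel Name" in line and "Metric Name" in line),
        None,
    )
    if start is None:
        raise ValueError("no CSV header with the expected columns was found")

    reader = csv.DictReader(io.StringIO("\n".join(lines[start:])))
    totals = {}
    for row in reader:
        # Rows for other metrics, or stray non-CSV lines, are ignored.
        if row.get("Metric Name") != metric:
            continue
        # Values may carry thousands separators, e.g. "1,024".
        value = float(row["Metric Value"].replace(",", ""))
        name = row["Kernel Name"]
        totals[name] = totals.get(name, 0.0) + value
    return totals
```

Calling `tensor_instructions_per_kernel(open("report.csv").read())` returns a dict keyed by kernel name. A non-zero total for the GEMM kernel suggests the tensor pipe was exercised; a zero total suggests it was not. This only automates reading the numbers, so the profiler report itself remains the thing to trust.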
I would judge whether TC is being used via the profiler, as I already mentioned.
There isn’t a decoder ring for judging things by kernel names. And even if it seems like there was one in the past, there was never a specification for any such thing, so expecting to decode kernel names indefinitely into the future to determine TC usage is probably not sensible. Therefore I would use the profiler if it were important to me; that is a deterministic method.