Is there a CUDA API that reports the number of tensor cores on a device? It seems that cudaDeviceGetAttribute cannot return the tensor core count.