Also, can you please provide any pointers on how to explicitly control which stream runs on which SM's Tensor/CUDA cores?
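This is not possible. The closest thing to controlling how many SMs a launch uses is cublasSetSmCountTarget, which sets a target SM count for the cuBLAS kernels launched through a given handle. It does not let you choose which SMs are used.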