How does CUBLAS use GPU multi-core?

Hi, I have searched the web for how CUBLAS uses the GPU's multiple cores, but I haven't found an answer.
In CUDA I set the grid and block sizes manually; is it right to think that a CUBLAS function like cublasSgemm does something like this automatically?

CUBLAS “automagically” selects the algorithm and execution parameters based on the dimensions of the arrays supplied to it. If you want to get a sense of what it does, run your CUBLAS-calling code under the Visual Profiler or Nsight and you will see which kernels and execution parameters it uses under different circumstances.
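To make the contrast concrete, here is a minimal sketch of a cublasSgemm call using the legacy CUBLAS API (the one current around CUBLAS 3.2). Note that the caller never specifies a grid or block size anywhere; the matrix dimensions used are arbitrary, and error checking is omitted for brevity.

```c
#include <stdio.h>
#include <stdlib.h>
#include <cublas.h>   /* legacy CUBLAS API */

int main(void)
{
    const int m = 512, n = 512, k = 512;   /* arbitrary sizes for illustration */
    float *A = calloc(m * k, sizeof(float));
    float *B = calloc(k * n, sizeof(float));
    float *C = calloc(m * n, sizeof(float));
    float *dA, *dB, *dC;

    cublasInit();
    cublasAlloc(m * k, sizeof(float), (void **)&dA);
    cublasAlloc(k * n, sizeof(float), (void **)&dB);
    cublasAlloc(m * n, sizeof(float), (void **)&dC);
    cublasSetMatrix(m, k, sizeof(float), A, m, dA, m);
    cublasSetMatrix(k, n, sizeof(float), B, k, dB, k);

    /* No grid/block configuration here: CUBLAS chooses the kernel and its
       launch parameters internally from m, n, k and the transpose flags. */
    cublasSgemm('N', 'N', m, n, k, 1.0f, dA, m, dB, k, 0.0f, dC, m);

    cublasGetMatrix(m, n, sizeof(float), dC, m, C, m);
    cublasFree(dA); cublasFree(dB); cublasFree(dC);
    cublasShutdown();
    free(A); free(B); free(C);
    return 0;
}
```

Profiling exactly this kind of code, as suggested above, shows which gemm kernel and launch configuration CUBLAS picked for the given sizes.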

Is it possible to read the cublasSgemm source code?

No, but the UTK MAGMA project has open-sourced its code, and it contains BLAS3 functions, including gemm. I understand some of that code is also used in CUBLAS 3.2.

Yeah, but the really interesting question is: what about multi-GPU? And what about GPU clusters?

Is this also handled “automagically”, or do I have to split the matrices manually (which is not “that” hard)?

I'm just a newbie, so I'm not sure, but from reading some topics on this forum I see that CUBLAS uses only one GPU; if you have two or more GPUs you must split the input matrices yourself.