I am using CUDA 4.0.13 and Tesla C2050 GPUs. My code looks like this:
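cudaStream_t s0, s1;      // one stream per GPU
cublasHandle_t b0, b1;    // one cuBLAS handle per GPU

cudaSetDevice(0);
cudaStreamCreate(&s0);
cublasCreate(&b0);
// … mem copy to dev
cudaSetDevice(1);
cudaStreamCreate(&s1);
cublasCreate(&b1);
// … mem copy to dev
cudaSetDevice(0);
cublasSetStream(b0, s0);  // cuBLAS v2 API: handle first, then stream
cublasSgemm(b0, … );
cudaSetDevice(1);
cublasSetStream(b1, s1);
cublasSgemm(b1, … );
// … do other work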
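For comparison, here is a minimal self-contained sketch of the same pattern with the elided parts filled in. It assumes the cuBLAS v2 API (`cublas_v2.h`), at least two visible GPUs, and a C++11-capable compiler for the `<chrono>` host timer (newer than the CUDA 4.0 toolchain in the question). The names `time_multi_gpu_sgemm`, `launch_all`, `sync_all` and `kNumDev`, the square size `n`, and the all-ones input data are hypothetical illustration choices, not part of the original code. A single host thread enqueues one SGEMM per device, then waits on each device's stream, so the measured wall time covers all GEMMs together.

```cuda
// Sketch: one host thread drives several GPUs, each with its own stream and
// cuBLAS handle, and times all SGEMMs together with a host wall-clock timer.
#include <cassert>
#include <chrono>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

static const int kNumDev = 2;  // hypothetical: number of GPUs to use

// Enqueues one n x n SGEMM per device. cuBLAS calls are asynchronous with
// respect to the host, so this loop typically returns before the GEMMs finish.
static bool launch_all(int n, cublasHandle_t *handle,
                       float **dA, float **dB, float **dC)
{
    const float alpha = 1.0f, beta = 0.0f;
    bool ok = true;
    for (int d = 0; d < kNumDev && ok; ++d) {
        cudaSetDevice(d);  // make the handle's device current before using it
        ok = cublasSgemm(handle[d], CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                         &alpha, dA[d], n, dB[d], n,
                         &beta, dC[d], n) == CUBLAS_STATUS_SUCCESS;
    }
    return ok;
}

// Waits until every device's stream has drained.
static bool sync_all(cudaStream_t *stream)
{
    bool ok = true;
    for (int d = 0; d < kNumDev; ++d) {
        cudaSetDevice(d);
        ok = (cudaStreamSynchronize(stream[d]) == cudaSuccess) && ok;
    }
    return ok;
}

// Returns 0 on success (wall time in *ms_out), 1 if fewer than kNumDev
// GPUs are visible, -1 on any CUDA or cuBLAS error.
static int time_multi_gpu_sgemm(int n, float *ms_out)
{
    assert(n > 0 && ms_out != nullptr);
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < kNumDev) return 1;

    cudaStream_t stream[kNumDev] = {0};
    cublasHandle_t handle[kNumDev] = {0};
    float *dA[kNumDev] = {0}, *dB[kNumDev] = {0}, *dC[kNumDev] = {0};
    std::vector<float> host((size_t)n * n, 1.0f);
    const size_t bytes = host.size() * sizeof(float);
    bool ok = true;

    // Per-device setup: streams, handles and buffers belong to the device
    // that is current when they are created.
    for (int d = 0; d < kNumDev && ok; ++d) {
        cudaSetDevice(d);
        ok = cudaStreamCreate(&stream[d]) == cudaSuccess
          && cublasCreate(&handle[d]) == CUBLAS_STATUS_SUCCESS
          && cublasSetStream(handle[d], stream[d]) == CUBLAS_STATUS_SUCCESS
          && cudaMalloc((void **)&dA[d], bytes) == cudaSuccess
          && cudaMalloc((void **)&dB[d], bytes) == cudaSuccess
          && cudaMalloc((void **)&dC[d], bytes) == cudaSuccess
          && cudaMemcpy(dA[d], host.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
          && cudaMemcpy(dB[d], host.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    }

    // Warm-up pass so one-time initialization is not timed.
    ok = ok && launch_all(n, handle, dA, dB, dC) && sync_all(stream);

    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    ok = ok && launch_all(n, handle, dA, dB, dC) && sync_all(stream);
    std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
    *ms_out = std::chrono::duration<float, std::milli>(t1 - t0).count();

    for (int d = 0; d < kNumDev; ++d) {  // cleanup; cudaFree(NULL) is a no-op
        cudaSetDevice(d);
        cudaFree(dA[d]); cudaFree(dB[d]); cudaFree(dC[d]);
        if (handle[d]) cublasDestroy(handle[d]);
        if (stream[d]) cudaStreamDestroy(stream[d]);
    }
    return ok ? 0 : -1;
}
```

For instance, `float ms; int rc = time_multi_gpu_sgemm(4096, &ms);` should return 0 and fill `ms` on a machine with two working GPUs. Building it once with `kNumDev` set to 1 and once with 2 gives a one-GPU vs two-GPU comparison: if the GEMMs overlap, the two-GPU wall time should stay close to the one-GPU time rather than doubling.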
For example, running with one context on one GPU takes 10 s, while running with two contexts on two GPUs takes 20 s. Hence, either the context switching doesn't work or the calculations run synchronously.
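The reasoning behind that conclusion can be written out, assuming each GPU gets the same amount of work as the single-GPU run (the post does not state the per-GPU workload explicitly). With hypothetical per-GPU times $T_0 \approx T_1 \approx 10$ s:

```latex
T_{\text{concurrent}} \approx \max(T_0, T_1) \approx 10\ \text{s},
\qquad
T_{\text{serialized}} \approx T_0 + T_1 \approx 20\ \text{s}
```

The measured 20 s matches the serialized case, not the concurrent one.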
Why doesn't it work? Or does CUDA 4.0.13 not support multi-GPU CUBLAS in asynchronous mode?
P.S. Sorry for my English.