Tiled cuBLAS GEMM on multiple GPUs


I’m seeking an efficient implementation that uses streams and concurrency to perform matrix multiplication on arbitrarily large matrices across multiple GPUs, i.e., similar to what’s described in the webinar on streams and concurrency.
I would really appreciate it if somebody could provide such an implementation, or at least parts of it.

Kind regards Toke

Do you mean partitioning one matrix over multiple GPUs/streams?
If not, you can have different streams call cuBLAS by using cublasSetStream().
On sm_35 devices you could also use dynamic parallelism with cuBLAS today.
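For the cublasSetStream() route, here is a minimal sketch (not a tested implementation): it issues several independent SGEMMs on one GPU, each on its own stream, so work from different tiles can execute concurrently. The function name, the square-tile sizes, and the per-tile device-pointer arrays are all illustrative assumptions.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Illustrative sketch: launch nStreams independent n-by-n SGEMMs on one
// GPU, each bound to its own stream via cublasSetStream(). dA/dB/dC are
// assumed arrays of device pointers, one tile per stream (hypothetical).
void streamed_gemms(cublasHandle_t handle, int nStreams, int n,
                    float *const *dA, float *const *dB, float *const *dC)
{
    const float alpha = 1.0f, beta = 0.0f;
    for (int s = 0; s < nStreams; ++s) {
        cudaStream_t stream;
        cudaStreamCreate(&stream);
        // All subsequent cuBLAS calls on this handle run on `stream`.
        cublasSetStream(handle, stream);
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    n, n, n, &alpha, dA[s], n, dB[s], n, &beta, dC[s], n);
    }
    cudaDeviceSynchronize();  // wait for all streams to finish
    // Stream destruction and error checking omitted for brevity.
}
```

Note that overlap of transfers and compute additionally requires pinned host memory and cudaMemcpyAsync, as the webinar describes.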



I mean one big matrix over multiple GPUs. Just like explained here: http://developer.download.nvidia.com/CUDA/training/StreamsAndConcurrencyWebinar.pdf
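One common way to spread a single large GEMM over several GPUs is to split C = A·B column-wise: each GPU gets a full copy of A plus a contiguous column block of B (contiguous because cuBLAS uses column-major storage) and computes the matching column block of C. The following is an untested sketch under those assumptions; the function name and the even column split are illustrative, and error checking and resource cleanup are omitted.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Illustrative sketch: C (m x n) = A (m x k) * B (k x n), column-major,
// split column-wise across nGpus devices. Each GPU g handles columns
// [j0, j0 + nb) of B and C. Hypothetical helper, not a tested routine.
void multi_gpu_sgemm(int nGpus, int m, int n, int k,
                     const float *hA, const float *hB, float *hC)
{
    const float alpha = 1.0f, beta = 0.0f;
    for (int g = 0; g < nGpus; ++g) {
        cudaSetDevice(g);
        int j0 = g * (n / nGpus);
        int nb = (g == nGpus - 1) ? n - j0 : n / nGpus;

        cublasHandle_t handle;
        cublasCreate(&handle);
        cudaStream_t stream;
        cudaStreamCreate(&stream);
        cublasSetStream(handle, stream);

        float *dA, *dB, *dC;
        cudaMalloc(&dA, sizeof(float) * m * k);
        cudaMalloc(&dB, sizeof(float) * (size_t)k * nb);
        cudaMalloc(&dC, sizeof(float) * (size_t)m * nb);

        // Copy A and this GPU's column block of B (contiguous in
        // column-major layout, starting at column j0).
        cudaMemcpyAsync(dA, hA, sizeof(float) * m * k,
                        cudaMemcpyHostToDevice, stream);
        cudaMemcpyAsync(dB, hB + (size_t)j0 * k,
                        sizeof(float) * (size_t)k * nb,
                        cudaMemcpyHostToDevice, stream);

        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    m, nb, k, &alpha, dA, m, dB, k, &beta, dC, m);

        cudaMemcpyAsync(hC + (size_t)j0 * m, dC,
                        sizeof(float) * (size_t)m * nb,
                        cudaMemcpyDeviceToHost, stream);
    }
    // Wait for every GPU; frees and cublasDestroy omitted for brevity.
    for (int g = 0; g < nGpus; ++g) {
        cudaSetDevice(g);
        cudaDeviceSynchronize();
    }
}
```

For the async copies to actually overlap with compute, hA/hB/hC should be pinned (cudaHostAlloc); if A is too large for one device, it would also need to be tiled along k, accumulating partial products with beta = 1.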