Maximum concurrent kernels for number of streams > 16

jam11 April 8, 2011, 5:09pm 1

Hi,

From the reference document (page 11) for CUBLAS,

Does this mean that the kernels will be executed in batches of 16, one batch after another, until all 1024 streams are done?

In other words, will 1024/16 = 64 batches be computed one after the other?

This assumes that all the data for all the small matrices is transferred to the GPU at once.
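
The scenario in the question can be sketched roughly as follows. This is only an illustration under stated assumptions, not the CUBLAS code path: `smallKernel`, `idealizedBatches` and `launchOnePerStream` are hypothetical names, the kernel is a trivial stand-in for one small-matrix operation, and the batch count is plain arithmetic on the numbers in the question rather than a description of how the hardware actually schedules work.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for one small per-matrix operation.
__global__ void smallKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Back-of-the-envelope count from the question: ceil(numStreams / maxConcurrent).
// Arithmetic only; it says nothing about the real scheduling behaviour.
int idealizedBatches(int numStreams, int maxConcurrent)
{
    return (numStreams + maxConcurrent - 1) / maxConcurrent;
}

// Launch one small kernel per stream on device 0, with all data already
// resident on the GPU. Returns the first error from the checked calls.
cudaError_t launchOnePerStream(int numStreams, int n)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) return err;
    printf("concurrentKernels supported: %d\n", prop.concurrentKernels);

    // One buffer holding the data for every stream.
    float *d_data = NULL;
    size_t bytes = sizeof(float) * (size_t)n * (size_t)numStreams;
    err = cudaMalloc((void **)&d_data, bytes);
    if (err != cudaSuccess) return err;
    cudaMemset(d_data, 0, bytes);

    // Create the streams; stop at the first failure and remember how many exist.
    std::vector<cudaStream_t> streams(numStreams);
    int created = 0;
    for (; created < numStreams; ++created) {
        err = cudaStreamCreate(&streams[created]);
        if (err != cudaSuccess) break;
    }

    if (err == cudaSuccess) {
        // One independent launch per stream; the runtime decides the overlap.
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        for (int s = 0; s < numStreams; ++s) {
            smallKernel<<<blocks, threads, 0, streams[s]>>>(d_data + (size_t)s * n, n);
        }
        err = cudaDeviceSynchronize();  // wait for every stream to finish
    }

    for (int s = 0; s < created; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d_data);
    return err;
}
```

Calling `launchOnePerStream(1024, 256)` from `main` and looking at the timeline in a profiler would show how many of the launches actually overlap on a given device, while `idealizedBatches(1024, 16)` gives the 64 from the question.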
