Not sure where feature requests are being collected these days, so I figured I’d post here in the hope that some NVIDIA folks are listening.
For CUBLAS, it’d be really nice to be able to batch up calls without separate streams (to avoid hitting Fermi’s 16-concurrent-kernel limit and stalling). For example, I’ve got multiple small matrices loaded and want to avoid the launch overhead of invoking a kernel for each one individually.
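To make the ask concrete, here is a rough sketch of what such a batched interface might look like. The name `cublasSgemmBatched` and its signature are my own illustration, not an existing CUBLAS call:

```cuda
// Hypothetical batched GEMM entry point (name and signature are my own
// sketch, not an existing CUBLAS API): one launch processes the whole batch.
// Aarray/Barray/Carray are device arrays of pointers, one per matrix in
// the batch; all matrices share the same dimensions.
cublasStatus_t cublasSgemmBatched(cublasHandle_t handle,
                                  cublasOperation_t transa,
                                  cublasOperation_t transb,
                                  int m, int n, int k,
                                  const float *alpha,
                                  const float *Aarray[], int lda,
                                  const float *Barray[], int ldb,
                                  const float *beta,
                                  float *Carray[], int ldc,
                                  int batchCount);
```

With something like this, a single kernel launch covers the whole batch, so there’s no need to juggle separate streams just to keep the GPU busy on many small matrices.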
For CUFFT, a fused fast-convolution call that takes the vector to multiply by (with the option to reuse it across batched calls) would save a lot of overhead: one launch instead of three. The pointwise-multiply kernel is only a couple of lines, so its launch overhead dominates its actual work.
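A possible shape for such an API, purely as illustration (these names and signatures are my own sketch, not existing CUFFT calls):

```cuda
// Hypothetical fused-convolution interface (my own sketch, not an existing
// CUFFT API). The filter spectrum is registered once on the plan, then
// reused across batched executions.
cufftResult cufftSetConvolutionFilter(cufftHandle plan,
                                      const cufftComplex *filterSpectrum);

// Performs forward FFT, pointwise multiply by the registered filter, and
// inverse FFT as one fused operation, avoiding a separate tiny multiply
// kernel launch between the two transforms.
cufftResult cufftExecC2CConvolve(cufftHandle plan,
                                 cufftComplex *idata,
                                 cufftComplex *odata);
```

Since the filter is set once per plan, batched convolutions against the same filter would pay the setup cost only once.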
Thanks.