I want to use the hybrid CPU-GPU feature of cublasXt to reduce the device-memory usage of matrix multiplication, and it works for a single (non-batched) matmul. Can cublasXt handle batched matmul as well? If not, is there another library with a similar feature that reduces the device-memory usage of batched matmul?
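For reference, here is a minimal sketch of the single-matmul case that already works for me (the sizes, device list, and error handling are simplified placeholders): cublasXt accepts host pointers directly and streams tiles of the operands through device memory, so the full matrices never have to fit on the GPU.

```c
#include <stdlib.h>
#include <cublasXt.h>

int main(void) {
    // Host-resident operands; cublasXt tiles them onto the GPU internally,
    // so device memory only needs to hold a few tiles at a time.
    const size_t m = 8192, n = 8192, k = 8192;
    float *A = malloc(m * k * sizeof *A);
    float *B = malloc(k * n * sizeof *B);
    float *C = malloc(m * n * sizeof *C);
    // ... fill A and B ...

    cublasXtHandle_t handle;
    cublasXtCreate(&handle);
    int devices[1] = {0};                      // run on GPU 0 only
    cublasXtDeviceSelect(handle, 1, devices);

    // C = alpha * A * B + beta * C (column-major); A, B, C are host pointers
    const float alpha = 1.0f, beta = 0.0f;
    cublasXtSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                  &alpha, A, m, B, k, &beta, C, m);

    cublasXtDestroy(handle);
    free(A); free(B); free(C);
    return 0;
}
```

This out-of-core behavior is what I'd like to keep for the batched case, ideally without falling back to a plain loop of per-matrix cublasXtSgemm calls.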