Would cudaMalloc-ing more memory than what cusparseXcsrgemmNnz calculated for a cusparse matrix work?

jasongyvc6 · June 3, 2020, 10:54pm

Here is the code snippet from the sample in the cuSPARSE documentation:

```c
int baseC, nnzC;
// nnzTotalDevHostPtr points to host memory
int *nnzTotalDevHostPtr = &nnzC;
cusparseSetPointerMode(handle, CUSPARSE_POINTER_MODE_HOST);
cudaMalloc((void**)&csrRowPtrC, sizeof(int)*(m+1));
cusparseXcsrgemmNnz(handle, transA, transB, m, n, k,
        descrA, nnzA, csrRowPtrA, csrColIndA,
        descrB, nnzB, csrRowPtrB, csrColIndB,
        descrC, csrRowPtrC, nnzTotalDevHostPtr );
if (NULL != nnzTotalDevHostPtr){
    nnzC = *nnzTotalDevHostPtr;
}else{
    cudaMemcpy(&nnzC, csrRowPtrC+m, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&baseC, csrRowPtrC, sizeof(int), cudaMemcpyDeviceToHost);
    nnzC -= baseC;
}
cudaMalloc((void**)&csrColIndC, sizeof(int)*nnzC);
cudaMalloc((void**)&csrValC, sizeof(float)*nnzC);
cusparseScsrgemm(handle, transA, transB, m, n, k,
        descrA, nnzA,
        csrValA, csrRowPtrA, csrColIndA,
        descrB, nnzB,
        csrValB, csrRowPtrB, csrColIndB,
        descrC,
        csrValC, csrRowPtrC, csrColIndC);
```
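In the sample, nnzC is csrRowPtrC[m] minus the index base csrRowPtrC[0], and in CSR format row i occupies positions rowPtr[i]-base up to rowPtr[i+1]-base of the column-index and value arrays. A routine that walks the matrix through its row pointer therefore never reaches slots at or past nnzC. The plain-C, host-side sketch here illustrates that with padding deliberately filled with garbage; csr_nnz and csr_spmv are hypothetical helpers, not cuSPARSE functions, and whether every cuSPARSE routine respects the same bound internally is an assumption this sketch cannot prove.

```c
#include <assert.h>

/* nnz of an m-row CSR matrix, computed as in the sample's else-branch:
 * rowPtr[m] - rowPtr[0]. For a standard CSR matrix rowPtr[0] equals the
 * index base (0 or 1), so this works for both zero- and one-based input. */
int csr_nnz(const int *rowPtr, int m)
{
    assert(rowPtr[m] >= rowPtr[0]); /* row pointers must be non-decreasing */
    return rowPtr[m] - rowPtr[0];
}

/* y = A * x for an m-row CSR matrix with index base `base`.
 * Row i uses positions rowPtr[i]-base .. rowPtr[i+1]-base-1 of colInd/val,
 * so slots at or beyond nnz are never read, whatever they contain. */
void csr_spmv(int m, int base, const int *rowPtr, const int *colInd,
              const double *val, const double *x, double *y)
{
    for (int i = 0; i < m; ++i) {
        double sum = 0.0;
        for (int p = rowPtr[i] - base; p < rowPtr[i + 1] - base; ++p)
            sum += val[p] * x[colInd[p] - base];
        y[i] = sum;
    }
}
```

For the 3x3 matrix [[1,0,2],[0,3,0],[4,0,5]] stored in arrays with capacity 8 but nnz 5, multiplying by a vector of ones gives [3, 3, 9] regardless of what sits in the last three slots.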

My question is: instead of cudaMalloc-ing nnzC*size for csrColIndC and csrValC, if I cudaMalloc a predetermined constant nnz_pre*size, where I can guarantee that nnz_pre is always larger than nnzC for my problem, would it cause any problems for standard cuSPARSE operations such as cusparseDcsrgeam and cusparseDcsrgemm?

The motivation is a real-time application that involves a camera. Calling cudaMalloc and cudaFree for every single frame causes a significant slowdown. That's why I want to cudaMalloc a fixed-size buffer just once at the very start and reuse that same chunk of memory for the computation on every frame.

The behavior I am observing in my unit test is non-deterministic: sometimes it produces the right result and sometimes it doesn't, so I am unable to provide a minimal example that reproduces the problem. I would love to hear from the cuSPARSE team whether allocating extra memory is supposed to work. If not, is there a way to avoid calling cudaMalloc and cudaFree for every frame while doing sparse matrix computation?
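The two calls in the documentation sample form a symbolic phase (cusparseXcsrgemmNnz fills csrRowPtrC and reports nnzC) and a numeric phase (cusparseScsrgemm fills csrColIndC and csrValC). One way to structure the allocate-once idea is to reserve nnz_pre slots at startup, rerun both phases on every frame, and skip the numeric phase when nnzC exceeds nnz_pre. The plain-C sketch here mirrors that structure on the host under stated assumptions: malloc stands in for cudaMalloc, a Gustavson-style row-by-row product stands in for the cuSPARSE kernels, and CsrWorkspace, ws_init, ws_spgemm and ws_free are hypothetical names. It illustrates the buffer-reuse pattern only and is not how cuSPARSE is implemented.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical workspace, allocated once (malloc stands in for cudaMalloc). */
typedef struct {
    int m, n;        /* shape of C */
    int capacity;    /* nnz_pre: slots reserved for colInd/val */
    int *rowPtr;     /* m + 1 entries, rewritten on every frame */
    int *colInd;     /* capacity entries */
    double *val;     /* capacity entries */
    int *marker;     /* n entries: last row that touched each column */
    double *acc;     /* n entries: dense accumulator for one row of C */
} CsrWorkspace;

/* Returns 0 on success; call ws_free afterwards in either case. */
int ws_init(CsrWorkspace *ws, int m, int n, int capacity)
{
    assert(m > 0 && n > 0 && capacity > 0);
    ws->m = m; ws->n = n; ws->capacity = capacity;
    ws->rowPtr = malloc(sizeof(int) * (m + 1));
    ws->colInd = malloc(sizeof(int) * capacity);
    ws->val    = malloc(sizeof(double) * capacity);
    ws->marker = malloc(sizeof(int) * n);
    ws->acc    = malloc(sizeof(double) * n);
    return (ws->rowPtr && ws->colInd && ws->val && ws->marker && ws->acc) ? 0 : -1;
}

void ws_free(CsrWorkspace *ws)
{
    free(ws->rowPtr); free(ws->colInd); free(ws->val);
    free(ws->marker); free(ws->acc);
}

/* C = A * B for zero-based CSR inputs (A is m x k, B is k x n).
 * Returns nnzC, or -1 if nnzC exceeds the reserved capacity. */
int ws_spgemm(CsrWorkspace *ws,
              const int *aRow, const int *aCol, const double *aVal,
              const int *bRow, const int *bCol, const double *bVal)
{
    int m = ws->m, n = ws->n;

    /* Symbolic phase (the role cusparseXcsrgemmNnz plays): count each row. */
    for (int j = 0; j < n; ++j) ws->marker[j] = -1;
    ws->rowPtr[0] = 0;
    for (int i = 0; i < m; ++i) {
        int count = 0;
        for (int p = aRow[i]; p < aRow[i + 1]; ++p)
            for (int q = bRow[aCol[p]]; q < bRow[aCol[p] + 1]; ++q)
                if (ws->marker[bCol[q]] != i) { ws->marker[bCol[q]] = i; ++count; }
        ws->rowPtr[i + 1] = ws->rowPtr[i] + count;
    }
    int nnzC = ws->rowPtr[m];
    if (nnzC > ws->capacity) return -1; /* nnz_pre was too small */

    /* Numeric phase (the role cusparse<t>csrgemm plays): fill colInd/val.
     * Column indices within a row are left in first-touch order, unsorted. */
    for (int j = 0; j < n; ++j) ws->marker[j] = -1;
    for (int i = 0; i < m; ++i) {
        int pos = ws->rowPtr[i];
        for (int p = aRow[i]; p < aRow[i + 1]; ++p) {
            int kk = aCol[p];
            for (int q = bRow[kk]; q < bRow[kk + 1]; ++q) {
                int j = bCol[q];
                if (ws->marker[j] != i) {
                    ws->marker[j] = i;
                    ws->colInd[pos++] = j;
                    ws->acc[j] = 0.0;
                }
                ws->acc[j] += aVal[p] * bVal[q];
            }
        }
        for (int p = ws->rowPtr[i]; p < ws->rowPtr[i + 1]; ++p)
            ws->val[p] = ws->acc[ws->colInd[p]];
    }
    return nnzC;
}
```

The intended use is one ws_init at startup, one ws_spgemm per frame, and one ws_free at shutdown. Because the row pointer is recomputed on every call, no structure from a previous frame carries over; only the colInd/val capacity is reused, and entries past nnzC are simply left unused.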
