Here is the code snippet from the sample in the documentation:
int baseC, nnzC;
// nnzTotalDevHostPtr points to host memory
int *nnzTotalDevHostPtr = &nnzC;
cusparseSetPointerMode(handle, CUSPARSE_POINTER_MODE_HOST);
cudaMalloc((void**)&csrRowPtrC, sizeof(int)*(m+1));
cusparseXcsrgemmNnz(handle, transA, transB, m, n, k,
                    descrA, nnzA, csrRowPtrA, csrColIndA,
                    descrB, nnzB, csrRowPtrB, csrColIndB,
                    descrC, csrRowPtrC, nnzTotalDevHostPtr);
if (NULL != nnzTotalDevHostPtr) {
    nnzC = *nnzTotalDevHostPtr;
} else {
    cudaMemcpy(&nnzC, csrRowPtrC+m, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&baseC, csrRowPtrC, sizeof(int), cudaMemcpyDeviceToHost);
    nnzC -= baseC;
}
cudaMalloc((void**)&csrColIndC, sizeof(int)*nnzC);
cudaMalloc((void**)&csrValC, sizeof(float)*nnzC);
cusparseScsrgemm(handle, transA, transB, m, n, k,
                 descrA, nnzA,
                 csrValA, csrRowPtrA, csrColIndA,
                 descrB, nnzB,
                 csrValB, csrRowPtrB, csrColIndB,
                 descrC,
                 csrValC, csrRowPtrC, csrColIndC);
My question is: instead of cudaMalloc-ing nnzC*size for csrColIndC and csrValC, if I cudaMalloc a predetermined constant nnz_pre*size, where I can guarantee that nnz_pre is always larger than nnzC for my problem, would that cause any problems for standard cuSPARSE operations like cusparseDcsrgeam and cusparseDcsrgemm?
The motivation is a real-time application that involves a camera. Calling cudaMalloc and cudaFree for every single frame causes a significant slowdown. That's why I want to cudaMalloc a fixed size just once at the very start and reuse that same chunk of memory for the computation of every frame.
The behavior I am currently observing in my unit test is non-deterministic: sometimes it produces the right result, and sometimes it doesn't. Hence I am unable to provide a minimal example that reproduces the problem. I would love to hear from the cuSPARSE team on whether allocating extra memory is supposed to work or not. If not, is there a way to avoid cudaMalloc-ing and cudaFree-ing for every single frame while doing sparse matrix computation?
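To make the fixed-size idea concrete, here is a minimal host-side sketch of the bookkeeping involved. It is plain C, not cuSPARSE code, and it does not answer whether over-allocation is supported. `csr_nnz` mirrors the `csrRowPtrC[m] - csrRowPtrC[0]` computation from the documentation sample, and `fits_in_preallocated` checks a per-frame nnzC against the constant nnz_pre. Both function names are hypothetical, and the sketch assumes the row-pointer array has already been copied to host memory.

```c
#include <assert.h>
#include <stddef.h>

/* Number of stored entries in a CSR matrix with m rows.
 * row_ptr has m + 1 entries and row_ptr[0] is the index base (0 or 1),
 * so the count is row_ptr[m] - row_ptr[0], as in the sample above. */
static int csr_nnz(const int *row_ptr, int m)
{
    assert(row_ptr != NULL && m >= 0);
    return row_ptr[m] - row_ptr[0];
}

/* Hypothetical helper: returns 1 if a buffer sized for nnz_pre entries
 * can hold nnz entries, and 0 otherwise. */
static int fits_in_preallocated(int nnz, int nnz_pre)
{
    return nnz >= 0 && nnz <= nnz_pre;
}
```

In a per-frame loop, a check like this would run right after nnzC is known and before csrColIndC and csrValC are written. Only the first nnzC entries of the over-allocated buffers would then be expected to hold meaningful data, so any later call that takes an nnz argument would presumably need the actual nnzC rather than nnz_pre.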