cusparseScsrsv_analysis: any limitation of the matrix size?

Sherly · February 20, 2013, 9:53pm

I tried to analyze and solve AX=Y, where A is about 300 MB, but cusparseScsrsv_analysis returned "CUSPARSE_STATUS_ALLOC_FAILED". Then I tried a smaller matrix of 50 MB, and it worked. So I am wondering: is it the function or the amount of GPU memory (1 GB, GTX 460) that limits the matrix size?
Will it work if I switch to a GPU with more memory?

njuffa · February 21, 2013, 4:38am

How much free memory does CUDA report just prior to the call to cusparseScsrsv_analysis()? Could memory fragmentation be an issue? I am not familiar with this particular GPU, but given that it does not support ECC I think you should find at least 800 MB available after CUDA initialization (some additional restrictions on allocation sizes may apply if you are on a Win32 platform *). What is the largest matrix size you have been able to process successfully?

I checked with the CUSPARSE team and cusparseScsrsv_analysis(), unsurprisingly, uses some additional workspace internally. The amount is roughly proportional to the size of the matrix but apparently no simple formula can be stated. I am told that cusparseScsrsv_analysis() should be able to handle a matrix of size 300 MB when 800 MB of GPU memory are available.

How are you estimating the size of the sparse matrix? Assuming the matrix is stored in CSR format, it should be (n+1)*sizeof(int) + nnz*sizeof(int) + nnz*sizeof(<data_type>), where n is the number of rows and nnz is the number of non-zero elements in the matrix.
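
As a rough illustration of the CSR formula above, here is a minimal sketch in C. The function name csr_bytes and the use of 32-bit indices are assumptions made for this example; it only counts the three CSR arrays and says nothing about the extra workspace the analysis routine allocates internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes occupied by a CSR matrix with 32-bit indices.
   row pointers: (n + 1) ints, column indices: nnz ints, values: nnz elements. */
static uint64_t csr_bytes(uint64_t n, uint64_t nnz, size_t value_size)
{
    assert(value_size > 0);                         /* e.g. sizeof(float) */
    uint64_t row_ptr = (n + 1) * sizeof(int32_t);   /* csrRowPtr */
    uint64_t col_ind = nnz * sizeof(int32_t);       /* csrColInd */
    uint64_t values  = nnz * (uint64_t)value_size;  /* csrVal    */
    return row_ptr + col_ind + values;
}
```

For a single-precision matrix one would call csr_bytes(n, nnz, sizeof(float)) and divide by 1024*1024 to get a figure in MB.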

Is the transpose or conjugate transpose operation selected (this requires some extra working space)?

* Release Notes :: CUDA Toolkit Documentation (http://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html):
The maximum size of a single memory allocation created by cudaMalloc() or cuMemAlloc() on WDDM devices is limited to MIN( (System Memory Size in MB - 512 MB) / 2, PAGING_BUFFER_SEGMENT_SIZE ). For Vista, PAGING_BUFFER_SEGMENT_SIZE is approximately 2 GB.
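
To make the quoted limit concrete, the following sketch evaluates MIN( (System Memory Size in MB - 512 MB) / 2, PAGING_BUFFER_SEGMENT_SIZE ) in C. The function name wddm_max_alloc_mb is made up for this example, and returning 0 when the system has 512 MB or less is an assumption, since the release note does not cover that case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest single allocation (in MB) on a WDDM device
   according to the formula quoted from the release notes. */
static uint64_t wddm_max_alloc_mb(uint64_t system_memory_mb,
                                  uint64_t paging_segment_mb)
{
    assert(paging_segment_mb > 0);
    /* Assumption: treat systems with <= 512 MB as having no headroom. */
    uint64_t half = system_memory_mb > 512 ? (system_memory_mb - 512) / 2 : 0;
    return half < paging_segment_mb ? half : paging_segment_mb;
}
```

With 4096 MB of system memory and a paging buffer segment of about 2048 MB, this gives 1792 MB for a single allocation.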

Sherly · February 21, 2013, 6:58pm

Thank you.
After the ALLOC_FAILED issue happened, I changed my GPU from a GTX 460 (1 GB memory) to a GTX 560 (2 GB memory). OS: Win 7, 64-bit, but VS is building in Win32 mode.

The sparse matrix is stored in CSR format; the size is (n+1)*sizeof(int) + nnz*sizeof(int) + nnz*sizeof(float) = 6 MB + 278 MB + 278 MB = 562 MB
(n = nn*nn0 = 2457*630 = 1547910, nnz = nonzero_kvp*nn0 = 72749880)

On the GTX 560, I checked the free memory prior to the call to cusparseScsrsv_analysis(); it had 1125 MB left. Then I executed this function:
cusparseStatus = cusparseScsrsv_analysis(handle, CUSPARSE_OPERATION_TRANSPOSE, nn*nn0, nonzero_kvp*nn0, descrR, c_csrVal_kvp, c_csrRowPtr_kvp, c_csrColIndex_kvp, infoT);
It returned: CUSPARSE_STATUS_EXECUTION_FAILED (not CUSPARSE_STATUS_ALLOC_FAILED as before).
cusparseStatus = cusparseScsrsv_analysis(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, nn*nn0, nonzero_kvp*nn0, descrR, c_csrVal_kvp, c_csrRowPtr_kvp, c_csrColIndex_kvp, info);
It returned: CUSPARSE_STATUS_EXECUTION_FAILED.

Then I decreased the matrix size to 3 MB + 132 MB + 132 MB (n = nn*300, nnz = nonzero_kvp*300), and it worked. After the first analysis, free memory is 850 MB; after the second analysis, free memory is 845 MB. There is a lot of memory left.

njuffa · February 21, 2013, 10:17pm

For a matrix size of 562 MB, the total memory used (including internal working buffers) will be about twice that, so it looks like you are very close to the limit of available memory and quite possibly over the limit. To ensure there is nothing more serious going on, I would suggest filing a bug via the registered developer website, attaching a self-contained repro case. Thanks.
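
The estimate above can be written down as a quick pre-flight check. This is only a sketch: the factor of 2 is the rough figure from this post, not a documented guarantee, and the function names are invented for the example. The free-memory value would come from something like cudaMemGetInfo() just before the analysis call.

```c
#include <assert.h>
#include <stdint.h>

/* Rough rule of thumb from this thread: matrix plus internal working
   buffers is about twice the CSR matrix size. Not an official formula. */
static uint64_t estimated_footprint_mb(uint64_t matrix_mb)
{
    return 2 * matrix_mb;
}

/* Returns 1 if the estimated footprint fits in the reported free memory. */
static int analysis_likely_fits(uint64_t matrix_mb, uint64_t free_mb)
{
    assert(free_mb > 0);
    return estimated_footprint_mb(matrix_mb) <= free_mb;
}
```

For the numbers reported here (562 MB matrix, 1125 MB free), the estimate is 1124 MB, which leaves about 1 MB of headroom; any fragmentation or extra workspace for the transpose case would push it over.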