Percentage of zeros in matrix which necessitates sparse format over dense

In general I have found that cuBLAS is fast enough where I just always use the dense format for matrices. At this point I am trying to determine the threshold(%) of zero-entries (when I may be able to know this in advance) which would determine which format (and which library, cuBLAS or cuSPARSE) to use.

There have been other posts which claim that equivalent operations on moderate sized matrices (with lets say 50% zero entries) are still faster using cuBLAS than cuSPARSE, even though the total number of operations for the dense format is higher.

I will be receiving matrix data in CSC format, and am tempted to use the cuSPARSE csc2dense() conversion, then use the cuBLAS functions rather than muck around with all the additional pointers used for the csc format.

Obviously there are limits to the dense format, but I have yet to see an input set which I could not handle in dense format.

I am sure such data exits, but most scientists want to believe that they should always use the sparse format. On the CPU the conversion to sparse format saves time, but on the GPU this seems not to be as true.

So if I am able to estimate the % of zero entries, and am operating on matrices with dimensions of (30,000 x 10,000), what threshold of zero entries will make dealing with the csc sparse format necessary?

There is no hard-and-fast answer to this question. For one, dense multiplication is a compute-bound task, while sparse multiplication is a bandwidth-bound task. So the switch-over point is influenced by the GFLOPS to bandwidth ratio of the platform. In addition, sparsity characteristics of matrices can limit the amount of available parallelism on a wide data-parallel platform like a GPU, and this will further interact with the algroithm that is being performeed. So it is probably best to simply try it both ways with the use case at hand to get an accurate answer.

I checked with the CUSPARSE team, and they suggested the following heuristic: […] matrices that are > 10k in size are usually treated as sparse, while matrices that are < 4k in size are usually treated as dense. For the matrices in the 4k-10k range the best storage format depends highly on the sparsity pattern and the algorithm that needs to be performed. […] usually assume the matrix is sparse when < 10% of the elements are non-zeros.


Thanks for the info. When you talk about matrix size are you referring to the total number of elements(non-zeros), or the dimensions of the smallest side ( min(row, columns))?

In general cuBLAS has been very fast so I prefer to use the dense format, but there is some functionality in cuSPARSE which does not exist in cuBLAS.

The funny thing is that the matrices I often get from the scientists are always in sparse format, even though they usually have 50-90% non-zeros. This seems to be because they are so used to the MATLAB sparse format. They seem horrified that I suggest using the dense format, but overall that worked better (ease of coding and speed).

What I really need now is the ability to compute eigen values/vectors, as neither cuBLAS or cuSPARSE offer this functionality.

njuffa, sorry for the naive question but, could you explain why “sparse multiplication is a bandwith-bound task”? Is boundwith-bound the same as I/O bound?