After experimenting with cuSPARSE, I have concluded that using cuBLAS as much as possible is the easiest and fastest option for my work.
The question now is the fastest way to convert a matrix in CSC sparse format on the host into dense format in device memory. I can do the conversion on the CPU (and copy the dense version over), but I am looking for a faster option if one exists.
cuSPARSE has the cusparseScsc2dense() function, but its documentation is less thorough than the cuBLAS documentation.
In particular, the cuSPARSE documentation does not make clear whether the cscValA, cscRowIndA, and cscColPtrA arrays are expected to reside in host or device memory.
So do I have to allocate those arrays in device memory, copy the host versions over, and then call cusparseScsc2dense() to do the conversion on the device? Or is there a less messy and faster way? I could write my own kernel for this, but the functions and kernels in the SDK are usually better optimized.
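To make the question concrete, this is the call sequence I am picturing if the three arrays do need to be device pointers. It is a sketch only (error checking, handle and descriptor setup omitted; `h_val`, `h_rowInd`, `h_colPtr`, `nnz`, `m`, `n` are my own names), and assumes 0-based indexing and a column-major dense result with lda = m:

```c
/* Sketch: stage the host CSC arrays on the device, then expand there. */
float *d_val, *d_dense;
int   *d_rowInd, *d_colPtr;

cudaMalloc((void **)&d_val,    nnz * sizeof(float));
cudaMalloc((void **)&d_rowInd, nnz * sizeof(int));
cudaMalloc((void **)&d_colPtr, (n + 1) * sizeof(int));
cudaMalloc((void **)&d_dense,  (size_t)m * n * sizeof(float));

cudaMemcpy(d_val,    h_val,    nnz * sizeof(float),   cudaMemcpyHostToDevice);
cudaMemcpy(d_rowInd, h_rowInd, nnz * sizeof(int),     cudaMemcpyHostToDevice);
cudaMemcpy(d_colPtr, h_colPtr, (n + 1) * sizeof(int), cudaMemcpyHostToDevice);

/* handle from cusparseCreate(), descr from cusparseCreateMatDescr() */
cusparseScsc2dense(handle, m, n, descr,
                   d_val, d_rowInd, d_colPtr,
                   d_dense, m);
/* d_dense can now be handed straight to cuBLAS routines */
```

If this is the intended usage, the three small copies plus the device-side expansion should still beat copying the full dense matrix from the host once the matrix gets large.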
I have found that cuBLAS outperforms cuSPARSE in most cases and involves far less boilerplate. Since I already have a way to compute the LU decomposition for dense matrices, there is little point in using the sparse format unless I have a truly massive data set (over 100 million elements with fewer than 25% non-zeros).