Sparse matrix operations inside a CUDA kernel

I need to multiply sparse matrices inside a CUDA kernel, with each thread performing one sparse matrix operation. I know that the cuSPARSE API allows sparse operations to be performed on the device side, but not from inside a kernel.
Eigen provides some functionality that works inside kernels, but not its sparse operations.
I tried a bit with Thrust, using the indices of the non-zero elements of the matrix, but it wasn't straightforward for me. Is there any library that handles sparse matrices inside a kernel? Or would it be possible to use Thrust to this end?
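To make the intent concrete, here is roughly what I imagine each thread doing, hand-rolled since I have not found a library for it. This is only a sketch; the CSR storage layout, the back-to-back batching of the matrices, and all the names are my own assumptions:

```
// Sketch: each thread computes y = A * x for one sparse 1K x 1K matrix.
// The matrices are stored back to back in CSR form; row_ptr holds global
// offsets into the concatenated col_idx/vals arrays.
__global__ void per_thread_spmv(const int*   row_ptr,  // (n+1) entries per matrix
                                const int*   col_idx,  // column index of each nonzero
                                const float* vals,     // nonzero values
                                const float* x,        // dense input vectors, n per matrix
                                float*       y,        // dense output vectors, n per matrix
                                int n, int num_matrices)
{
    int m = blockIdx.x * blockDim.x + threadIdx.x;  // one matrix per thread
    if (m >= num_matrices) return;

    const int*   rp = row_ptr + m * (n + 1);
    const float* xv = x + m * n;
    float*       yv = y + m * n;

    for (int row = 0; row < n; ++row) {
        float sum = 0.0f;
        // only the few nonzeros of this row contribute
        for (int j = rp[row]; j < rp[row + 1]; ++j)
            sum += vals[j] * xv[col_idx[j]];
        yv[row] = sum;
    }
}
```

With one matrix per thread, this would be launched as, e.g., per_thread_spmv<<<(num_matrices + 255) / 256, 256>>>(...).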

That seems to be a very unusual approach. How big is each matrix?

They are about 1K in each dimension, but they are sparse.

What exactly do you mean by “one sparse matrix operation”? Presumably you do not envision each thread handling one 1K x 1K matrix.

That is my whole point: the matrices are sparse, so a vector-matrix multiplication is actually only a few operations and can be handled by a single thread (for example, at 1% density a 1K x 1K matrix has only about 10,000 nonzeros, so one multiplication is on the order of 10K multiply-adds).
I am going to share the algorithm as soon as possible to explain myself better. Thanks for helping, njuffa.

I think I am not helping yet, but merely collecting additional information that may enable others to point you in a reasonable direction.

Purely to generate a performance reference point (a lower bound), it might be useful to use cuBLAS's dense matrix operations to handle the 1K x 1K matrices. cuBLAS is the only non-template library available inside device code, as best I know.
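For instance, even a plain host-launched dense GEMM would provide such a reference number. A minimal sketch, assuming float precision, column-major n x n storage, device-resident data, and a cuBLAS handle created elsewhere:

```
// Sketch of a dense baseline using the host-side cuBLAS API.
// Computes C = A * B for one dense n x n matrix pair already on the device.
#include <cublas_v2.h>

void dense_gemm_baseline(cublasHandle_t handle,
                         const float* dA, const float* dB, float* dC, int n)
{
    const float alpha = 1.0f;
    const float beta  = 0.0f;
    // cuBLAS assumes column-major storage; the leading dimension equals n here
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n,
                &alpha, dA, n, dB, n,
                &beta, dC, n);
}
```

Timing this over a representative batch of matrices gives a number that any sparse in-kernel approach would need to beat.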