Efficient computation for sparse matrices (CRS format)?

Hi,
I’m wondering how I can efficiently handle sparse matrices on the TX1.
This includes:

  • Parsing the CRS format (or another sparse format) at the low-level CUDA layer
  • Computing convolutions of sparse and dense matrices with the best speed and minimum memory
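For context, CRS (also called CSR, compressed sparse row) stores a matrix as three flat arrays. A minimal pure-Python sketch of the layout and of the matrix-vector product it enables; this is an illustration of the data structure, not TX1-specific code:

```python
# CRS/CSR layout:
#   values  - the nonzero entries, row by row
#   col_idx - the column index of each nonzero
#   row_ptr - offset where each row starts in values/col_idx (length = rows + 1)

def dense_to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x with A in CSR form: only the nonzeros are touched."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y

A = [[1, 0, 2],
     [0, 0, 3],
     [4, 5, 0]]
vals, cols, ptr = dense_to_csr(A)
print(vals, cols, ptr)                         # [1, 2, 3, 4, 5] [0, 2, 2, 0, 1] [0, 2, 3, 5]
print(csr_matvec(vals, cols, ptr, [1, 1, 1]))  # [3, 3, 9]
```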

My questions are:

  1. Are there any high-level APIs for these operations?
  2. If so, do open frameworks such as Caffe and TensorFlow support them?
  3. Is there a speed-up compared to dense-dense matrix computation?
    (I have seen a few examples where sparse connections do not actually speed up inference time because of various overheads.)

Hi,

Thanks for your question.

  1. The cuSPARSE library contains basic linear algebra subroutines for handling sparse matrices.
    Read more at: http://docs.nvidia.com/cuda/cusparse/index.html

  2. As far as I remember, TensorFlow also supports sparse tensors:
    https://www.tensorflow.org/versions/r0.11/api_docs/python/sparse_ops/
    You can ask on their forum for more details.

  3. The speed-up ratio depends on matrix sparsity. This page should give you some idea of the performance:
    https://developer.nvidia.com/cusparse
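For reference on point 2: TensorFlow's tf.SparseTensor represents a tensor as (indices, values, dense_shape) in coordinate (COO) form. A plain-Python sketch of that representation, written without TensorFlow so it runs anywhere:

```python
# Mirror of the tf.SparseTensor layout: a list of nonzero coordinates,
# a parallel list of values, and the shape of the equivalent dense tensor.

def sparse_to_dense(indices, values, dense_shape):
    """Scatter COO entries into a dense 2-D matrix (what tf.sparse.to_dense does)."""
    rows, cols = dense_shape
    dense = [[0] * cols for _ in range(rows)]
    for (i, j), v in zip(indices, values):
        dense[i][j] = v
    return dense

indices = [(0, 0), (1, 2)]   # positions of the nonzeros
values = [1, 2]              # values at those positions
dense_shape = (3, 4)

print(sparse_to_dense(indices, values, dense_shape))
# [[1, 0, 0, 0], [0, 0, 2, 0], [0, 0, 0, 0]]
```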

Thank you, @AastaLLL.
Your information is a great help!

Also, regarding the speed-up ratio on the page you linked, I can’t find a performance comparison of dense*dense vs. dense*sparse on the NVIDIA platform.
One more thing: the performance there is measured on a P100, so can I expect a similar effect on the TX1?

If available, could you please send me more details on sparse*dense performance?

Hi,

Thanks for your reply.

Speed-up is related to matrix sparsity.
We don’t have a dense*dense vs. dense*sparse comparison, nor TX1-specific numbers.
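As a rough back-of-the-envelope model of why sparsity matters: a dense NxN matrix-vector product costs about 2*N^2 flops, while a CSR one costs about 2*nnz flops plus indexing overhead. The overhead factor below is an assumed, illustrative constant, not a measured TX1 or cuSPARSE number:

```python
# Crude cost model: sparse only wins once density drops below ~1/overhead,
# where overhead models the extra per-nonzero cost of indirect indexing
# and irregular memory access (an assumption for illustration).

def sparse_wins(n, density, overhead=3.0):
    """Return True if a CSR matvec is (crudely) predicted to beat dense."""
    dense_cost = 2.0 * n * n
    sparse_cost = 2.0 * (density * n * n) * overhead
    return sparse_cost < dense_cost

print(sparse_wins(1024, 0.05))  # True: 5% density beats dense even with 3x overhead
print(sparse_wins(1024, 0.50))  # False: at 50% density the overhead dominates
```

Under this model the break-even density is 1/overhead, which matches the observation above that sparse connections don't always pay off at moderate sparsity.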

Sorry about this, but you can still find some hints at https://developer.nvidia.com/cusparse