Hi,
I’m wondering how I can efficiently handle sparse matrices on the TX1.
Including:
- Parsing the CSR format (or another sparse format) in a low-level CUDA layer
- Computing the convolution of sparse and dense matrices with the best speed and minimum memory
So, my questions are:
- Are there any high-level APIs for these?
- If so, do open frameworks like Caffe and TensorFlow support them?
- Is there a speed-up compared to dense*dense matrix computation?
( I have seen a few examples where sparse connections do not actually speed up inference time because of various overheads. )
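For context, by CSR I mean the layout sketched below, in plain Python for illustration (not TX1/CUDA code; the matrix is made up):

```python
# Minimal CSR (compressed sparse row) sketch: a dense matrix is stored as
# three arrays -- the nonzero values, their column indices, and row pointers.
dense = [
    [10, 0, 0],
    [0, 0, 20],
    [30, 0, 40],
]

values, col_idx, row_ptr = [], [], [0]
for row in dense:
    for j, v in enumerate(row):
        if v != 0:
            values.append(v)
            col_idx.append(j)
    row_ptr.append(len(values))   # row i spans values[row_ptr[i]:row_ptr[i+1]]

print(values)   # -> [10, 20, 30, 40]
print(col_idx)  # -> [0, 2, 0, 2]
print(row_ptr)  # -> [0, 1, 2, 4]

# Sparse-matrix * dense-vector using only the stored nonzeros:
x = [1, 2, 3]
y = [sum(values[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
     for i in range(len(dense))]
print(y)        # -> [10, 60, 150]
```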
Hi,
Thanks for your question.
-
We have the cuSPARSE library, which contains basic linear algebra subroutines for handling sparse matrices.
Read more at: [url]http://docs.nvidia.com/cuda/cusparse/index.html[/url]
-
AFAIR, TensorFlow also supports sparse tensors:
https://www.tensorflow.org/versions/r0.11/api_docs/python/sparse_ops/
You can ask on their forum for more details.
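Note that TensorFlow's SparseTensor is COO-style rather than CSR: it keeps per-nonzero indices, the values, and the dense shape. A plain-Python sketch of that representation (my own toy data, not actual TensorFlow code):

```python
# COO-style sparse representation, roughly what TensorFlow's SparseTensor
# stores: an (row, col) index per nonzero, the values, and the dense shape.
indices = [(0, 0), (1, 2), (2, 0), (2, 2)]  # positions of the nonzeros
values = [10, 20, 30, 40]
shape = (3, 3)

# Reconstruct the dense matrix to show the two views are equivalent.
dense = [[0] * shape[1] for _ in range(shape[0])]
for (i, j), v in zip(indices, values):
    dense[i][j] = v
print(dense)  # -> [[10, 0, 0], [0, 0, 20], [30, 0, 40]]
```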
-
The speed-up ratio depends on matrix sparsity. This page should give you some idea of the performance:
cuSPARSE | NVIDIA Developer
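To make the sparsity point concrete, here is a toy operation count (plain Python, numbers are made up for illustration): a dense mat-vec always performs n*n multiply-adds, while a CSR mat-vec performs one per stored nonzero, so the arithmetic saving equals the density ratio. On real hardware, irregular memory access and other overheads can eat that saving when the matrix is not sparse enough.

```python
# Toy multiply-add counts: dense mat-vec vs. CSR mat-vec.
n = 1000
nnz = 50_000              # assume only 5% of the entries are nonzero

dense_madds = n * n       # dense mat-vec touches every entry
sparse_madds = nnz        # CSR mat-vec touches only the stored nonzeros

print(dense_madds)                  # -> 1000000
print(sparse_madds)                 # -> 50000
print(dense_madds / sparse_madds)   # -> 20.0 (x fewer multiply-adds at 5% density)
```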
Thank you, @AastaLLL.
Your information is very helpful!
Besides, regarding the speed-up ratio in your link, I can’t find a performance comparison of dense*dense vs. dense*sparse on the NVIDIA platform.
One other thing: the performance there is measured on a P100, so can I expect a similar effect on the TX1?
If available, could you please send me more details on sparse*dense performance?
Hi,
Thanks for your reply.
The speed-up depends on matrix sparsity.
We don’t have a dense*dense vs. dense*sparse comparison, nor TX1-specific numbers.
Sorry about this.
But you can still get some hints at cuSPARSE | NVIDIA Developer