I’m wondering how to handle sparse matrices efficiently on the TX1. Specifically, I want to:
- Parse the CRS (compressed row storage) format, or another sparse format, in the low-level CUDA layer
- Compute the convolution of sparse and dense matrices with the best speed and minimal memory use
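
For concreteness, here is a minimal sketch of the CRS layout I have in mind, written in plain Python rather than CUDA so it is easy to check. The helper names `dense_to_crs` and `crs_times_dense` are hypothetical, and a sparse-times-dense matrix product stands in for the convolution; this is only an illustration of the data structure, not a tuned implementation.

```python
# Illustrative sketch only: hypothetical helper names, plain Python, no GPU code.

def dense_to_crs(dense):
    """Convert a row-major dense matrix (list of lists) into CRS arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)      # non-zero value
                col_idx.append(j)     # its column index
        row_ptr.append(len(values))   # where the next row starts in `values`
    return values, col_idx, row_ptr


def crs_times_dense(values, col_idx, row_ptr, dense, n_cols):
    """Multiply a CRS matrix by a dense matrix that has n_cols columns."""
    n_rows = len(row_ptr) - 1
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        # Only the stored non-zeros of row i are visited.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a = values[k]
            b_row = dense[col_idx[k]]
            for j in range(n_cols):
                out[i][j] += a * b_row[j]
    return out


A = [[1, 0, 2],
     [0, 0, 3],
     [4, 0, 0]]
B = [[1, 2],
     [3, 4],
     [5, 6]]

values, col_idx, row_ptr = dense_to_crs(A)
C = crs_times_dense(values, col_idx, row_ptr, B, n_cols=2)
print(values, col_idx, row_ptr)
print(C)
```

In a CUDA version I would expect the outer loop over rows to map onto threads or warps, but that mapping is my assumption and part of what I am asking about.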
So, my questions are:
- Are there any high-level APIs for these?
- If so, do open-source frameworks such as Caffe and TF support them?
- Is it actually faster than dense-dense matrix computation?
(I saw a few examples where sparse connections did not actually speed up inference because of various overheads.)