To speed up a deep network, I intend to reduce FLOPs by pruning its connections.
This results in a multiplication between a sparse matrix and a dense matrix.
I am using cuSPARSE csrmm() to perform the matrix multiplication:
top = bottom * sparse_weight'
top = 300x4096
bottom = 300x25088
sparse_weight = 4096x25088 (10% non-zero, unstructured)
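For reference, here is a small CPU-side sketch of the same product using SciPy and NumPy as stand-ins for cuSPARSE/cuBLAS (the shapes are scaled down from mine so it runs quickly, and absolute CPU timings will of course not match the GPU; it only illustrates the sparse-vs-dense comparison at ~10% density):

```python
import time
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

# Scaled-down shapes standing in for bottom = 300x25088, sparse_weight = 4096x25088
bottom = rng.standard_normal((300, 2560)).astype(np.float32)
sparse_weight = sp.random(512, 2560, density=0.10, format='csr',
                          dtype=np.float32, random_state=0)

# Sparse path: top = bottom * sparse_weight', computed as (W * bottom')' so the
# CSR matrix stays on the left, as csrmm() also requires
t0 = time.perf_counter()
top_sparse = (sparse_weight @ bottom.T).T
t1 = time.perf_counter()

# Dense path: densify the weights and use a plain GEMM
dense_weight = sparse_weight.toarray()
t2 = time.perf_counter()
top_dense = bottom @ dense_weight.T
t3 = time.perf_counter()

print(top_sparse.shape)  # (300, 512) in this scaled-down version
print("match:", np.allclose(top_sparse, top_dense, rtol=1e-3, atol=1e-3))
print(f"sparse: {(t1 - t0) * 1e3:.1f} ms, dense: {(t3 - t2) * 1e3:.1f} ms")
```

On CPUs too, unstructured sparse matmul at 10% density often loses to a dense GEMM, since the dense path gets contiguous memory access and vectorization while the sparse path pays for irregular index lookups.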
I am getting timings of about 150 ms for csrmm(), whereas a regular dense cublasSgemm() takes 47 ms.
It seems that csrmm() is only faster than cublasSgemm() when the sparsity is around 1-2% non-zeros.
Please let me know if I have done anything wrong.
Do you have any suggestions to improve the speed?