Hello, is 2:4 structured sparsity on the Ampere architecture only effective for GEMM? Does it not work with Winograd and FFT convolutions?
Accessing 2:4 structured sparsity depends on the library.
Among the math libraries, access is available through cuSPARSELt.
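For context on what the 2:4 pattern means, here is a minimal NumPy sketch of structured pruning: keep the two largest-magnitude values in each group of four and zero the rest. This is only an illustration of the pattern (the function name and layout are my own); the actual accelerated path compresses and multiplies such matrices through cuSPARSELt.

```python
import numpy as np

def prune_2_to_4(weights):
    """Zero the two smallest-magnitude values in each contiguous group
    of four, producing the 2:4 pattern Ampere sparse Tensor Cores use."""
    w = weights.reshape(-1, 4).copy()
    # Indices of the two smallest |w| in each group of four
    drop = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.array([[0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.3, 0.8]], dtype=np.float32)
pruned = prune_2_to_4(w)
# Each group of four now has exactly two nonzeros:
# [[0.9, 0.0, 0.4, 0.0, -0.7, 0.0, 0.0, 0.8]]
```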
Thanks! When I run the trtexec command with sparsity enabled, inference speed increases by only about 1%, and I don't know why. I used ResNet50 with apex/ASP sparsity pruning.
The short answer is that RN50 is full of operations that aren’t math-limited GEMMs (CONVs), so it’ll never see a fantastic end-to-end speedup. The long answer is that everything will depend on:
- What hardware you’re using
- What clocks you’re using, and how efficiently the hardware is cooled
- What version of TRT you’re using
- What data type you’re using
- What batch size you’re using
- Whether you used ASP and saved the model correctly
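On that last point: TRT can only pick sparse kernels for layers whose exported weights actually obey the 2:4 pattern; if the ASP masks were lost when saving, everything silently runs dense. A minimal sanity check you could run on the saved weight tensors (function name and layout are illustrative, assuming groups of four along the flattened weights):

```python
import numpy as np

def is_2_to_4_sparse(weights):
    """Return True if every contiguous group of four values contains
    at most two nonzeros, i.e. the weights follow the 2:4 pattern."""
    groups = np.asarray(weights).reshape(-1, 4)
    return bool(((groups != 0).sum(axis=1) <= 2).all())

# A correctly pruned tensor passes; a dense one does not.
is_2_to_4_sparse(np.array([0.9, 0.0, 0.0, -0.3, 0.0, 0.5, 0.1, 0.0]))  # True
is_2_to_4_sparse(np.ones(8))  # False
```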
I would start with this TRT blog and try the ResNeXt-101 model within it.
Thanks! I will try the ResNeXt-101 model.