As I understand it, sparse inference with TensorRT requires the following process:
- Train a dense network in PyTorch and save the best checkpoint (.pth).
- Retrain that checkpoint with sparsity using ASP from apex, producing a sparse .pth file.
- Convert the sparse .pth to ONNX.
- Convert the ONNX model to a TensorRT plan file (.trt):
./workspace/TensorRT/build/out/trtexec \
  --onnx=/workspace/TensorRT/model/resnext101_32x8d_pyt_torchvision_sparse.onnx \
  --saveEngine=/workspace/TensorRT/model/resnext101_engine.trt \
  --explicitBatch \
  --sparsity=enable \
  --fp16
- Run inference on the plan file using TensorRT.
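
To make the sparse retraining step concrete, here is a minimal sketch of the 2:4 pattern that ASP enforces: in every group of four consecutive weights, at most two are nonzero. This is a simplified magnitude-based illustration in plain Python, not apex's actual implementation; the function name `prune_2_of_4` and the sample weights are made up for the example.

```python
def prune_2_of_4(row):
    """Zero the two smallest-magnitude values in each group of four.

    Simplified illustration of 2:4 structured pruning; the real ASP
    mask computation in apex works on whole tensors and is more involved.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Positions of the two largest magnitudes; ties go to the lower index.
        keep = sorted(range(4), key=lambda i: (-abs(group[i]), i))[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.0]  # hypothetical values
print(prune_2_of_4(weights))
# [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Note that the pruned result is still an ordinary dense list with zeros in it, which is the form in which the weights are saved to the .pth and ONNX files.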
I have a question about this process.
Does TensorRT inference use the 2-bit indices (shown in the figure above) in addition to the sparse matrix data?
I don't think the .pth, ONNX, or .trt files contain the 2-bit index information. If they don't, how does TensorRT obtain the 2-bit indices shown in the figure?
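
To clarify what I mean by 2-bit indices, here is a small sketch of how a 2:4-sparse row stored densely (with explicit zeros) could be split into kept values plus a position in the range 0 to 3 for each kept value. It only shows that such indices can be derived from the zero pattern of dense weights; it makes no claim about how TensorRT actually lays out or builds this metadata. The function names and sample data are hypothetical.

```python
def compress_2_of_4(row):
    """Split a 2:4-sparse row into kept values and their 2-bit positions."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    values, indices = [], []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        nonzero = [i for i, v in enumerate(group) if v != 0]
        if len(nonzero) > 2:
            raise ValueError("group is not 2:4 sparse")
        # If fewer than two values are nonzero, pad with zero-valued positions
        # so that every group stores exactly two slots.
        padding = [i for i in range(4) if i not in nonzero]
        positions = sorted((nonzero + padding)[:2])
        for i in positions:
            values.append(group[i])
            indices.append(i)  # always 0..3, so it fits in 2 bits
    return values, indices


def decompress_2_of_4(values, indices):
    """Rebuild the dense row from kept values and their positions."""
    row = []
    for k in range(0, len(values), 2):
        group = [0.0] * 4
        group[indices[k]] = values[k]
        group[indices[k + 1]] = values[k + 1]
        row.extend(group)
    return row


sparse_row = [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]  # hypothetical values
values, indices = compress_2_of_4(sparse_row)
print(values)   # [0.9, -0.7, 0.3, -0.4]
print(indices)  # [0, 3, 1, 2]
assert decompress_2_of_4(values, indices) == sparse_row
```

In this sketch the compressed form holds half as many values as the dense row, plus one 2-bit index per kept value.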