I am trying to use TensorRT to run a BERT-large model with 2:4 structured sparsity. However, I cannot get TensorRT to pick a sparse implementation for any of the layers. Could someone look into this issue?
Environment
TensorRT Version: 8.2.0.6
GPU Type: A100
Nvidia Driver Version:
CUDA Version: 11.4
CUDNN Version: 8.2.4
Operating System + Version: Ubuntu 18.04.6
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable): Not applicable
PyTorch Version (if applicable): 1.9
Baremetal or Container (if container which image + tag): tensorrt-ubuntu18.04-cuda11.4
We recommend trying the latest TensorRT version, 8.4 GA. If you still face the issue, please try the following with trtexec and share the logs so we can assist further:
Add --useCudaGraph to see if using CUDA graph helps at all (probably not)
Add --dumpProfile --separateProfileRun --verbose and share the logs. This will give us per-layer performance breakdown.
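Separately from the logs, it may be worth ruling out the weights themselves: sparse kernels can only be considered when every group of four consecutive weights contains at most two non-zero values. The snippet below is a minimal sketch of such a check in plain Python. The function name `is_2_4_sparse` is hypothetical, and the assumption that groups run along the last axis of each weight row is ours; the axis that matters for a given layer may differ, so treat this as a sanity check rather than a definitive test.

```python
def is_2_4_sparse(rows, group=4, max_nonzero=2):
    """Return True if every contiguous group of `group` values in each row
    has at most `max_nonzero` non-zero entries.

    Assumption: `rows` is a 2-D weight matrix given as a list of lists,
    and the sparsity groups run along each row.
    """
    for row in rows:
        if len(row) % group != 0:
            # A row that cannot be split into whole groups fails the check.
            return False
        for start in range(0, len(row), group):
            block = row[start:start + group]
            nonzero = sum(1 for value in block if value != 0)
            if nonzero > max_nonzero:
                return False
    return True


# Illustrative values only, not taken from any real model.
dense = [[0.5, 0.1, -0.3, 0.2, 0.0, 0.0, 0.7, 0.9]]
pruned = [[0.5, 0.0, -0.3, 0.0, 0.0, 0.0, 0.7, 0.9]]

print(is_2_4_sparse(dense))   # False: the first group has four non-zeros
print(is_2_4_sparse(pruned))  # True: each group has two non-zeros
```

If the exported weights pass a check like this, the next thing to confirm is that sparsity was actually requested when building the engine (with trtexec this should be the --sparsity=enable option; please verify against the help output of your version).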