Enabling sparsity for a model across different devices using TensorRT


I’ve read the blog post https://developer.nvidia.com/blog/accelerating-inference-with-sparsity-using-ampere-and-tensorrt/ and installed APEX with its submodules.

I am pruning (enabling sparsity in) my model on my training device, a Quadro RTX 4000 (Turing architecture).
I want to run the pruned model on an ORIN-X device (Ampere architecture).

In this case, is it fine to run the pruned model on a different device?
In the blog post and the APEX ASP instructions, channel permutation is a key part of sparsity, and the feature is intended for the Ampere architecture.

I tried importing ASP (from apex.contrib.sparsity import ASP) in my training code on the Turing machine, and it printed the following:

Could not find permutation search CUDA kernels, falling back to CPU path
[ASP][Info] permutation_search_kernels can be imported.

Is it fine to generate the pruned model (its sparse structure) on one architecture and run it on another?



TensorRT Version: 8.4.1
GPU Type: Quadro RTX 4000 on the pruning device, ORIN-X on the inference device
Nvidia Driver Version: 470.94 on pruning device
CUDA Version: 11.3 on pruning device
CUDNN Version: 8.4.x (not certain)
Operating System + Version: Ubuntu 18.04 LTS
Python Version (if applicable): 3.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.12 with CUDA
Baremetal or Container (if container which image + tag):

To use the GPU-accelerated permutation search, you need to install APEX in your environment with the search kernels built. This is highly recommended, since it is much faster than the CPU fallback path. The one-line install command is in the ASP README.
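To illustrate what the permutation search is optimizing, here is a toy, pure-Python sketch; it is an assumption-laden illustration, not ASP's algorithm. Reordering the input channels (columns) of a weight matrix changes which weights share a group of 4, so a good order keeps more total weight magnitude after 2:4 pruning. The sketch tries every column order by brute force, which is only feasible for tiny matrices; ASP's actual search is different and far more scalable. The names retained_magnitude and best_permutation are hypothetical. In a real network, the producing layer's output channels would have to be permuted the same way so the model computes the same function.

```python
import itertools


def retained_magnitude(matrix, perm):
    """Sum of |w| kept after 2:4 pruning each row, with columns reordered by perm."""
    total = 0.0
    for row in matrix:
        permuted = [abs(row[c]) for c in perm]
        for start in range(0, len(permuted), 4):
            group = sorted(permuted[start:start + 4], reverse=True)
            total += group[0] + group[1]  # 2:4 keeps the 2 largest values per group
    return total


def best_permutation(matrix):
    """Exhaustively try every column order (toy only: n! candidates)."""
    n_cols = len(matrix[0])
    return max(itertools.permutations(range(n_cols)),
               key=lambda p: retained_magnitude(matrix, p))


# Hypothetical 2x8 weight matrix: rows = output channels, columns = input channels.
matrix = [
    [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0],
]
print(retained_magnitude(matrix, range(8)))                  # 4.0
print(retained_magnitude(matrix, best_permutation(matrix)))  # 8.0
```

On this matrix the original column order keeps only half of the total magnitude, while the best order keeps all of it.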

Also, it is fine to generate the pruned model on Turing and deploy it on Ampere.
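A minimal sketch of why this works: the 2:4 structured pattern that ASP produces ends up as zeros in the weight tensors themselves, so it does not depend on which GPU computed it; the Ampere sparse Tensor Cores only come into play at inference time. The toy, pure-Python example below shows magnitude-based 2:4 pruning (keep the 2 largest-magnitude values in every group of 4 consecutive weights) plus a checker for the pattern. The helpers prune_2_4 and is_2_4_sparse are hypothetical and are not part of the APEX or TensorRT API.

```python
# Toy 2:4 structured-sparsity sketch (hypothetical helpers, not the APEX ASP implementation).

def prune_2_4(row):
    """Zero the 2 smallest-magnitude values in each group of 4 consecutive weights."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = list(row)
    for start in range(0, len(row), 4):
        group = range(start, start + 4)
        # Indices of the two smallest |w| in this group are set to zero.
        for i in sorted(group, key=lambda i: abs(row[i]))[:2]:
            pruned[i] = 0.0
    return pruned


def is_2_4_sparse(row):
    """True if every group of 4 consecutive weights contains at least 2 zeros."""
    return len(row) % 4 == 0 and all(
        sum(1 for w in row[s:s + 4] if w == 0.0) >= 2
        for s in range(0, len(row), 4)
    )


weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
pruned = prune_2_4(weights)
print(pruned)                 # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
print(is_2_4_sparse(pruned))  # True
```

Because the result is just a set of weight values, the same pruned weights can be produced on any machine (even on the CPU) and exported; whether they are accelerated is decided later, when TensorRT builds the engine for the Ampere device.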