I’ve read the blog post at https://developer.nvidia.com/blog/accelerating-inference-with-sparsity-using-ampere-and-tensorrt/ and installed APEX with its submodules.
I am pruning (enabling sparsity on) my model on my training machine, which uses a Quadro RTX 4000 (Turing architecture).
I want to run the pruned model on an ORIN-X device (Ampere architecture).
In this case, is it fine to prune the model on one device and run it on a different one?
According to the blog and the apex (ASP) instructions, channel permutation is a key part of sparsity, and it is intended for the Ampere architecture.
I tried importing ASP (from apex.contrib.sparsity import ASP) in my training code on the Turing machine, and it prints the following:
Could not find permutation search CUDA kernels, falling back to CPU path
[ASP][Info] permutation_search_kernels can be imported.
Is it fine to compute the pruned model (its sparse structure) on one architecture and then use it on a different architecture?
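To make the question concrete, here is a minimal sketch of the 2:4 pattern described in the blog, written with NumPy only. It is not the ASP implementation: the helper names prune_2_to_4 and is_2_to_4_sparse are hypothetical, and grouping four consecutive values along the last axis is an assumption made for illustration. The point is that the pattern is a property of the weight values themselves, so it can be checked on any machine.

```python
import numpy as np


def prune_2_to_4(weights):
    """Zero the two smallest-magnitude values in each group of four (last axis).

    Hypothetical helper for illustration only; this is not how ASP is implemented.
    """
    w = np.asarray(weights, dtype=np.float32)
    if w.shape[-1] % 4 != 0:
        raise ValueError("last dimension must be a multiple of 4")
    groups = w.reshape(-1, 4)
    # Indices of the two smallest |w| in every group of four.
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    mask = np.ones(groups.shape, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (groups * mask).reshape(w.shape)


def is_2_to_4_sparse(weights):
    """Return True if every group of four along the last axis has at most two non-zeros."""
    w = np.asarray(weights)
    if w.shape[-1] % 4 != 0:
        return False
    nonzeros = np.count_nonzero(w.reshape(-1, 4), axis=1)
    return bool(np.all(nonzeros <= 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dense = rng.standard_normal((8, 16)).astype(np.float32)
    pruned = prune_2_to_4(dense)
    print("dense is 2:4 sparse: ", is_2_to_4_sparse(dense))
    print("pruned is 2:4 sparse:", is_2_to_4_sparse(pruned))
```

A check like is_2_to_4_sparse could be run on the exported weights on the training machine before moving the model to the ORIN-X. It only inspects the zero pattern and says nothing about whether TensorRT will select sparse kernels on the target.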
TensorRT Version: 8.4.1
GPU Type: Quadro RTX 4000 on the pruning device, ORIN-X on the inference device
Nvidia Driver Version: 470.94 on pruning device
CUDA Version: 11.3 on pruning device
CUDNN Version: 8.4.x (not certain)
Operating System + Version: Ubuntu 18.04 LTS
Python Version (if applicable): 3.9
TensorFlow Version (if applicable)
PyTorch Version (if applicable): 1.12 with cuda
Baremetal or Container (if container which image + tag):