I have a PyTorch model that I pruned. The number of weights after pruning is the same as before, but many of them are now zero (unstructured pruning). I converted the .pt model to .onnx and then to a .engine model. I want to know: does the Jetson Xavier NX 16 GB support sparse tensors in my case, to accelerate computation?
I read the blog Accelerating Inference with Sparsity Using the NVIDIA Ampere Architecture and NVIDIA TensorRT | NVIDIA Technical Blog; it seems that the A100 with the Ampere architecture supports sparse tensors.
Thanks.
Hi,
It should be supported. However, we are moving this post to the Jetson Xavier NX forum so you can get better help.
For more info, please refer:
Thank you.
@spolisetty
What do we need to do for the Jetson NX to support sparse tensors?
I mean: my model weights contain many zeros and I have done nothing else. Can the Jetson NX speed up inference as-is, or do I need to convert (save) the weights in a sparse-tensor format before the Jetson NX can speed up inference?
Hi,
How do you convert the model into a TensorRT engine?
If the trtexec binary is used, please try it with the --sparsity flag:
--int8                      Enable int8 precision, in addition to fp16 (default = disabled)
--consistency               Enable consistency check for serialized engine (default = disabled)
--std                       Build standard serialized engine (default = disabled)
--calib=<file>              Read INT8 calibration cache file
--serialized=<file>         Save the serialized network
--plugins                   Plugin library (.so) to load (can be specified multiple times)
--verbose or -v             Use verbose logging (default = false)
--help or -h                Print this message
--noBuilderCache            Disable timing cache in builder (default is to enable timing cache)
--timingCacheFile=<file>    Save/load the serialized global timing cache
--sparsity=spec             Control sparsity (default = disabled).
                            Sparsity: spec ::= "disable", "enable", "force"
                            Note: Description about each of these options is as below
                            disable = do not enable sparse tactics in the builder (this is the default)
                            enable  = enable sparse tactics in the builder (but these tactics will only be
                                      considered if the weights have the right sparsity pattern)
                            force   = enable sparse tactics in the builder and force-overwrite the weights to have
                                      a sparsity pattern
--minTiming=M               Set the minimum number of iterations used in kernel selection (default = defaultMinTiming)
--avgTiming=M               Set the number of times averaged in each iteration for kernel selection (default =
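For context, the sparse tactics above target the 2:4 structured-sparsity pattern of Ampere tensor cores (at most 2 nonzero values in every group of 4 consecutive weights), which is why arbitrary unstructured zeros alone may not qualify. Below is a minimal NumPy sketch of what the `force` option conceptually does to the weights; the function name `prune_2_to_4` is my own illustration, not part of TensorRT:

```python
import numpy as np

def prune_2_to_4(weights):
    """Zero the 2 smallest-magnitude values in every group of 4
    along the last axis, producing the 2:4 pattern that Ampere
    sparse tensor cores can accelerate.

    Assumes weights.size is divisible by 4.
    """
    w = weights.reshape(-1, 4).copy()
    # Indices of the 2 smallest |w| within each group of 4.
    drop = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.arange(1.0, 9.0).reshape(2, 4)   # [[1,2,3,4],[5,6,7,8]]
pruned = prune_2_to_4(w)
print(pruned)   # [[0. 0. 3. 4.]
                #  [0. 0. 7. 8.]]
```

With `--sparsity=enable`, TensorRT only *checks* whether the weights already follow this pattern; `force` rewrites them into it (changing the model's outputs), which is mainly useful for benchmarking.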
Thanks.
@AastaLLL
Thanks. I used the TensorRT Python API. How can I set the sparsity flag when using the TensorRT Python API?
Could you tell me in more detail what happens when the sparsity flag is set?
Thanks for the information.
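In case it helps: with the TensorRT Python API, sparse tactics are requested through the builder configuration. A minimal sketch, assuming TensorRT 8.x is installed (a configuration fragment only; it is not runnable without the library, and the network still has to be populated, e.g. via the ONNX parser):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()

# Rough equivalent of trtexec --sparsity=enable: sparse tactics are
# only *considered*; they are selected when the weights already follow
# the 2:4 pattern and the sparse kernel beats the dense one.
config.set_flag(trt.BuilderFlag.SPARSE_WEIGHTS)

# ... populate `network` from the ONNX file, then build:
# engine = builder.build_serialized_network(network, config)
```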
system closed this topic on July 19, 2023, 5:45am.
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.