I have a PyTorch model that I pruned. The number of weights after pruning is the same as before, but many of them are now zero (unstructured pruning). I converted the .pt model to .onnx and then to a .engine model. I want to know: does the Jetson Xavier NX 16 GB support sparse tensors in my case, to accelerate computation?
I read the blog Accelerating Inference with Sparsity Using the NVIDIA Ampere Architecture and NVIDIA TensorRT | NVIDIA Technical Blog; it seems that the A100 with the Ampere architecture supports sparse tensors.
Thanks.
Hi,
It should be supported. However, we are moving this post to the Jetson Xavier NX forum so you can get better help.
For more info, please refer:
Thank you.
@spolisetty
What do we need to do for the Jetson NX to support sparse tensors?
I mean: my model weights contain many zeros and I have done nothing else. Can the Jetson NX speed up inference as-is, or do I need to convert (save) the weights in a sparse-tensor format before the Jetson NX can speed up inference?
Hi,
How do you convert the model into a TensorRT engine?
If the trtexec binary is used, please try it with the --sparsity flag:
--int8                      Enable int8 precision, in addition to fp16 (default = disabled)
--consistency               Enable consistency check for serialized engine (default = disabled)
--std                       Build standard serialized engine (default = disabled)
--calib=<file>              Read INT8 calibration cache file
--serialized=<file>         Save the serialized network
--plugins                   Plugin library (.so) to load (can be specified multiple times)
--verbose or -v             Use verbose logging (default = false)
--help or -h                Print this message
--noBuilderCache            Disable timing cache in builder (default is to enable timing cache)
--timingCacheFile=<file>    Save/load the serialized global timing cache
--sparsity=spec             Control sparsity (default = disabled).
                            Sparsity: spec ::= "disable", "enable", "force"
                            Note: Description about each of these options is as below
                            disable = do not enable sparse tactics in the builder (this is the default)
                            enable  = enable sparse tactics in the builder (but these tactics will only be
                                      considered if the weights have the right sparsity pattern)
                            force   = enable sparse tactics in the builder and force-overwrite the weights to have
                                      a sparsity pattern
--minTiming=M               Set the minimum number of iterations used in kernel selection (default = defaultMinTiming)
--avgTiming=M               Set the number of times averaged in each iteration for kernel selection (default =
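For context, the sparse tactics above target the 2:4 structured-sparsity pattern of Ampere tensor cores (at most 2 nonzero values in every group of 4 consecutive weights), which is why arbitrary unstructured zeros alone may not qualify. Below is a minimal NumPy sketch of what the `force` option conceptually does to the weights; the function name `prune_2_to_4` is my own illustration, not part of TensorRT:

```python
import numpy as np

def prune_2_to_4(weights):
    """Zero the 2 smallest-magnitude values in every group of 4
    along the last axis, producing the 2:4 pattern that Ampere
    sparse tensor cores can accelerate.

    Assumes weights.size is divisible by 4.
    """
    w = weights.reshape(-1, 4).copy()
    # Indices of the 2 smallest |w| within each group of 4.
    drop = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.arange(1.0, 9.0).reshape(2, 4)   # [[1,2,3,4],[5,6,7,8]]
pruned = prune_2_to_4(w)
print(pruned)   # [[0. 0. 3. 4.]
                #  [0. 0. 7. 8.]]
```

With `--sparsity=enable`, TensorRT only *checks* whether the weights already follow this pattern; `force` rewrites them into it (changing the model's outputs), which is mainly useful for benchmarking.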
Thanks.
@AastaLLL
Thanks. I used the TensorRT Python API. How can I set the sparsity flag when using the TensorRT Python API?
Could you tell me in more detail what happens when the sparsity flag is set?
Thanks for the information.
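In case it helps: with the TensorRT Python API, sparse tactics are requested through the builder configuration. A minimal sketch, assuming TensorRT 8.x is installed (a configuration fragment only; it is not runnable without the library, and the network still has to be populated, e.g. via the ONNX parser):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()

# Rough equivalent of trtexec --sparsity=enable: sparse tactics are
# only *considered*; they are selected when the weights already follow
# the 2:4 pattern and the sparse kernel beats the dense one.
config.set_flag(trt.BuilderFlag.SPARSE_WEIGHTS)

# ... populate `network` from the ONNX file, then build:
# engine = builder.build_serialized_network(network, config)
```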
system closed this topic on July 19, 2023, 5:45am.
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.