Hi:
I’m trying to deploy a Vision Transformer on Jetson Orin, and the speed is really slow. I can see that FasterTransformer has examples for an INT8 Swin Transformer, but I think it doesn’t work on Orin. Is there any other way to speed up transformers on Orin?
Hi,
You can maximize the device performance with the following commands:
$ sudo nvpmodel -m 0
$ sudo jetson_clocks
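To double-check that the settings are applied, you can query the current state (assuming a recent JetPack where these options are available):
$ sudo nvpmodel -q
$ sudo jetson_clocks --show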
Which frameworks do you use?
For TensorRT, INT8 inference is available on the Orin.
Thanks.
I’m trying to deploy a Swin Transformer as a TensorRT engine on Orin. I tried FasterTransformer, but I failed to compile it. According to my experiments, INT8 inference of the Swin Transformer actually makes the model slower. Did I miss something?
Hi,
Could you share the command you used to convert the model into a TensorRT INT8 engine?
And the benchmark source/command as well?
Thanks.
Hi:
I used the quantization tool from FasterTransformer.
Since I can’t run FasterTransformer on Orin, I converted the quantized PyTorch model to ONNX with the usual torch.onnx.export and then built the TensorRT engine with trtexec.
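Roughly, the conversion I ran looks like the following sketch (simplified; load_quantized_swin, the 224x224 input size, and the file names are just placeholders for my actual script):

import torch

model = load_quantized_swin()           # placeholder: load the quantized checkpoint
model.eval()
dummy = torch.randn(1, 3, 224, 224)     # Swin-T default input resolution
torch.onnx.export(model, dummy, "swin_qat.onnx", opset_version=13,
                  input_names=["input"], output_names=["output"])

followed by something like

$ trtexec --onnx=swin_qat.onnx --saveEngine=swin_qat.engine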
Hi,
Is PTQ an option for you?
Convert the FP32 model into ONNX and then build the TensorRT engine with trtexec and the --int8 flag.
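For example, something like this (model.onnx and the engine name are placeholders):
$ trtexec --onnx=model.onnx --int8 --saveEngine=model_int8.engine
Please note that without a calibration cache (--calib) or Q/DQ nodes in the ONNX, trtexec assigns dummy dynamic ranges, so this is mainly useful for checking performance rather than accuracy.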
Thanks.
I haven’t tried TensorRT’s native PTQ yet; I will try it. But I would still prefer a solution with QAT.
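In case it is useful, the QAT flow I have in mind is roughly the following sketch, based on NVIDIA’s pytorch-quantization toolkit (build_swin and the file names are placeholders, and the calibration/fine-tuning step is omitted):

import torch
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

quant_modules.initialize()                 # patch torch.nn layers with fake-quantized versions
model = build_swin()                       # placeholder: build/load the Swin model
# ... calibrate and run QAT fine-tuning here ...

quant_nn.TensorQuantizer.use_fb_fake_quant = True   # export real QuantizeLinear/DequantizeLinear nodes
model.eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "swin_qat_qdq.onnx", opset_version=13)

and then build the engine with trtexec --int8 so TensorRT keeps the learned scales.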
There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks
Hi,
Could you share with us the command you used to convert the model into the TensorRT engine?
Was it done with trtexec?
Thanks.