With DLA, inference is even slower than without DLA

2.txt (109.1 KB)
I am trying to build a TensorRT engine that runs on the Xavier DLA cores, but the performance is even slower than without DLA.
Can you give me some advice?

Dear @mayuning2nd,
The logs indicate that a few layers are not supported by DLA, so the intermediate data transfers between the GPU and DLA layers increase the overall execution time. In such cases, you can use the GPU only to get better performance.
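If you still want to keep DLA in the picture, you can build the engine with GPU fallback enabled so that the unsupported layers run on the GPU, and compare that against a GPU-only build. Below is a minimal sketch with the TensorRT Python API (TensorRT 8 style builder calls; the ONNX model path is an assumption):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:              # model path is an assumption
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)            # DLA only runs FP16 or INT8
config.default_device_type = trt.DeviceType.DLA  # place layers on DLA by default
config.DLA_core = 0
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)    # unsupported layers fall back to the GPU

# For a GPU-only comparison, drop the three DLA-related lines above.
engine_bytes = builder.build_serialized_network(network, config)
```

The builder log will show which layers were actually placed on DLA, so you can see how often execution bounces between the two devices.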

Thanks for the reply. Are there any other methods known to be effective for achieving better performance (with only a minor accuracy drop)?

Dear @mayuning2nd,
You may try low-precision inference (FP16 or INT8).
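For example, the precision switches on the TensorRT builder config look roughly like this (a sketch only; INT8 additionally needs calibration data, and the calibrator class and data below are hypothetical placeholders):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# FP16 is usually a drop-in change with a small accuracy impact.
config.set_flag(trt.BuilderFlag.FP16)

# INT8 gives a larger speedup but needs calibration data
# (or a network that already contains Q/DQ nodes).
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = MyEntropyCalibrator(calibration_batches)  # hypothetical calibrator and data
```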


Hi, I tried pytorch_quantization and got stuck at the ONNX export step.
According to the online documentation for pytorch_quantization (2.2.0),
I need to call: with pytorch_quantization.enable_onnx_export():
but my installed version of pytorch_quantization does not have the enable_onnx_export attribute, and I have not been able to upgrade pytorch_quantization to 2.2.0.
Has 2.2.0 been released publicly yet?
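For reference, the export path I am trying to get working looks roughly like this; older pytorch_quantization releases document flipping quant_nn.TensorQuantizer.use_fb_fake_quant instead of the enable_onnx_export() context manager (the helper name, output file, and input shape below are just placeholders of mine):

```python
import torch
from pytorch_quantization import nn as quant_nn

def export_qat_onnx(model, onnx_path="model_qat.onnx", input_shape=(1, 3, 224, 224)):
    """Export an already-calibrated pytorch_quantization model to ONNX.

    Uses the use_fb_fake_quant switch documented for older releases in place
    of the enable_onnx_export() context manager from the 2.2.0 docs.
    """
    quant_nn.TensorQuantizer.use_fb_fake_quant = True   # emit ONNX-exportable fake-quant ops
    model.eval()
    dummy_input = torch.randn(*input_shape)              # input shape is an assumption
    torch.onnx.export(model, dummy_input, onnx_path,
                      opset_version=13, do_constant_folding=True)
    quant_nn.TensorQuantizer.use_fb_fake_quant = False   # restore the default behavior
```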

Dear @mayuning2nd,
Does this require any further support from us? Could you check in the relevant forum about the availability of that pytorch_quantization version?

I suspect that pytorch_quantization 2.2.0 is not publicly available online yet, even though the documentation is based on 2.2.0.
