Optimizing layer performance on Jetson Nano

Hi, I’m testing model FPS on my Jetson Nano 4GB with TensorRT and I’m trying to optimize model performance. I read the docs (Optimizing layer, Layer Base Classes and Data Format) to find a solution. However, there are some points I’m struggling with:

  1. There are several data formats (CHW2, HWC8, CHW4, …), but the Optimizing layer doc says that tensor dimensions should be multiples of 32. If my tensor's channel count is divisible by 2 but not by 32 and its data format is CHW2, it is not as well optimized as a tensor whose channel count is divisible by 32, right?
  2. When I converted my models from ONNX to an engine with Python code on my Jetson Nano, I found that the Format/Datatype is always "Two wide channel vectorized row major FP16 format" regardless of the number of channels. Can somebody explain that?
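To make point 1 concrete, here is a minimal sketch of how channel padding differs between the vectorized formats. It assumes that each vectorized format packs a fixed number of channels per element and pads the channel dimension up to the next multiple of that width. The `VECTOR_WIDTH` table and both helper functions are hypothetical names for illustration only, not part of the TensorRT API, and the sketch says nothing about which kernels TensorRT actually selects.

```python
# Channels packed per element for some tensor formats (illustrative table,
# not a TensorRT API object). LINEAR is plain CHW with no packing.
VECTOR_WIDTH = {
    "LINEAR": 1,
    "CHW2": 2,
    "CHW4": 4,
    "HWC8": 8,
    "CHW16": 16,
    "CHW32": 32,
}


def padded_channels(channels: int, fmt: str) -> int:
    """Round the channel count up to the next multiple of the format's width."""
    if channels <= 0:
        raise ValueError("channels must be positive")
    width = VECTOR_WIDTH[fmt]
    # Integer ceiling division, then scale back up to a channel count.
    return -(-channels // width) * width


def padding_overhead(channels: int, fmt: str) -> float:
    """Fraction of extra (padded) channels relative to the real channel count."""
    return padded_channels(channels, fmt) / channels - 1.0


if __name__ == "__main__":
    for c in (3, 6, 24, 64):
        row = ", ".join(
            f"{fmt}: {padded_channels(c, fmt)}" for fmt in VECTOR_WIDTH
        )
        print(f"C={c:>2} -> {row}")
```

Under these assumptions, a tensor with 6 channels needs no padding in CHW2 but would be padded to 32 channels in CHW32, so "divisible by 2" and "divisible by 32" are separate conditions that matter for different formats.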

Edit: I found that the Jetson Nano does not have Tensor Cores, so the Optimizing layer guidance does not apply. So, how should I choose layer sizes, or what else can I do, to optimize performance on the Jetson Nano?

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Sorry for the late response.
Is this still an issue that needs support? Can you share any results?