Model optimization and GPU acceleration using low-level design improvements


We have been tasked with improving model inference time. Of course, one solution could be distributed GPU training, but that approach is not feasible for us for various reasons. So we are thinking of improving inference time by optimising the low-level design, using Cython for hardware acceleration on a Jetson device. Which areas should I target? Any links to study material or any ideas would be really appreciated.

If this seems too generic, please ask me about the specific topic so that I can provide more information, which in turn will help in finding the right approach.


Hello @pat.anderson5698 and welcome to the NVIDIA developer forums!

Let’s start with some high-level questions:

  • Which hardware exactly are you using? You mention Jetson, but which model is it?
  • Which deep learning framework are you using or planning to use?
  • Is this a self-developed model or one provided by NVIDIA?

If you can answer those, I can move your post to the correct category in our forums, which would either be Jetson or Deep Learning specific areas.


Hello @MarkusHoHo

Please find my answers:

  1. Jetson Nano to start with.
  2. PyTorch.
  3. Yes, a self-developed model.

Thank you for the quick reply!

I took the liberty of getting you started in the Jetson community, since any inference optimizations you want to try out should start with the chosen hardware environment.

I also suggest having a look at the Frameworks category, for example at content tagged with pytorch or jetson-inference.

I hope this will help you with your project!
