AGX Orin 64GB computational limits?

Hi everyone. After some reflection I think I’ll buy an AGX Orin 64GB dev kit (to have something that will last for 3-4 years).
Now the main concern: what are its current limits? I mean, is there some kind of model (by NVIDIA NGC or HuggingFace) that is simply too big/computationally intensive for the AGX?
I’d like to use BLOOM, Stable Diffusion, SantaCoder, and of course the NVIDIA AI stack at the highest capacity allowed by the AGX. What will not run on it?
Thanks, and sorry if my question sounds naive.

I’m not able to comment on specific limitations, but it’s the most powerful AI edge device available.


I’ll suggest that the biggest limitation isn’t its computing power. You do need to understand that it is an integrated GPU (iGPU), and the APIs it can use are tied to its particular L4T (Ubuntu + NVIDIA drivers) release. Whatever software you plan to run, be certain that the flashed release supports the CUDA (or other API) versions that software requires.
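To make that check scriptable, here is a minimal sketch that parses the release line a Jetson exposes in `/etc/nv_tegra_release`, so a setup script can refuse to install packages built for a different L4T release. The sample line below only illustrates the file's general format; the exact fields on your board will differ.

```python
import re

def parse_l4t_release(line: str):
    """Extract (major release, revision) from an /etc/nv_tegra_release header line.

    The file starts with a line shaped roughly like:
    '# R35 (release), REVISION: 4.1, BOARD: t186ref, ...'
    """
    m = re.search(r"R(\d+).*?REVISION:\s*([\d.]+)", line)
    if not m:
        raise ValueError("unrecognized L4T release line")
    return int(m.group(1)), m.group(2)

# Illustrative sample line (not from a real board):
sample = "# R35 (release), REVISION: 4.1, BOARD: t186ref"
print(parse_l4t_release(sample))  # (35, '4.1')
```

On an actual Jetson you would read the first line of `/etc/nv_tegra_release` instead of the hard-coded sample, then compare the result against whatever release your CUDA wheels or containers were built for.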

Also, despite being an extraordinary GPU device at that size, it still is not the same as a high-end discrete GPU (dGPU) you might find in a desktop. The iGPU on an Orin shares the same memory as the CPU, whereas a dGPU has its own dedicated VRAM. So on the Orin, the operating system and every CPU-side process compete with the GPU for the same 64GB.
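That shared-memory point is easy to quantify with a back-of-envelope check: do a model's weights even fit in the RAM left over after the OS takes its share? The 8 GB OS reservation below is an illustrative assumption, not a measured figure.

```python
def fits_in_shared_memory(num_params: int, bytes_per_param: int = 2,
                          total_gb: float = 64.0, reserved_gb: float = 8.0) -> bool:
    """True if the model's weights alone fit in the RAM remaining after
    an assumed OS/CPU reservation. FP16 weights = 2 bytes per parameter."""
    weights_gb = num_params * bytes_per_param / 1024**3
    return weights_gb <= total_gb - reserved_gb

# A 7B-parameter model in FP16 needs ~13 GB of weights: fits comfortably.
print(fits_in_shared_memory(7_000_000_000))     # True
# Full BLOOM (176B parameters) in FP16 needs ~328 GB: far too big.
print(fits_in_shared_memory(176_000_000_000))   # False
```

This ignores activations, KV caches, and framework overhead, so it is an optimistic lower bound; anything that fails it definitely won't run unquantized.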


Thank you for your reply.
As per the use case, I plan to use it essentially in 2 different scenarios:

  1. training new models from scratch (mainly some BERT-related ones)
  2. learning and experimenting with some cool models I can find online (e.g. from NVIDIA NGC or HuggingFace). Since some of these activities overlap with my job I cannot use, for example, Colab; I need the gear to be on my desk.


For training, a desktop GPU is recommended, although the Orin is capable of training if there is sufficient memory. Even a 1080 Ti has 3584 CUDA cores; the Orin (2048 cores, I think) has overtaken a GTX 1060, but a desktop with a 1080 Ti or better will quickly pull ahead for training. What makes many higher-end training systems more expensive, though, isn’t necessarily the number of CUDA cores: the Titan series (and more specialized GPUs) tend to have a lot more VRAM. That VRAM is quite fast compared to a Jetson’s RAM, and the Jetson only has available whatever is left over after the operating system uses its share. Still, if you were to use an Orin for training, it could probably do the job (something like a TX2 I would not recommend for training).
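To see why memory, not core count, usually decides whether training fits, here is a rough footprint estimate using the common 16-bytes-per-parameter rule of thumb (FP32 weights + gradients + two Adam moment buffers). It deliberately ignores activations and batch size, so real usage will be higher.

```python
def training_footprint_gb(num_params: int, bytes_per_param: int = 16) -> float:
    """Estimated GB for weights + gradients + Adam optimizer state,
    excluding activations (which scale with batch size and sequence length)."""
    return num_params * bytes_per_param / 1024**3

# BERT-base (~110M parameters): roughly 1.6 GB before activations,
# which is why from-scratch BERT training is plausible on an Orin.
print(round(training_footprint_gb(110_000_000), 1))  # 1.6
```

By this estimate, models in the hundreds of millions of parameters are workable on 64 GB of shared RAM, while multi-billion-parameter training quickly consumes everything left after the OS.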

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.