Welcome!
Some information that might be useful to you (and this is just another way of agreeing with @fchkjwlsq)…
Jetsons have an integrated GPU (iGPU), which means the GPU is wired directly to the memory controller. A desktop PC uses a discrete GPU (dGPU), which goes through the PCIe bus. As a result, the iGPU must share memory with the rest of the system, whereas a dGPU has its own memory (VRAM).
When an iGPU shares memory, it has some requirements. One of those is that it tends to want unfragmented memory, i.e. a single large contiguous block, and it probably has some alignment requirements as well. Even if you have a lot of free system memory, it is possible that some of it cannot be used simply because it isn't a single contiguous block, or because it starts at an inconvenient address. The dGPU does not have that problem since the memory is its own. Plus, dGPU VRAM is far faster than typical system RAM.
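To make the "lots of free memory, but not usable" point concrete, here is a toy model. It is a sketch only: the free list, the 4 MiB alignment, and the function names are hypothetical and are not how the Jetson kernel or driver actually tracks memory. It just shows that the total free memory can be much larger than the biggest aligned contiguous block a GPU could be handed.

```python
MiB = 1024 * 1024


def align_up(addr: int, alignment: int) -> int:
    """Round addr up to the next multiple of alignment."""
    return (addr + alignment - 1) // alignment * alignment


def largest_usable_block(free_blocks, alignment):
    """Largest contiguous run that starts on an aligned address.

    free_blocks is a hypothetical list of (start, size) pairs in bytes.
    """
    best = 0
    for start, size in free_blocks:
        end = start + size
        aligned_start = align_up(start, alignment)  # skip the misaligned head
        if aligned_start < end:
            best = max(best, end - aligned_start)
    return best


def can_allocate_contiguous(free_blocks, request, alignment):
    """True only if one aligned contiguous block can hold the whole request."""
    return largest_usable_block(free_blocks, alignment) >= request


# Hypothetical fragmented free list: three separate free regions.
free_blocks = [(3 * MiB, 5 * MiB), (16 * MiB, 6 * MiB), (33 * MiB, 8 * MiB)]
alignment = 4 * MiB  # assumed alignment requirement

total_free = sum(size for _, size in free_blocks)
largest = largest_usable_block(free_blocks, alignment)
print(total_free // MiB, largest // MiB)  # 19 6
print(can_allocate_contiguous(free_blocks, 10 * MiB, alignment))  # False
```

Here 19 MiB is free in total, yet a 10 MiB contiguous request fails because the largest aligned block is only 6 MiB.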
Video drivers, and probably almost every other hardware driver, need to operate on physical addresses, which means virtual memory that translates address blocks won't work for those drivers. The iGPU connects to the memory controller, and the memory controller is capable of setting up direct 1:1 access to addresses without translation. The dGPU has no need for this. Regardless, the main point is that swap files and swap partitions help user space when there is not enough RAM, but swap is virtual memory, so it cannot stand in for the physical RAM the iGPU needs. If you don't have enough memory to run some model (and training takes more memory), you can't simply add swap.
For training you will almost always be better off with a dGPU that has lots of VRAM. There are good reasons why some people buy lower-performance GPUs with very large amounts of VRAM. That 3090 is a very good choice for training or running a model.
Of all the embedded systems, the AGX Orin is at the top of the food chain. It has a lot of memory, doesn't use a lot of power, and is rather fast. On top of that, Orin has newer software releases coming, whereas Xavier and the others are either about to reach the end of new feature development or have been there for some time.
Incidentally, I am also a fan of Threadripper. Besides being cost effective, Threadripper has much higher data I/O than many other CPUs, and I don't just mean inside the CPU; I mean the number of PCIe lanes and the speeds they run at. If you have only a single 3090, then you don't really need this, but if you want to add multiple disks and don't want them competing for (and slowing down) shared I/O, then Threadripper is a good choice (and it will cost less than Xeon).
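A rough lane budget shows why lane count matters once you go past a single GPU. This is a back-of-the-envelope sketch: the per-lane rates come from the PCIe 3.0/4.0 signaling rates (8 and 16 GT/s with 128b/130b encoding), but the device list and the two CPU lane budgets (20 and 64) are hypothetical round numbers, so check the actual figures for the specific CPU and motherboard.

```python
# PCIe signaling rate in GT/s per lane, and encoding efficiency (128b/130b).
GT_PER_S = {3: 8.0, 4: 16.0}
ENCODING = {3: 128 / 130, 4: 128 / 130}


def link_bandwidth_gbytes(gen: int, lanes: int) -> float:
    """Approximate one-direction bandwidth of a PCIe link in GB/s."""
    per_lane = GT_PER_S[gen] * ENCODING[gen] / 8  # bits -> bytes
    return per_lane * lanes


def lanes_needed(devices) -> int:
    """Sum the lane widths of (name, lanes) pairs."""
    return sum(lanes for _, lanes in devices)


def fits(needed: int, budget: int) -> bool:
    """True if every device can get its full link width without sharing."""
    return needed <= budget


# Hypothetical build: one 3090 (PCIe 4.0 x16) plus four x4 NVMe disks.
devices = [("RTX 3090", 16)] + [(f"NVMe {i}", 4) for i in range(4)]
needed = lanes_needed(devices)

print(round(link_bandwidth_gbytes(4, 16), 1))  # 31.5
print(needed)                                   # 32
print(fits(needed, 20), fits(needed, 64))       # False True
```

With the assumed 20-lane budget some devices would have to share lanes (and slow each other down), while the assumed 64-lane budget gives every device its full width.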
All that said, the Orin is probably capable of training, but it is typically used for running preexisting models at the edge.