I am planning on buying an AI processing unit to explore AI. My options are:
AGX Orin 64GB:
It has a large VRAM LPDDR5 and is portable. It can help me until the deployment stage.
GeForce RTX 4090 Desktop Version:
It has 24GB VRAM, which is smaller than the Orin's 64GB, but it is much faster than a laptop 4090 and has higher memory bandwidth.
GeForce RTX 4090 Laptop Version:
It has 16GB VRAM, which is smaller than the desktop 4090's 24GB, and it is slower than the desktop version, but it is portable.
GeForce RTX 3090:
It has 24GB VRAM but is less costly.
I want to know which one of these is the best option, or whether there is another option better than all of these. I want it to be versatile, so that it can run large LLMs like the 70B-parameter LLaMA as well as computer vision workloads, and it should be able to run for long hours (if not days) without breaking down.
I’m not really an AI guy, but a few things might help in your decision…
Jetsons have an integrated GPU (iGPU) where the GPU is tied directly to the memory controller. This means that they share memory with the system. Jetsons (at least currently) cannot use an added discrete GPU (dGPU) on PCIe due to driver issues. Not all system memory will necessarily be available to the GPU for a number of reasons.
Memory requirements for training seem to matter more than for actually running a model. You could train on a Jetson, but it probably wouldn't be nearly as useful for training as a dGPU with lots of VRAM.
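To see why training is so much more memory-hungry than inference, here is a rough back-of-the-envelope sketch. The byte counts are assumptions for the common mixed-precision + Adam setup (fp16 weights and gradients, fp32 master weights, and two fp32 optimizer states per parameter), and they deliberately ignore activations, which add even more on top:

```python
def inference_gib(params_billion, bytes_per_param=2):
    """Rough VRAM for the weights alone at fp16 (2 bytes/param)."""
    return params_billion * 1e9 * bytes_per_param / 2**30

def training_gib(params_billion):
    """Rough VRAM for full fine-tuning with Adam in mixed precision:
    2 B fp16 weights + 2 B fp16 grads + 4 B fp32 master weights
    + 8 B fp32 optimizer state (m and v) = ~16 bytes/param,
    before activation memory is even counted."""
    return params_billion * 1e9 * 16 / 2**30

# A 7B model: fits for inference on a 16GB card, but full
# fine-tuning blows well past any single consumer GPU.
print(f"7B: inference ~{inference_gib(7):.0f} GiB, "
      f"full training ~{training_gib(7):.0f} GiB")
```

That roughly 8x gap between running and fully fine-tuning a model is why parameter-efficient methods like LoRA exist, and why the replies below steer training toward a desktop dGPU.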
Most desktop GPUs are in fact a dGPU, but there are some laptops with an integrated mobile GPU that shares memory with the system RAM. If you’re going to get a laptop make sure it isn’t using an iGPU.
I'm the wrong person to give you a detailed answer, but you might want to state what kind of training you will do, versus execution of a model. The Jetson AGX Orin is quite good at running models, and can do some training, but it is best to use a desktop PC with lots of VRAM to train.
Thanks @linuxdev - agree that unless you have an embedded or edge use-case (like on-device compute for robotics, vision systems, or IoT devices), the dGPU systems will be easier and faster. Except in the case of some of these larger LLMs that exceed the memory capacity of a single GeForce/RTX card - for example, IIRC Llama-70B requires ~35GB VRAM with 4-bit quantization. So that would require multiple 3090/4090 cards for complete GPU offload, whereas the AGX Orin 64GB can run it at ~5 tokens/sec. The Orin 64GB can also be used for fine-tuning with LoRA/QLoRA (albeit slowly). So those are some of the trade-offs to consider.
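The ~35GB figure above follows directly from the quantization arithmetic; a minimal sketch (weights-only, so the real footprint is higher once the KV cache and runtime overhead are added):

```python
def quantized_weights_gb(params_billion, bits):
    """Weights-only footprint in GB at a given quantization bit-width."""
    return params_billion * 1e9 * bits / 8 / 1e9

# Llama-70B at 4 bits: 70e9 params * 0.5 bytes each = 35 GB,
# matching the ~35GB estimate in the post above. KV cache and
# framework overhead come on top of this.
print(quantized_weights_gb(70, 4))
```

The same function shows why fp16 (16 bits) puts the model at ~140 GB, far beyond any stack of consumer cards, and why 4-bit quantization is what brings it within reach of an Orin 64GB or a pair of 24GB cards.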