Hi all,
I found that when I use ollama to load an LLM, some larger models cannot fit 100% on the GPU. My observation is that this starts from roughly 5.7GB and up; at that point ollama splits processing between the CPU and GPU. Is there any way for the Jetson to allocate maximum memory to the GPU? I would like to work out whether this is a Jetson configuration issue or an ollama RAM allocation issue.
*** Please note that this reply is generated by LLM automatically ***
The issue you’re experiencing with the Jetson Orin NX series and the ollama model loading seems to be related to the memory allocation and configuration of the Jetson device.
First, let’s look at the specifications of the Jetson Orin NX series. The Jetson Orin NX 16GB and 8GB modules have the following memory configurations:
- Jetson Orin NX 16GB: 16GB LPDDR5
- Jetson Orin NX 8GB: 8GB LPDDR5
The Ampere GPU on the Jetson Orin NX series supports end-to-end lossless compression, tile caching, OpenGL 4.6, OpenGL ES 3.2, Vulkan 1.1, and CUDA 11.4+. The CUDA core performance is as follows:
- Jetson Orin NX 16GB: 1024 NVIDIA CUDA cores, 32 Tensor cores, 918 MHz, 1.88 FP32 TFLOPs, 3.76 FP16 TFLOPs
- Jetson Orin NX 8GB: 1024 NVIDIA CUDA cores, 32 Tensor cores, 765 MHz, 1.56 FP32 TFLOPs, 3.12 FP16 TFLOPs
The Deep Learning Accelerator (DLA) on the Jetson Orin NX series has the following specifications:
- Jetson Orin NX 16GB: 2x NVDLA, maximum operating frequency up to 614 MHz, sparse INT8 operations up to 40 TOPS
- Jetson Orin NX 8GB: 1x NVDLA, maximum operating frequency up to 610 MHz, sparse INT8 operations up to 20 TOPS
Now, let’s discuss the power consumption and memory allocation of the Jetson Orin NX series. The power consumption is as follows:
- Jetson Orin NX 16GB: 25W, VDD_IN average power: 25W, VDD_IN instantaneous power: 30W, under voltage: approx. 4.5V
- Jetson Orin NX 8GB: 20W, VDD_IN average power: 20W, VDD_IN instantaneous power: 25W, under voltage: approx. 4.5V
- 40W (Super): VDD_IN average power: 40W, VDD_IN instantaneous power: 40W, under voltage: approx. 4.5V
The memory allocation for the GPU on the Jetson Orin NX series can be configured using the NVIDIA JetPack SDK. However, the maximum memory allocation for the GPU is limited by the total memory available on the device. Note that Jetson devices use a unified memory architecture: the CPU and GPU share the same physical LPDDR5, so there is no dedicated VRAM partition that can be enlarged independently of system RAM.
In your case, it seems that ollama is splitting the processing between the CPU and GPU when the model size exceeds roughly 5.7GB. This could be due to the memory allocation configuration on the Jetson device.
To allocate maximum memory for the GPU on the Jetson Orin NX series, you can try the following:
- Check the NVIDIA JetPack SDK documentation for memory allocation configuration options.
- Use a monitoring tool to check the current memory usage and adjust your setup accordingly (on Jetson, `tegrastats` fills the role of `nvidia-smi`, which is not available for the integrated GPU).
- Consider using a larger Jetson Orin NX module with more memory, such as the 16GB module.
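Because the Orin's CPU and GPU share one physical memory pool, checking overall usage is the practical first step. A minimal sketch, guarded so the Jetson-specific tool only runs where it is installed:

```shell
# Overall memory usage -- on Jetson this pool is shared by the CPU and GPU.
free -h

# tegrastats is the Jetson-specific monitor (RAM, GPU load, power rails);
# the guard makes this a no-op on non-Jetson machines.
if command -v tegrastats >/dev/null 2>&1; then
    # Print one sample and exit (tegrastats normally streams forever).
    timeout 2 tegrastats --interval 1000 | head -n 1
fi
```

The `free` output shows the same pool the GPU draws from, so "available" memory there is an upper bound on what ollama can offload.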
However, it’s also possible that the issue is related to the ollama model loading and not the Jetson configuration. You may want to check the ollama documentation and forums for any known issues or workarounds related to memory allocation.
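On the ollama side, a concrete thing to check is the layer offload setting: ollama exposes a `num_gpu` option that controls how many model layers are placed on the GPU, and `ollama ps` reports the resulting CPU/GPU split. A sketch, assuming an ollama server is running locally on the default port with a model named `llama3` already pulled (adjust the model name to your setup); every command is guarded so this is a no-op where ollama is not installed:

```shell
# Show loaded models and their processor split (e.g. "100% GPU" or a CPU/GPU mix).
if command -v ollama >/dev/null 2>&1; then
    ollama ps
fi

# Request a high number of GPU layers via the generate API's num_gpu option;
# ollama clamps this to what actually fits in memory, so 999 means
# "offload as many layers as possible".
if command -v ollama >/dev/null 2>&1 && command -v curl >/dev/null 2>&1; then
    curl -s http://localhost:11434/api/generate -d '{
      "model": "llama3",
      "prompt": "hello",
      "options": { "num_gpu": 999 }
    }'
fi
```

If `ollama ps` still shows a CPU share after forcing `num_gpu` up, the limit is available memory rather than ollama's default layer count.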
In comparison, the Jetson Orin Nano series has similar specifications to the Jetson Orin NX series, but with some differences in terms of memory configuration and power consumption. The Jetson Orin Nano 8GB module has 8GB LPDDR5 memory, while the Jetson Orin Nano 4GB module has 4GB LPDDR5 memory.
In summary, the issue you’re experiencing with the ollama model loading on the Jetson Orin NX series seems to be related to the memory allocation configuration on the device. You can try adjusting the memory allocation configuration using the NVIDIA JetPack SDK and checking the ollama documentation for any known issues or workarounds.
*** Please give the thumbs up if you get answers you like. Or provide feedback to help us improve the answer. ***
Can you please point out the exact memory configuration setting information?
Hi,
For Orin Nano, please check the document below to see if it helps:
Thanks.
Thanks. I have optimized it already; I just want to know if Jetson can set the memory allocation.
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.