Hi all,
I found that when I use ollama to load an LLM, some larger models cannot fit 100% on the GPU. My observation is that this starts from roughly 5.7GB and up; at that point ollama splits processing between the CPU and GPU. Is there any way for the Jetson to allocate maximum memory to the GPU? I would like to work out whether this is a Jetson configuration issue or an ollama RAM allocation issue.
*** Please note that this reply is generated by LLM automatically ***
The issue you’re experiencing with the Jetson Orin NX series and the ollama model loading seems to be related to the memory allocation and configuration of the Jetson device.
First, let’s look at the specifications of the Jetson Orin NX series. The Jetson Orin NX 16GB and 8GB modules have the following memory configurations:
- Jetson Orin NX 16GB: 16GB LPDDR5
- Jetson Orin NX 8GB: 8GB LPDDR5
The Ampere GPU on the Jetson Orin NX series supports end-to-end lossless compression, tile caching, OpenGL 4.6, OpenGL ES 3.2, Vulkan 1.1, and CUDA 11.4+. The CUDA core performance is as follows:
- Jetson Orin NX 16GB: 1024 NVIDIA CUDA cores, 32 Tensor cores, 918 MHz, 1.88 FP32 TFLOPs, 3.76 FP16 TFLOPs
- Jetson Orin NX 8GB: 1024 NVIDIA CUDA cores, 32 Tensor cores, 765 MHz, 1.56 FP32 TFLOPs, 3.12 FP16 TFLOPs
The Deep Learning Accelerator (DLA) on the Jetson Orin NX series has the following specifications:
- Jetson Orin NX 16GB: 2x NVDLA, maximum operating frequency up to 614 MHz, sparse INT8 operations up to 40 TOPS
- Jetson Orin NX 8GB: 1x NVDLA, maximum operating frequency up to 610 MHz, sparse INT8 operations up to 20 TOPS
Now, let’s discuss the power consumption and memory allocation of the Jetson Orin NX series. The power consumption is as follows:
- Jetson Orin NX 16GB: 25W, VDD_IN average power: 25W, VDD_IN instantaneous power: 30W, under voltage: approx. 4.5V
- Jetson Orin NX 8GB: 20W, VDD_IN average power: 20W, VDD_IN instantaneous power: 25W, under voltage: approx. 4.5V
- 40W (Super): VDD_IN average power: 40W, VDD_IN instantaneous power: 40W, under voltage: approx. 4.5V
The memory allocation for the GPU on the Jetson Orin NX series can be configured using the NVIDIA JetPack SDK. However, the maximum memory allocation for the GPU is limited by the total memory available on the device. Note that Jetson devices use a unified memory architecture: the CPU and GPU share the same physical LPDDR5, so there is no dedicated VRAM partition that can be enlarged independently of system RAM.
In your case, it seems that ollama is splitting the processing between the CPU and GPU when the model size exceeds roughly 5.7GB. This could be due to the memory allocation configuration on the Jetson device.
To allocate maximum memory for the GPU on the Jetson Orin NX series, you can try the following:
- Check the NVIDIA JetPack SDK documentation for memory allocation configuration options.
- Use a monitoring tool to check the current memory usage and adjust your setup accordingly (on Jetson, `tegrastats` fills the role of `nvidia-smi`, which is not available for the integrated GPU).
- Consider using a larger Jetson Orin NX module with more memory, such as the 16GB module.
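Because the Orin's CPU and GPU share one physical memory pool, checking overall usage is the practical first step. A minimal sketch, guarded so the Jetson-specific tool only runs where it is installed:

```shell
# Overall memory usage -- on Jetson this pool is shared by the CPU and GPU.
free -h

# tegrastats is the Jetson-specific monitor (RAM, GPU load, power rails);
# the guard makes this a no-op on non-Jetson machines.
if command -v tegrastats >/dev/null 2>&1; then
    # Print one sample and exit (tegrastats normally streams forever).
    timeout 2 tegrastats --interval 1000 | head -n 1
fi
```

The `free` output shows the same pool the GPU draws from, so "available" memory there is an upper bound on what ollama can offload.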
However, it’s also possible that the issue is related to the ollama model loading and not the Jetson configuration. You may want to check the ollama documentation and forums for any known issues or workarounds related to memory allocation.
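On the ollama side, a concrete thing to check is the layer offload setting: ollama exposes a `num_gpu` option that controls how many model layers are placed on the GPU, and `ollama ps` reports the resulting CPU/GPU split. A sketch, assuming an ollama server is running locally on the default port with a model named `llama3` already pulled (adjust the model name to your setup); every command is guarded so this is a no-op where ollama is not installed:

```shell
# Show loaded models and their processor split (e.g. "100% GPU" or a CPU/GPU mix).
if command -v ollama >/dev/null 2>&1; then
    ollama ps
fi

# Request a high number of GPU layers via the generate API's num_gpu option;
# ollama clamps this to what actually fits in memory, so 999 means
# "offload as many layers as possible".
if command -v ollama >/dev/null 2>&1 && command -v curl >/dev/null 2>&1; then
    curl -s http://localhost:11434/api/generate -d '{
      "model": "llama3",
      "prompt": "hello",
      "options": { "num_gpu": 999 }
    }'
fi
```

If `ollama ps` still shows a CPU share after forcing `num_gpu` up, the limit is available memory rather than ollama's default layer count.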
In comparison, the Jetson Orin Nano series has similar specifications to the Jetson Orin NX series, but with some differences in terms of memory configuration and power consumption. The Jetson Orin Nano 8GB module has 8GB LPDDR5 memory, while the Jetson Orin Nano 4GB module has 4GB LPDDR5 memory.
In summary, the issue you’re experiencing with the ollama model loading on the Jetson Orin NX series seems to be related to the memory allocation configuration on the device. You can try adjusting the memory allocation configuration using the NVIDIA JetPack SDK and checking the ollama documentation for any known issues or workarounds.
*** Please give the thumbs up if you get answers you like. Or provide feedback to help us improve the answer. ***
Can you please point out the exact memory configuration setting information?
Hi,
For Orin Nano, please check the document below to see if it helps:
Thanks.
Thanks. I have optimized it already; I just want to know if Jetson can set the memory allocation.
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.