Hi all, I would like to free up more RAM for Ollama so it can run larger models and longer contexts.
However, I found that 6.0 GB seems to be the ceiling: up to that point the whole model and context fit in GPU memory, but beyond 6 GB some of the work gets offloaded to the CPU.
I have already stopped many services (including jtop) and disabled the desktop environment.
I believe there is still free RAM, but I have no idea how to make Ollama allocate more of it on the GPU side instead of the CPU side.
Since the CPU and GPU share the same physical RAM on this board, I don't understand why the memory ends up assigned to the CPU instead of the GPU.
I believe this is reproducible.
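A quick way to confirm the CPU/GPU split before digging further (a diagnostic sketch; the `ollama ps` and `tegrastats` lines are commented out since they need the Jetson itself):

```shell
# Where did the model land? "ollama ps" prints a PROCESSOR column
# such as "100% GPU" or e.g. "24%/76% CPU/GPU" for a partial offload.
#ollama ps

# Live unified-memory and GPU utilization (Jetson-specific tool):
#sudo tegrastats

# Overall free RAM -- on Jetson the CPU and GPU draw from this same pool.
free -h
```

Comparing `free -h` before and after loading the model shows how much of the shared pool the load actually consumed, independent of how Ollama labels it.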
Env:
Jetson Orin Nano Developer Kit (8 GB), all unnecessary services and the desktop environment disabled
Ollama version 0.6.6
Ollama installed and run directly (not in Docker)
External Open WebUI connected to the Jetson
Model: gemma3:4b, default context length (2048)
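One thing worth trying with this setup: Ollama estimates how many layers fit in GPU memory and offloads the rest to CPU, but the `num_gpu` model option can force more (or all) layers onto the GPU. A minimal Modelfile sketch, assuming the default 2048 context from above (the name `gemma3-gpu` is just an example; a very large `num_gpu` value effectively means "all layers"):

```
# Hypothetical Modelfile: force all layers onto the GPU
FROM gemma3:4b
PARAMETER num_gpu 999
PARAMETER num_ctx 2048
```

Then build and run it with `ollama create gemma3-gpu -f Modelfile` followed by `ollama run gemma3-gpu`. If Ollama's free-memory estimate is what is pushing layers to the CPU, this overrides it; if the model genuinely does not fit, the load may fail instead of silently spilling to CPU.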
There has been no update from you for a while, so we assume this is no longer an issue.
Hence, we are closing this topic. If you need further support, please open a new one.
Thanks ~0521