Hi all, I would like to free up more RAM so that Ollama can run larger models and longer contexts.
However, I found that 6.0 GB is the magic number for fitting the whole model and context on the GPU. Beyond 6 GB, part of the workload is offloaded to the CPU.
I have already stopped a lot of services (including jtop) and disabled the Desktop Environment.
I believe I still have free RAM, but I have no idea how to make Ollama allocate more of it to the GPU instead of the CPU.
Since the CPU and GPU share the same RAM, I don't understand why the memory ends up on the CPU side instead of the GPU.
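For reference, the only related knob I am aware of is Ollama's num_gpu option (the number of layers offloaded to the GPU). A minimal sketch of setting it through the generate API, assuming this is even the right place to control the split (999 just means "offload as many layers as fit"):

$ curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:4b",
  "prompt": "hello",
  "options": { "num_gpu": 999 }
}'

I have not verified whether this actually overrides Ollama's own free-memory estimate on Jetson.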
Hi
Could you double-check the amount of free memory with tegrastats?
As Jetson is a shared-memory system, the OS itself also occupies some of that memory.
$ sudo tegrastats
Thanks.
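If it helps, you can watch just the memory figure like this (the --interval value in milliseconds and the grep pattern are only one way to filter it):

$ sudo tegrastats --interval 1000 | grep --line-buffered -o 'RAM [0-9]*/[0-9]*MB'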
According to tegrastats, there is still free memory (if my understanding is correct).
root@ubuntu:~# ollama ps
NAME ID SIZE PROCESSOR UNTIL
gemma3:4b a2af6cc3eb7f 6.2 GB 9%/91% CPU/GPU 4 minutes from now
root@ubuntu:~# tegrastats
04-28-2025 10:02:16 RAM 4926/7620MB (lfb 4x1MB) SWAP 119/16384MB (cached 12MB) CPU [0%@729,0%@729,0%@729,0%@729,0%@729,0%@729] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[306] NVDEC off NVJPG off NVJPG1 off VIC off OFA off APE 200 cpu@44.656C soc2@45.25C soc0@45.187C gpu@46.5C tj@47.062C soc1@47.062C VDD_IN 4596mW/4596mW VDD_CPU_GPU_CV 524mW/524mW VDD_SOC 1411mW/1411mW
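If I read that line correctly, 4926 MB of the 7620 MB is in use, so roughly 2.7 GB should still be free. As a quick cross-check of what the kernel itself reports (assuming the standard free utility is the right thing to compare against):

$ free -m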
Once it is not 100% on the GPU, performance drops a lot.
I believe this is reproducible.
Env:
Jetson Orin Nano Dev Kit (8 GB), with all unnecessary services and the Desktop Environment disabled
ollama version is 0.6.6
Ollama (not Docker; installed and run directly)
External Open WebUI connected to the Jetson Orin Nano
Model: gemma3:4b, default context length (2048)
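In case it matters for the memory estimate, this is roughly how I would pin the context length explicitly instead of relying on the default (the tag gemma3-2k is just an arbitrary name for this example):

$ cat Modelfile
FROM gemma3:4b
PARAMETER num_ctx 2048
$ ollama create gemma3-2k -f Modelfile
$ ollama run gemma3-2k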
Hi,
Do you have detailed Ollama logs you can share with us?
For example, is there any allocation-failure message shown?
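If Ollama was installed with the official install script and runs as a systemd service named ollama (an assumption on our side), the server log with the offload decisions should be visible like this:

$ journalctl -u ollama -f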
Thanks.
Thanks for your suggestion. Let me use this information to try to identify the problem.
There has been no update from you for a while, so we assume this is no longer an issue.
Hence, we are closing this topic. If you need further support, please open a new one.
Thanks.