Hi all, I would like to free up more RAM for Ollama so it can run larger models and longer contexts.
However, I found that 6.0 GB seems to be the ceiling: up to that point the whole model and context fit in GPU memory, but beyond 6 GB some of the work gets offloaded to the CPU.
I have already stopped many services (including jtop) and disabled the desktop environment.
I believe there is still free RAM, but I have no idea how to make Ollama allocate more of it on the GPU side instead of the CPU side.
Since the CPU and GPU share the same physical RAM on this board, I don't understand why the memory ends up assigned to the CPU instead of the GPU.
I believe this is reproducible.
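A quick way to confirm the CPU/GPU split before digging further (a diagnostic sketch; the `ollama ps` and `tegrastats` lines are commented out since they need the Jetson itself):

```shell
# Where did the model land? "ollama ps" prints a PROCESSOR column
# such as "100% GPU" or e.g. "24%/76% CPU/GPU" for a partial offload.
#ollama ps

# Live unified-memory and GPU utilization (Jetson-specific tool):
#sudo tegrastats

# Overall free RAM -- on Jetson the CPU and GPU draw from this same pool.
free -h
```

Comparing `free -h` before and after loading the model shows how much of the shared pool the load actually consumed, independent of how Ollama labels it.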
Env:
Jetson Orin Nano Developer Kit (8 GB), all unnecessary services and the desktop environment disabled
Ollama version 0.6.6
Ollama installed and run directly (not in Docker)
External Open WebUI connected to the Jetson
Model: gemma3:4b, default context length (2048)
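One thing worth trying with this setup: Ollama estimates how many layers fit in GPU memory and offloads the rest to CPU, but the `num_gpu` model option can force more (or all) layers onto the GPU. A minimal Modelfile sketch, assuming the default 2048 context from above (the name `gemma3-gpu` is just an example; a very large `num_gpu` value effectively means "all layers"):

```
# Hypothetical Modelfile: force all layers onto the GPU
FROM gemma3:4b
PARAMETER num_gpu 999
PARAMETER num_ctx 2048
```

Then build and run it with `ollama create gemma3-gpu -f Modelfile` followed by `ollama run gemma3-gpu`. If Ollama's free-memory estimate is what is pushing layers to the CPU, this overrides it; if the model genuinely does not fit, the load may fail instead of silently spilling to CPU.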
There has been no update from you for a while, so we assume this is no longer an issue.
Hence, we are closing this topic. If you need further support, please open a new one.
Thanks ~0521