Managing Rogue Memory from OpenWebUI + Ollama - Problem and Solution:

Problem Overview

While configuring a DGX system with OpenWebUI and Ollama to run large-scale models such as GPT-OSS 120B, a critical problem was encountered:

Stopping the application with a docker stop command did not fully release system memory.
This resulted in "rogue" memory consumption: the RAM used by the model (~64 GB or more) remained allocated even though the application appeared to be shut down.

This behavior caused confusion and inefficiency: users expected the model to unload, but the memory footprint persisted, degrading system performance and leaving it unclear whether OpenWebUI or Ollama had truly stopped.
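One way to confirm the symptom is to check, after stopping the container, whether any process matching the model runner's name still holds resident memory. The sketch below is a minimal diagnostic, assuming a Linux host with a /proc filesystem; the process name "ollama" reflects this setup, and the helper name is illustrative, not part of any tool described here.

```python
import os

def lingering_rss_kb(name="ollama"):
    """Scan /proc for processes whose command line mentions `name`
    and sum their resident set size (VmRSS) in kB.

    A large nonzero result after `docker stop` indicates the model
    is still resident in host RAM.
    """
    total = 0
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/cmdline", "rb") as f:
                cmd = f.read().replace(b"\0", b" ").decode(errors="ignore")
            if name not in cmd.lower():
                continue
            with open(f"/proc/{pid}/status") as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        # Second field is the RSS value in kB
                        total += int(line.split()[1])
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            # Process exited mid-scan or is not readable; skip it
            continue
    return total

if __name__ == "__main__":
    gb = lingering_rss_kb() / (1024 * 1024)
    print(f"Lingering ollama RSS: {gb:.1f} GB")
```

Running this immediately after docker stop makes the problem visible even when docker ps and nvidia-smi look clean, since neither reports host RAM held by orphaned CPU-side processes.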


Investigation Findings

  1. Stopping the Docker container (docker stop open-webui) was insufficient.

  2. Ollama (which loads the large model into memory) spawns background processes that can remain active even after the Docker container stops.

  3. These processes consume significant RAM and are not visible in simple docker ps or nvidia-smi checks.

  4. Attempts to clear shared memory segments with ipcrm (after identifying them via ipcs -m) did not reliably release the RAM.

  5. The actual RAM release was achieved only through: