OpenClaw + Ollama hybrid + ClawMobile architecture

I’ve been refining a local AI agent architecture designed for real-world software engineering. After some trial and error with Docker isolation and memory bottlenecks, I wanted to share the specific hybrid setup that’s actually proving usable.

The Architecture: Decoupling Reasoning from Retrieval

The core philosophy here is separating the “Cognition” from the “Memory” to solve the latency issues caused by loading massive context into a large model every time it runs.

  1. The AI Gateway (Docker): This container runs the heavy reasoning engine: llama-proxy.py + llama.cpp serving the Qwen 3.5 35B-A3B model.

  2. The AI Agent (OpenClaw) & Orchestrator (Docker): A separate containerized environment for the workspace, project files, and the Orchestrator logic.

  3. The “Second Brain” (Host-Side): I run a secondary embedding model (like nomic-embed-text) via Ollama on the host, integrated with the QMD search engine.

By using the host-side embedding model for data retrieval, I avoid forcing the 35B Qwen model to ingest all memory data on every request, which significantly improves responsiveness.
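To make the split concrete, here is a minimal Python sketch of retrieval-first prompting, assuming the memory chunks were embedded ahead of time. The `embed_with_ollama` helper targets Ollama's `/api/embeddings` endpoint on its default port 11434; the function names, the `(text, vector)` memory format, and the prompt layout are illustrative assumptions, not the actual QMD or OpenClaw interfaces.

```python
import json
import math
import urllib.request


def embed_with_ollama(text, model="nomic-embed-text",
                      url="http://localhost:11434/api/embeddings"):
    """Ask a host-side Ollama server for an embedding (requires Ollama running)."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, memory, k=2):
    """Return the k memory texts most similar to the query vector.

    `memory` is a list of (text, vector) pairs embedded ahead of time.
    """
    ranked = sorted(memory, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, snippets):
    """Send only the retrieved snippets to the large model, not all memory."""
    context = "\n".join(f"- {s}" for s in snippets)
    return f"Relevant memory:\n{context}\n\nQuestion: {question}"
```

In a real run the query vector would come from `embed_with_ollama(question)`, and only the output of `build_prompt` would be sent to the 35B model, so the prompt size stays roughly constant as the memory store grows.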

Claw-Hybrid-Platform/
├── ai-agent/                         # Logic Layer: OpenClaw
│   ├── ai-agent-logs/                # Persistent runtime logs
│   │   └── gateway.log               # Logs for OpenClaw gateway activity
│   └── Dockerfile                    # Environment for agent logic
├── ai-gateway/                       # Computing Layer: LLM & Reasoning Proxy
│   ├── logs/
│   │   └── llama-server.log          # Raw output from llama-server
│   ├── proxy/
│   │   ├── config.yaml               # Gateway/Proxy configurations
│   │   └── llama-proxy.py            # OpenAI-compatible API wrapper for llama.cpp
│   └── Dockerfile                    # Environment for agent gateway
├── llama.cpp/                        # Source: Cloned llama.cpp for local build
├── models/                           # Model Storage: Save .gguf files here
├── persistfolder/                    # Persistent Data (The “Soul” of the Agent)
│   ├── config/                       # Holds openclaw.json and global settings
│   ├── memory/                       # Memory storage (main.sqlite)
│   └── workspace/                    # Projects, AGENT.md, MEMORY.md, daily log…etc
└── docker-compose.yml                # Bridges Host GPU & Docker Containers

The 40GB Memory Challenge

Even on high-end hardware like the NVIDIA DGX Spark (120GB), memory management is the primary hurdle.

  • Baseline: The system and browser sit at ~5GB.

  • The Agent Stack: Spinning up OpenClaw adds another ~25GB, bringing the total to ~30GB.

  • Full Orchestration: Once the Orchestrator, Redis, and Celery workers are live, the stack hits 32-40GB before even reaching full load.

This high memory footprint is exactly why the hybrid Docker/Host split is necessary—it keeps the reasoning engine isolated while letting the retrieval engine run lean on the host.
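The arithmetic behind those figures, as a quick sketch. The numbers are the approximate ones quoted in this post; the 2-10GB orchestration range is inferred from the 30GB and 32-40GB totals, and the constant names are made up for illustration.

```python
# Figures in GB, taken from the post; all of them are approximate.
BASELINE_GB = 5             # system + browser
AGENT_STACK_GB = 25         # added when OpenClaw is spun up
ORCHESTRATION_GB = (2, 10)  # Orchestrator, Redis, Celery workers (low, high)
TOTAL_GB = 120              # DGX Spark memory as quoted above


def footprint_range():
    """Return the (low, high) total footprint in GB."""
    base = BASELINE_GB + AGENT_STACK_GB
    return base + ORCHESTRATION_GB[0], base + ORCHESTRATION_GB[1]


def headroom_range():
    """Return the (worst-case, best-case) memory left over in GB."""
    low, high = footprint_range()
    return TOTAL_GB - high, TOTAL_GB - low


print(footprint_range())  # (32, 40)
print(headroom_range())   # (80, 88)
```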

The Workflow: Orchestrator + ClawMobile

The real value comes from the synergy between the Orchestrator and the ClawMobile app. It changes the dev process from “babysitting a terminal” to “asynchronous management.”

  • Orchestrator: Handles the heavy lifting—task queuing, multi-phase development, and background execution.

    Task Management: A FastAPI backend with Celery and Redis handles asynchronous task queues, allowing for multi-phase development workflows (create, test, deploy).

    Monitoring: Provides real-time WebSocket log streams and tool-tracking to audit every operation performed by the AI agents.

  • ClawMobile: Since it speaks the OpenClaw Gateway protocol, I can stay connected to the DGX Spark from anywhere. I can check the dashboard to see which tasks are in progress, failed, or completed in the Orchestrator background and provide real-time feedback to the agent while I’m away from my desk.

    Secure Remote Access: Connects via Tailscale or LAN using Ed25519 authentication.

    Mobile Supervision: Allows the user to track Orchestrator tasks and provide real-time feedback to OpenClaw, ensuring continuous improvement without being tethered to a laptop.
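The Orchestrator's task lifecycle (phases run in order, with a status a dashboard can display) can be sketched without any infrastructure. The snippet below is a plain-Python stand-in, not the Orchestrator's actual code: it uses no FastAPI, Celery, or Redis, and the class, status, and handler names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Task:
    name: str
    status: Status = Status.PENDING
    log: list = field(default_factory=list)  # what a log stream would carry


def run_phases(task, phases):
    """Run (phase_name, handler) pairs in order; stop at the first failure."""
    task.status = Status.IN_PROGRESS
    for phase_name, handler in phases:
        try:
            handler()
            task.log.append(f"{phase_name}: ok")
        except Exception as exc:
            task.log.append(f"{phase_name}: failed ({exc})")
            task.status = Status.FAILED
            return task
    task.status = Status.COMPLETED
    return task


def failing_tests():
    """Hypothetical test phase that reports a failure."""
    raise RuntimeError("2 tests failed")


# The deploy phase never runs because the test phase fails first.
task = run_phases(Task("demo"), [
    ("create", lambda: None),
    ("test", failing_tests),
    ("deploy", lambda: None),
])
print(task.status.value, task.log)
```

In the real stack each handler would be a queued background job and the log entries would go out over the WebSocket stream; the status field is what a mobile dashboard would poll or subscribe to.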

Resource Architecture Table

| Layer | Components | Placement | Memory Impact |
|---|---|---|---|
| Cognition | Qwen 3.5 (35B), OpenClaw Gateway | Docker (AI-Gateway) | ~25GB |
| Retrieval | Ollama, Nomic-Embed, QMD v2 | Host Machine | Low (Optimized) |
| Management | FastAPI, Redis, Celery, SQLite | Docker (AI-Agent/Orchestrator) | ~2-10GB+ |
| Interface | Kotlin Android App / OpenClaw Dashboard or Orchestrator Frontend | Mobile Device / Browser | N/A / 500MB-1GB |

Key Takeaways for Developers

  • Don’t over-contextualize the LLM: Use a secondary, smaller embedding model for RAG/Retrieval to keep your main reasoning model fast.

  • Persistence: Use Docker Compose to mount host volumes for /workspace and /memory so your agent’s “soul” survives a reboot.

  • Infrastructure: Even with 120GB of unified memory, efficiency matters. Separating the Gateway from the Agent allows for better resource allocation.

This setup moves away from “AI as a chatbot” and toward “AI as an autonomous background process” that you manage via mobile.

Tips
In the OpenClaw chat box, tag your message with keywords such as [gemini], [autoresearch], or [think] to unlock additional functions.

Reference:

  1. Claw_Setup
  2. ClawMobile
  3. Orchestrator
  4. QMD
  5. openclaw-optimization-guide
  6. Guide: llama.cpp + Qwen3.5-35B-A3B + openclaw on GB10

Hi, I’m currently struggling with the same issues and was heading down the same dual-model path yesterday.

For the model side, have you tried https://www.byterover.dev/? It has native OpenClaw integration with up to 92% recall accuracy, and it’s all .json / .md files in a hierarchical tree instead of a SQL or vector DB.

Since this seems to be a dedicated “claw” machine, as mine is, is there a reason to run OpenClaw in Docker instead of running it as a native application with auto-restart on reboot? I’ve not experienced a “loss of soul” issue yet, even when I’ve hard crashed the DGX.

I haven’t used ByteRover yet. I think ByteRover and Milla-Jovovich/Mempalace are both good memory management methods. The Hermes Agent self-improving AI agent framework is also worth considering.

QMD solves the problem of “delivery” (retrieval speed), while ByteRover solves the problem of “warehousing and logistics management” (structuring and lifecycle of memory).

==============================================

What are the advantages and disadvantages of running OpenClaw on Docker?

✅ Advantages (Pros)

1. File System “Safety Box” Mechanism

Preventing Accidental Deletion: In Docker, you can mount volumes read-only or mount only specific directories such as /workspace. Even if the agent executes rm -rf /, it destroys only the container’s file system and whatever is mounted writable, not your DGX host.

Malicious Code Isolation: If the agent browses the web, downloads and executes a malicious script, the script will be trapped in the container’s Linux environment, unable to easily gain root privileges on the host.

2. Resource Constraints

You can use Docker’s --memory and --cpus limits to keep OpenClaw from exhausting the DGX’s memory or CPU during complex tasks, which could otherwise crash the host or drop your SSH connection.
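As a rough illustration of both advantages, the helper below assembles a `docker run` command line with a read-only root filesystem, a single writable workspace mount, and CPU/memory caps. The image name, host path, and limit values are placeholders, not the configuration used in this setup; `--read-only`, `--memory`, `--cpus`, and `-v` are standard `docker run` options.

```python
import shlex


def docker_run_args(image, workspace, memory="32g", cpus="8"):
    """Build a `docker run` argv with a locked-down filesystem and resource caps."""
    return [
        "docker", "run", "--rm",
        "--read-only",                    # container root filesystem is read-only
        "--memory", memory,               # hard memory cap
        "--cpus", cpus,                   # CPU cap
        "-v", f"{workspace}:/workspace",  # the only writable host directory
        image,
    ]


# Placeholder image name and host path, for illustration only.
args = docker_run_args("openclaw-agent:local", "/srv/persistfolder/workspace")
print(shlex.join(args))
```

A read-only root often also needs a `--tmpfs /tmp` for programs that write scratch files, so treat this as a starting point rather than a drop-in command.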

❌ Disadvantages and Challenges (Cons)

1. Network Overhead

Problem: The current architecture relies on host.docker.internal:11434 for communication with the Ollama instance on the host.

Impact: Large amounts of embedding data and token streams transmitted between the container and the host pass through Docker’s virtual network bridge layer (docker0), resulting in additional latency. For AI dialogues that prioritize “real-time” performance, this will be slightly slower than native execution.
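If you want to check how much the bridge actually costs on your machine, a small probe like the one below can help. It is only a sketch: it times TCP connection setup, not embedding or token throughput, and the function name and defaults are assumptions.

```python
import socket
import time


def tcp_connect_ms(host, port, attempts=5, timeout=2.0):
    """Return the median time in milliseconds to open a TCP connection."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection opened; close it immediately
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]
```

Run `tcp_connect_ms("host.docker.internal", 11434)` inside the container and `tcp_connect_ms("localhost", 11434)` on the host, then compare the two medians. On a local bridge the gap is typically small compared with model inference time, but measuring beats guessing.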

2. Permission Issues

Pain Point: This is the most common problem encountered with Docker. When the agent generates code as root inside the container, you may not be able to modify these files on the host using your own account.

Solution: Explicitly set user: "${UID}:${GID}" in Docker Compose; otherwise, the development experience becomes very cumbersome.
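One caveat with that setting: Bash defines UID as a shell variable but does not export it, and it does not define GID at all, so Compose may see both as empty. A small sketch that pins them in a `.env` file next to docker-compose.yml, which Compose reads for variable substitution. The function names are made up for illustration, and note that `write_env` overwrites an existing `.env`.

```python
import os
from pathlib import Path


def env_lines(uid=None, gid=None):
    """Return .env lines that pin the container user to the calling host user."""
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    return [f"UID={uid}", f"GID={gid}"]


def write_env(directory="."):
    """Write the lines to <directory>/.env, replacing any existing file."""
    path = Path(directory) / ".env"
    path.write_text("\n".join(env_lines()) + "\n", encoding="utf-8")
    return path
```

With the file in place, files the agent creates under the mounted workspace should be owned by your own account on the host rather than by root.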

I come from the pre-virtualization era, when bare-metal installations stayed up for years without a single reboot, so learning the pros, cons, and potential security issues is important. I have cron jobs set up to back up multiple directories to a remote server nightly, which gives me enough time to revert by a day if needed.

Wouldn’t using the agent’s --sandbox flag give the same security?

That is a great question. While the --sandbox flag in OpenClaw is a strong first line of defense, it operates at a different “layer” than Docker.

The --sandbox flag: This is a Software-level restriction. It usually tells the agent’s internal code-executor (like a Python or Node.js runtime) to restrict its file-system access to a specific path. If there is a bug in the sandbox implementation or an “escape” vulnerability in the language runtime itself, the agent can still break out and see your host files.
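To show what a software-level restriction looks like in practice, here is a generic path check of the kind such a sandbox might perform. This is not OpenClaw's implementation, which I have not inspected; the function name and behavior are assumptions for illustration.

```python
from pathlib import Path


def is_inside(root, candidate):
    """True if `candidate` resolves to a location under `root`.

    Resolving first collapses `..` segments and follows symlinks, the two
    easiest ways to slip past a naive string-prefix check.
    """
    root_path = Path(root).resolve()
    target = (root_path / candidate).resolve()
    return target == root_path or root_path in target.parents
```

The check runs in the same process as the code it is policing, so a bug in it, or in the language runtime, bypasses it entirely. A container boundary is enforced by the kernel, outside the agent's process.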

Besides, the --sandbox flag doesn’t help with system bloat or dependency hell. If an agent installs 50 random npm packages or pip libraries to solve a task, it litters your host system with files. In Docker, you just delete the container and start fresh. Your nightly cron backups are perfect for your data, but Docker protects your environment from getting messy.

Understandable.

Would you mind sharing a redacted version of your openclaw.json and some of the commands used to set up your Docker containers? If you don’t want to share here, PM me. Would be happy to share what I’ve done as well.

Hi David

In the “Claw_Setup” project listed under Reference, I added openclaw.json as a sample file. If you set the file paths, ports, file/model names, and API key correctly, it works well.