Jetson Orin Nano 8GB Docker issue Ollama falls back to CPU when Stable Diffusion is running

Hi everyone,

I am using an NVIDIA Jetson Orin Nano 8GB Developer Kit. I run both Ollama and Stable Diffusion inside Docker containers using the NVIDIA runtime.

The problem is that Ollama often switches to CPU usage when Stable Diffusion is active. Since the Jetson has unified memory, Ollama seems to think there is not enough GPU memory left and moves the workload to the ARM cores. This makes the LLM extremely slow.

I want to force both to stay on the GPU.

Does anyone know how to set hard memory limits in Docker specifically for the Jetson so Stable Diffusion does not eat all the RAM? Also is there a way to force the Ollama Docker container to never use the CPU for inference even if memory is tight?

I am looking for the right environment variables or Docker Compose settings to manage this shared memory on Jetpack.

Thanks for any help

*** Please note that this reply is generated by LLM automatically ***

To address the issue of Ollama switching to CPU usage when Stable Diffusion is active on the NVIDIA Jetson Orin Nano 8GB Developer Kit, we need to explore a few options:

1. **Setting hard memory limits in Docker**: You can cap a container's memory with the `--memory` and `--memory-swap` flags. These are standard cgroup limits and are not specific to the Jetson's unified memory architecture, but because the CPU and GPU share the same physical RAM on the Orin Nano, capping a container's host memory still bounds how much of the shared pool it can claim through ordinary allocations.

   Combine them with the NVIDIA runtime like this:

   ```bash
   docker run --runtime=nvidia --memory=4g --memory-swap=8g ollama/ollama
   ```

   This limits the container to 4 GB of RAM. Note that `--memory-swap` sets the *combined* RAM-plus-swap ceiling, so `--memory-swap=8g` here allows up to 4 GB of additional swap, not an 8 GB swap limit. Also be aware that CUDA allocations on Jetson may not be fully accounted against the cgroup, so treat this as a guardrail rather than a strict GPU quota.
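To pick sensible values for these flags on an 8 GB board, it helps to sanity-check the budget first. A minimal sketch, assuming a rough 1.5 GB reservation for the OS and desktop (an estimate, not a measured figure; check yours with `free -h`); `parse_size` mirrors Docker's size-string syntax:

```python
# Illustrative helper for planning --memory values on an 8 GB Jetson.
UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}

def parse_size(s: str) -> int:
    """Parse a Docker-style size string like '4g' or '512m' into bytes."""
    s = s.strip().lower()
    if s[-1] in UNITS:
        return int(float(s[:-1]) * UNITS[s[-1]])
    return int(s)  # plain byte count

def fits(total: str, reserved: str, *caps: str) -> bool:
    """True if the per-container caps fit in total RAM minus the OS reserve."""
    budget = parse_size(total) - parse_size(reserved)
    return sum(parse_size(c) for c in caps) <= budget

# 4g for Stable Diffusion + 2g for Ollama inside 8g minus ~1.5g for the OS:
print(fits("8g", "1.5g", "4g", "2g"))   # True
print(fits("8g", "1.5g", "4g", "4g"))   # False: over budget
```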

2. **Forcing the Ollama Docker container to use the GPU for inference**: You can make the GPU visible to the container with the `CUDA_VISIBLE_DEVICES` environment variable:

   ```bash
   docker run --runtime=nvidia -e CUDA_VISIBLE_DEVICES=0 ollama/ollama
   ```

   This exposes GPU device 0 to the Ollama container. Be aware, though, that visibility alone does not prevent a CPU fallback: Ollama decides at model-load time how many layers to offload based on the free memory it detects, so if Stable Diffusion has already claimed most of the shared RAM, Ollama will still spill layers to the CPU. To insist on full offload you can set the `num_gpu` model option (for example `PARAMETER num_gpu 999` in a Modelfile), at the risk of out-of-memory failures when the memory genuinely is not there.
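As a concrete illustration of the `num_gpu` route, the sketch below builds an `/api/generate` request that asks for full offload. The model name is a placeholder, the server is assumed to be on Ollama's default port 11434, and `999` is just a conventionally large "offload everything" value:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port

def build_payload(model: str, prompt: str, num_gpu: int = 999) -> dict:
    """Build an /api/generate body that requests full GPU offload."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # num_gpu is the number of layers to place on the GPU; a large
        # value asks for everything instead of letting Ollama decide.
        "options": {"num_gpu": num_gpu},
    }

def generate(model: str, prompt: str) -> str:
    """POST to a running Ollama server (assumed reachable at OLLAMA_URL)."""
    data = json.dumps(build_payload(model, prompt)).encode()
    req = request.Request(OLLAMA_URL, data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```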

3. **Managing shared memory on JetPack**: `nvidia-container-runtime` is an OCI runtime invoked by Docker, not a standalone CLI, so it does not accept `--memory` or `--memory-swap` flags itself; those limits are applied through `docker run` (as in option 1) while the NVIDIA runtime handles GPU access. JetPack offers no per-process quota for the unified GPU memory pool, so the practical approach is to budget the host memory caps of both containers so that they fit within the 8 GB minus what the OS and desktop need, for example 4 GB for Stable Diffusion and 2-3 GB for Ollama.

In terms of Docker Compose settings, you can specify the memory limits and environment variables in the `docker-compose.yml` file. Note that `memory-swap` is not a valid key under `deploy.resources.limits`; with the Docker Compose v2 CLI you can use the service-level `mem_limit` and `memswap_limit` keys instead. For example:

```yml
services:
  ollama:
    image: ollama/ollama
    runtime: nvidia
    environment:
      - CUDA_VISIBLE_DEVICES=0
    mem_limit: 4g
    memswap_limit: 8g
```

This caps the Ollama service at 4 GB of RAM (8 GB including swap) and makes GPU 0 visible to the container. Add a matching `mem_limit` to the Stable Diffusion service so that the two budgets together leave headroom for the OS.

In comparison, `docker run` flags and Docker Compose settings apply the same cgroup-level limits; Compose is simply the more convenient way to manage limits and environment variables for multiple services side by side. Neither mechanism partitions the GPU share of the unified memory, so leave headroom in the budgets rather than allocating all 8 GB.

In terms of environment variables, `CUDA_VISIBLE_DEVICES` selects which GPU the container sees, and `NVIDIA_DRIVER_CAPABILITIES` controls which driver features the NVIDIA runtime mounts into the container (`compute,utility` is the usual choice for compute workloads). For example:

```bash
docker run --runtime=nvidia -e CUDA_VISIBLE_DEVICES=0 -e NVIDIA_DRIVER_CAPABILITIES=compute,utility ollama/ollama
```

This exposes GPU 0 to the Ollama container with the compute and basic utility driver capabilities.

In summary, to manage shared memory on the Jetson and force the Ollama container to use the GPU for inference, you can use a combination of Docker flags, environment variables, and Docker Compose settings.
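Whatever limits you settle on, it is worth verifying where the model actually landed. Ollama's `/api/ps` endpoint (the same data `ollama ps` prints) reports each loaded model's total size and the portion resident in "VRAM"; on Jetson that VRAM is carved from the same unified pool, but the split still tells you whether layers stayed on the GPU. A sketch, assuming the default port, with the parsing kept separate so it can be checked against a canned response:

```python
import json
from urllib import request

def gpu_fraction(model_entry: dict) -> float:
    """Fraction of a loaded model resident in 'VRAM', from /api/ps data."""
    size = model_entry.get("size", 0)
    return model_entry.get("size_vram", 0) / size if size else 0.0

def report(ps_response: dict) -> list[str]:
    """Summarize each loaded model as GPU / partial offload / CPU."""
    lines = []
    for m in ps_response.get("models", []):
        frac = gpu_fraction(m)
        state = "GPU" if frac >= 0.999 else ("partial offload" if frac > 0 else "CPU")
        lines.append(f"{m['name']}: {frac:.0%} on GPU ({state})")
    return lines

def check_live(host: str = "http://localhost:11434") -> list[str]:
    """Query a running Ollama server (assumed reachable) and summarize it."""
    with request.urlopen(f"{host}/api/ps") as resp:
        return report(json.load(resp))
```

If `report` shows a partial offload or CPU placement while Stable Diffusion is running, that confirms the memory pressure rather than a runtime misconfiguration.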

*** Please give the thumbs up if you get answers you like. Or provide feedback to help us improve the answer. ***

Hi,

Please check if the following tips help:

Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.