Hey guys,
I came across a good alternative to Google Photos / iCloud Photos.
The nice thing is that the Immich app can be “split” into different parts:
Main server: the main Immich app runs on my Intel NUC; that's where the app and the database live (hosted in a Docker container in a Proxmox LXC container on the Intel NUC).
Machine learning server: this installation did not work out of the box because of the Linux/ARM64/CUDA 13 combination of the DGX Spark. It is hosted in a Docker container on the DGX Spark - see this discussion on GitHub.
The image and video files (hosted on my Synology NAS, connected as a mounted volume to the NUC): I host my pictures on a Synology NAS and I don't want to move them to the Spark because the NAS has more storage and I need the Spark's SSD for LLMs.
The iOS / Android app | iOS | Android
Everything from here on is AI-generated. It took a long time to figure out and I wanted to share it with you guys - and also with future me when I come back looking for this tutorial.
Immich Server Stack — Intel NUC (Proxmox LXC / Portainer)
Immich server stack running in a Debian LXC container on Proxmox, managed via Portainer.
Machine learning is offloaded to a separate NVIDIA DGX Spark
via MACHINE_LEARNING_URL.
Architecture
┌─────────────────────────────────────┐ ┌──────────────────────────────┐
│ Intel NUC (Proxmox / Debian LXC) │ │ NVIDIA DGX Spark (ARM64) │
│ │ │ │
│ immich-server :2283 │────▶│ immich-machine-learning │
│ immich-db (postgres) │ │ :2284 (CUDA, GB10 GPU) │
│ immich-redis │ └──────────────────────────────┘
│ │
│ /mnt/Diskstation-* (Synology NAS) │
└─────────────────────────────────────┘
Photos are served from a Synology NAS via bind mounts into the LXC container.
Prerequisites
Step 1 — Enable NFS on the Synology NAS
- Control Panel → File Services → NFS → enable NFS service (NFSv4.1 recommended)
- For each shared folder to expose:
- Control Panel → Shared Folder → select folder → Edit → NFS Permissions → Create
- Set Allowed IP / CIDR to your Proxmox host IP (e.g. 192.168.178.10) or subnet (192.168.178.0/24)
- Privilege: Read/Write
- Squash: No mapping (or Map root to admin if you have permission issues)
- Note the Mount path shown — it will be something like
/volume1/immich
Step 2 — Mount NAS shares on the Proxmox host
Install NFS client on the Proxmox host (not inside the LXC):
apt-get install -y nfs-common
Create local mount points:
mkdir -p /mnt/Diskstation-immich
mkdir -p /mnt/Diskstation-Photos-USER_01
# ... repeat for each share
Add entries to /etc/fstab on the Proxmox host — use _netdev so the mounts wait for the
network to be up before mounting on boot:
# Synology NAS mounts — replace 192.168.178.x with your NAS IP
192.168.178.x:/volume1/immich /mnt/Diskstation-immich nfs defaults,_netdev,nfsvers=4.1,soft,timeo=30 0 0
192.168.178.x:/volume1/Photos-USER_01 /mnt/Diskstation-Photos-USER_01 nfs defaults,_netdev,nfsvers=4.1,soft,timeo=30 0 0
# ... repeat for each share
Test the mounts without rebooting:
mount -a
df -h | grep Diskstation
Tip — `soft` vs `hard` mounts: `soft` causes operations to fail gracefully if the NAS
is unreachable (avoids hanging processes); `hard` retries forever. For a photo library
`soft` is safer; for the upload volume you may prefer `hard` to avoid data loss on
a brief network hiccup.
Step 3 — Pass mounts into the LXC container
Add bind mount entries to the LXC container config on Proxmox
(/etc/pve/lxc/<container-id>.conf):
mp0: /mnt/Diskstation-immich,mp=/mnt/Diskstation-immich
mp1: /mnt/Diskstation-Photos-USER_01,mp=/mnt/Diskstation-Photos-USER_01
mp2: /mnt/Diskstation-Photos-USER_02,mp=/mnt/Diskstation-Photos-USER_02
mp3: /mnt/Diskstation-Photos-USER_03,mp=/mnt/Diskstation-Photos-USER_03
mp4: /mnt/Diskstation-Photos-USER_04,mp=/mnt/Diskstation-Photos-USER_04
Restart the LXC container to apply the new mount points:
pct stop <container-id> && pct start <container-id>
Verify inside the container:
ls /mnt/Diskstation-immich
Boot order gotcha: The NFS mounts are on the Proxmox host, not inside the LXC.
If Proxmox boots and starts the LXC before the NFS mounts are ready, the bind mounts
will be empty directories. Use `restart: unless-stopped` (not `on-failure`) in
docker-compose so containers restart automatically once the mounts appear.
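One way to harden this further (a hypothetical helper, not part of Immich or Proxmox): poll for a file that only exists on the real NAS mount, such as one of the `.immich` marker files, before bringing the stack up.

```shell
# wait_for_marker PATH [TIMEOUT] - poll until PATH exists, so the stack is
# not started against an empty bind-mount directory. Returns 1 on timeout.
wait_for_marker() {
  path="$1"; timeout="${2:-60}"; waited=0
  until [ -e "$path" ]; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
}
# Example, using a marker file on the NAS mount:
# wait_for_marker /mnt/Diskstation-immich/upload/library/.immich 120 && docker compose up -d
```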
Step 4 — Create required Immich upload subdirectories
Run once on the Proxmox host (or inside the LXC) after the NAS is mounted:
mkdir -p /mnt/Diskstation-immich/upload/{encoded-video,thumbs,backups,library,profile}
touch /mnt/Diskstation-immich/upload/encoded-video/.immich
touch /mnt/Diskstation-immich/upload/thumbs/.immich
touch /mnt/Diskstation-immich/upload/backups/.immich
touch /mnt/Diskstation-immich/upload/library/.immich
touch /mnt/Diskstation-immich/upload/profile/.immich
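The same commands as a small reusable function (equivalent to the lines above; the root path is a parameter so it can be pointed at any upload location):

```shell
# create_upload_dirs ROOT - create the subfolders Immich expects, plus the
# .immich marker files it checks to confirm the volume is properly mounted.
create_upload_dirs() {
  root="$1"
  for d in encoded-video thumbs backups library profile; do
    mkdir -p "$root/$d"
    touch "$root/$d/.immich"
  done
}
# create_upload_dirs /mnt/Diskstation-immich/upload
```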
Step 5 — Machine learning URL
After deploying, set the ML URL in Immich admin UI:
Administration → System Settings → Machine Learning → URL
→ http://<dgx-spark-ip>:2284
(The env var MACHINE_LEARNING_URL sets the default, but the admin UI value takes precedence
once the server has started for the first time.)
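Before saving the URL it is worth confirming the Spark answers at all. A quick probe (assuming the ML container exposes the `/ping` route its healthcheck uses - adjust if your build differs):

```shell
# check_ml URL - succeed (and say so) only if URL is reachable.
check_ml() {
  curl -fsS "$1" > /dev/null && echo "reachable: $1"
}
# check_ml http://192.168.178.8:2284/ping
```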
docker-compose.yml
services:
immich-redis:
image: redis:7-alpine
container_name: immich-redis
hostname: immich-redis
security_opt:
- no-new-privileges:true
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
timeout: 5s
retries: 5
volumes:
- immich-redis:/data
restart: unless-stopped
immich-db:
image: ghcr.io/immich-app/postgres:16-vectorchord0.4.3-pgvectors0.2.0
container_name: immich-db
hostname: immich-db
security_opt:
- no-new-privileges:true
healthcheck:
test: ["CMD", "pg_isready", "-q", "-d", "immich", "-U", "immich-db-user"]
interval: 10s
timeout: 5s
retries: 5
shm_size: 128mb
volumes:
- immich-db:/var/lib/postgresql/data
environment:
POSTGRES_DB: immich
POSTGRES_USER: immich-db-user
POSTGRES_PASSWORD: changeme # ← set a strong password
DB_STORAGE_TYPE: SSD
restart: unless-stopped
immich-server:
image: ghcr.io/immich-app/immich-server:release
container_name: immich-server
hostname: immich-server
security_opt:
- no-new-privileges:true
ports:
- "2283:2283"
volumes:
- /mnt/Diskstation-immich:/usr/src/app/upload:rw
# External photo libraries (Synology NAS bind mounts):
- /mnt/Diskstation-Photos-USER_01:/usr/src/app/Diskstation-Photos-USER_01:rw
- /mnt/Diskstation-Photos-USER_02:/usr/src/app/Diskstation-Photos-USER_02:rw
- /mnt/Diskstation-Photos-USER_03:/usr/src/app/Diskstation-Photos-USER_03:rw
- /mnt/Diskstation-Photos-USER_04:/usr/src/app/Diskstation-Photos-USER_04:rw
environment:
IMMICH_LOG_LEVEL: log
DB_HOSTNAME: immich-db
DB_PORT: '5432'
DB_DATABASE_NAME: immich
DB_USERNAME: immich-db-user
DB_PASSWORD: changeme # ← must match POSTGRES_PASSWORD above
REDIS_HOSTNAME: immich-redis
MACHINE_LEARNING_URL: http://192.168.178.8:2284 # ← DGX Spark IP
restart: unless-stopped
depends_on:
immich-db:
condition: service_healthy
immich-redis:
condition: service_healthy
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:2283/api/server/ping"]
interval: 10s
timeout: 5s
retries: 5
start_period: 60s
volumes:
immich-db:
immich-redis:
External Libraries
For each NAS photo folder, add an External Library in Immich:
Administration → External Libraries → Create Library
Recommended exclusion patterns (Synology NAS):
| Pattern | Reason |
|---|---|
| `**/@eaDir/**` | Synology extended attributes temp dir |
| `**/._.*` | macOS metadata files |
| `**/#recycle/**` | Synology recycle bin |
| `**/#snapshot/**` | Synology snapshots |
| `**/.stversions/**` | Syncthing versions |
| `**/.stfolder/**` | Syncthing folder marker |
| `*.bat` | Windows batch files |
| `Thumbs.db` | Windows thumbnail cache |
Notes
- `restart: unless-stopped` is required on all services. `on-failure` will not restart after a clean exit (e.g. on host reboot when NAS mounts aren't ready yet), causing `ENOTFOUND immich-db` errors on next boot.
- Immich v2.7.5+ uses internal port `2283` (previously `3001`). The healthcheck URL must match: `http://localhost:2283/api/server/ping`.
- The `MACHINE_LEARNING_URL` env var sets the default on first startup. If you change it later, update it in the admin UI — the database-stored value takes precedence over the env var after first boot.
Tested on: Intel NUC, Proxmox 8, Debian LXC, Immich v2.7.5
GPU Acceleration for NVIDIA DGX Spark GB10 on Immich (ARM64, Blackwell, CUDA 13.0)
Working fix for `CUDAExecutionProvider` not registering on the NVIDIA DGX Spark GB10 with ORT 1.24.4.
Based on the original work by @volschin.
Hardware & Environment
| Component | Value |
|---|---|
| Device | NVIDIA DGX Spark |
| GPU | GB10 (Blackwell, SM_121, compute capability 12.1) |
| Architecture | ARM64 (aarch64) |
| CUDA | 13.0 |
| Driver | 580.126.09 |
| OS | Ubuntu 24.04 |
| ORT | v1.24.4 (built from source) |
Symptom
Running immich-machine-learning with DEVICE=cuda — all models silently fall back to CPU:
INFO Setting execution providers to ['CPUExecutionProvider'], in descending order of preference
ort.get_available_providers() returns only ['AzureExecutionProvider', 'CPUExecutionProvider']
even though ort.get_all_providers() lists CUDAExecutionProvider as compiled in.
The ORT warning logged at startup:
[W:onnxruntime:Default, device_discovery.cc:211 DiscoverDevicesForPlatform]
GPU device discovery failed: device_discovery.cc:91 ReadFileContents
Failed to open file: "/sys/class/drm/card0/device/vendor"
Root Cause
Issue 1 — ORT 1.24.4 DRM sysfs device discovery
ORT 1.24.4 introduced hardware device discovery via DRM sysfs in
onnxruntime/core/platform/linux/device_discovery.cc.
GetGpuDevices() scans /sys/class/drm/cardN/device/vendor to build a hardware device list.
The CUDA EP only registers itself as available if it finds a device with vendor_id == 0x10de in that list.
The DGX Spark GB10 is a SoC/platform GPU — not a PCIe card.
DRM entries exist in /sys/class/drm/ (card0, card1, renderD128), but the device/vendor file
is not created because there is no PCI vendor ID for an SoC-integrated GPU.
The original code used ORT_RETURN_IF_ERROR(GetGpuDeviceFromSysfs(...)) inside the DRM loop.
When reading the vendor file failed, it immediately aborted GetGpuDevices() — the existing
PCI bus fallback (/sys/bus/pci/devices/) was never reached.
The GB10 also has no PCI bus entry (confirmed: find /sys/bus/pci/devices/ -name "vendor" -exec grep -il "10de" {} \; returns nothing), so the PCI fallback would also find nothing even if reached.
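These probes can be reproduced by hand. A hypothetical diagnostic sketch (not part of ORT or Immich; paths are parameters so the logic can be exercised anywhere - on the Spark itself, call `probe_gpu /sys/class/drm /sys/bus/pci/devices /dev`):

```shell
# probe_gpu DRM_DIR PCI_DIR DEV_DIR
# Mirrors the three checks the patched GetGpuDevices() path relies on:
#  1. DRM sysfs vendor files, 2. PCI bus scan, 3. /dev/nvidia0 presence.
probe_gpu() {
  drm="$1"; pci="$2"; dev="$3"
  for card in "$drm"/card*; do
    [ -e "$card" ] || continue
    if [ -f "$card/device/vendor" ]; then
      echo "drm:$(basename "$card"):$(cat "$card/device/vendor")"
    else
      echo "drm:$(basename "$card"):no-vendor"   # the GB10 hits this branch
    fi
  done
  grep -il 10de "$pci"/*/vendor 2>/dev/null || echo "pci:none"
  if [ -e "$dev/nvidia0" ]; then
    echo "dev:nvidia0"   # driver present: the synthetic-device fallback case
  fi
}
```

On the GB10 (per the findings above) this should print `no-vendor` for each DRM card, then `pci:none`, then `dev:nvidia0`.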
Issue 2 — Dual ORT installation shadowing
When uv sync --extra cpu is followed by uv pip install --reinstall onnxruntime_gpu-*.whl:
- `uv sync --extra cpu` installs `onnxruntime` (CPU package), including `libonnxruntime.so.1.24.1` and `onnxruntime_pybind11_state.cpython-311-aarch64-linux-gnu.so`
- `onnxruntime` and `onnxruntime-gpu` are different package names in pip/uv — reinstalling one does not remove the other's files
- Python's import system prefers the ABI-tagged `.cpython-311-aarch64-linux-gnu.so` extension over a plain `.so`, so the old unpatched CPU binary is loaded at runtime regardless of the GPU wheel being present
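The suffix preference in the last point can be checked directly: CPython tries extension suffixes in a fixed order, ABI-tagged first, plain `.so` last.

```shell
# Print CPython's extension-suffix search order. The ABI-tagged suffix
# (.cpython-311-aarch64-linux-gnu.so on the Spark) is tried before the plain
# .so, so a leftover CPU pybind extension shadows the GPU wheel's module.
python3 -c 'import importlib.machinery as m; print(m.EXTENSION_SUFFIXES)'
```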
Fix
Three changes to Dockerfile.dgx-spark:
Patch 1 & 2 — device_discovery.cc
Applied via a Python RUN step before ./build.sh:
- Fix 1 — Skip DRM cards with missing vendor files (`continue`) instead of aborting (`ORT_RETURN_IF_ERROR`). This allows `gpu_devices` to remain empty and reach the PCI fallback.
- Fix 2 — After the PCI fallback also finds nothing, check for `/dev/nvidia0` (present when `NVIDIA_VISIBLE_DEVICES=all` is set via the NVIDIA Container Runtime) and inject a synthetic `OrtHardwareDevice{vendor_id=0x10de, type=GPU}` so the CUDA EP can register.
Patch 3 — Stage 2 wheel installation
Explicitly uninstall the CPU onnxruntime package before installing the GPU wheel,
so no old .so files shadow the new ones.
Result
[W] device_discovery.cc:283 GetGpuDevices] Skipping DRM card (no sysfs vendor info): ...
INFO Loading detection model 'buffalo_l' to memory
INFO Setting execution providers to ['CUDAExecutionProvider', 'CPUExecutionProvider'],
in descending order of preference
nvidia-smi shows the ML workers using ~9 GB of GPU memory for the loaded models.
Files
Dockerfile.dgx-spark
# Dockerfile for DGX Spark (ARM64 + NVIDIA Blackwell GB10, CUDA 13.0)
# Builds onnxruntime-gpu from source since no arm64 PyPI wheel exists.
#
# Based on: https://github.com/volschin/immich/commit/fab7df3371d522f12a4b780b3c2b837f341b88bb
# Additional fixes for ORT 1.24.4 device_discovery.cc (SoC GPU, no PCI vendor in sysfs)
# and dual-package shadowing (onnxruntime CPU wheel overriding GPU wheel at runtime).
# See: https://github.com/immich-app/immich/discussions/10647
# ---------------------------------------------------------------------------
# Stage 1: Build onnxruntime-gpu from source for ARM64 + Blackwell (SM_121)
# ---------------------------------------------------------------------------
FROM nvidia/cuda:13.0.2-cudnn-devel-ubuntu24.04 AS builder-ort
# renovate: datasource=github-tags depName=microsoft/onnxruntime
ARG ORT_VERSION="v1.24.4"
# Ubuntu 24.04 ships Python 3.12; install 3.11 from deadsnakes PPA
RUN apt-get update && apt-get install -y --no-install-recommends \
software-properties-common && \
add-apt-repository -y ppa:deadsnakes/ppa && \
apt-get update && apt-get install -y --no-install-recommends \
python3.11 python3.11-dev python3.11-venv python3-pip \
cmake git g++ && \
rm -rf /var/lib/apt/lists/* && \
update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1
# ORT build needs numpy for Python::NumPy CMake target
RUN python3 -m pip install --break-system-packages --ignore-installed numpy packaging wheel setuptools
RUN git clone --depth 1 --branch ${ORT_VERSION} --recurse-submodules --shallow-submodules \
https://github.com/microsoft/onnxruntime.git /onnxruntime
WORKDIR /onnxruntime
# Patch: replace ARCHITECTURES_WITH_ACCEL ("90" "100" "101" "120" -> "100" "101" "122")
# so the build does not attempt the accelerated sm_120a variant
RUN sed -i 's/set(ARCHITECTURES_WITH_ACCEL "90" "100" "101" "120")/set(ARCHITECTURES_WITH_ACCEL "100" "101" "122")/' \
cmake/CMakeLists.txt && \
grep -r 'ARCHITECTURES_WITH_ACCEL' cmake/ | head -5
# Patch device_discovery.cc for DGX Spark GB10 (SoC GPU, no PCI vendor in sysfs).
#
# Problem: ORT 1.24.4 introduced hardware device discovery via DRM sysfs
# (/sys/class/drm/cardN/device/vendor). GetGpuDevices() calls
# GetGpuDeviceFromSysfs() per DRM card, which reads device/vendor.
# The GB10 is a SoC/platform GPU — DRM entries exist but no PCI vendor file
# is created. ORT_RETURN_IF_ERROR aborts GetGpuDevices() immediately, so the
# existing PCI fallback and any further logic is never reached and CUDA EP
# stays unregistered (get_available_providers() returns only CPU).
#
# Fix 1: Skip DRM cards with missing sysfs vendor files (continue instead of
# abort). This lets gpu_devices stay empty so the PCI fallback runs.
# Fix 2: After PCI fallback also finds nothing (GB10 has no PCI bus entry),
# synthesize an NVIDIA GPU device when /dev/nvidia0 exists.
# vendor_id=0x10de + type=GPU is sufficient for CUDA EP to register.
RUN python3 - << 'PYEOF'
import sys
path = "onnxruntime/core/platform/linux/device_discovery.cc"
with open(path) as f:
src = f.read()
# Fix 1: skip DRM cards that have no sysfs vendor file
old1 = (
" for (const auto& gpu_sysfs_path_info : gpu_sysfs_path_infos) {\n"
" OrtHardwareDevice gpu_device{};\n"
" ORT_RETURN_IF_ERROR(GetGpuDeviceFromSysfs(gpu_sysfs_path_info, gpu_device));\n"
" gpu_devices.emplace_back(std::move(gpu_device));\n"
" }"
)
new1 = (
" for (const auto& gpu_sysfs_path_info : gpu_sysfs_path_infos) {\n"
" OrtHardwareDevice gpu_device{};\n"
" auto sysfs_status = GetGpuDeviceFromSysfs(gpu_sysfs_path_info, gpu_device);\n"
" if (!sysfs_status.IsOK()) {\n"
" LOGS_DEFAULT(WARNING) << \"Skipping DRM card (no sysfs vendor info): \"\n"
" << sysfs_status.ErrorMessage();\n"
" continue;\n"
" }\n"
" gpu_devices.emplace_back(std::move(gpu_device));\n"
" }"
)
if old1 not in src:
print("ERROR: Fix-1 pattern not found — check ORT version", file=sys.stderr)
sys.exit(1)
src = src.replace(old1, new1, 1)
print("Fix 1 applied: DRM skip-on-error")
# Fix 2: /dev/nvidia0 fallback after PCI scan also returns empty
nvidia_fallback = (
"\n"
" // Fallback for SoC/platform GPUs (e.g. DGX Spark GB10) that have neither\n"
" // DRM sysfs vendor entries nor a PCI bus representation.\n"
" // If /dev/nvidia0 exists the NVIDIA kernel driver is present; synthesize a\n"
" // minimal OrtHardwareDevice so the CUDA EP can register itself.\n"
" if (gpu_devices.empty()) {\n"
" std::error_code _ec{};\n"
" if (fs::exists(\"/dev/nvidia0\", _ec)) {\n"
" LOGS_DEFAULT(WARNING) << \"/dev/nvidia0 found but no GPU in sysfs/PCI — \"\n"
" << \"adding synthetic NVIDIA GPU device for SoC GPU.\";\n"
" OrtHardwareDevice nvidia_gpu{};\n"
" nvidia_gpu.vendor_id = 0x10de;\n"
" nvidia_gpu.type = OrtHardwareDeviceType_GPU;\n"
" gpu_devices.emplace_back(std::move(nvidia_gpu));\n"
" }\n"
" }\n"
)
anchor = " gpu_devices_out = std::move(gpu_devices);\n return Status::OK();\n}"
if anchor not in src:
print("ERROR: Fix-2 anchor not found — check ORT version", file=sys.stderr)
sys.exit(1)
src = src.replace(anchor, nvidia_fallback + anchor, 1)
print("Fix 2 applied: /dev/nvidia0 synthetic device fallback")
with open(path, "w") as f:
f.write(src)
print("device_discovery.cc patched successfully")
PYEOF
RUN ./build.sh \
--config Release \
--build_wheel \
--allow_running_as_root \
--use_cuda \
--cuda_home /usr/local/cuda \
--cudnn_home /usr \
--cuda_version 13.0 \
--parallel \
--cmake_extra_defines \
CMAKE_CUDA_ARCHITECTURES=121 \
onnxruntime_USE_FLASH_ATTENTION=OFF \
--skip_tests
RUN mkdir /ort-wheel && cp build/Linux/Release/dist/onnxruntime_gpu-*.whl /ort-wheel/
# ---------------------------------------------------------------------------
# Stage 2: Install Immich ML Python deps + custom ORT wheel
# ---------------------------------------------------------------------------
FROM python:3.11-bookworm AS builder
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
VIRTUAL_ENV=/opt/venv
RUN apt-get update && apt-get install -y --no-install-recommends g++ && \
rm -rf /var/lib/apt/lists/*
COPY --from=ghcr.io/astral-sh/uv:0.8.15@sha256:a5727064a0de127bdb7c9d3c1383f3a9ac307d9f2d8a391edc7896c54289ced0 /uv /uvx /bin/
RUN --mount=type=cache,target=/root/.cache/uv \
--mount=type=bind,source=uv.lock,target=uv.lock \
--mount=type=bind,source=pyproject.toml,target=pyproject.toml \
uv sync --frozen --extra cpu --no-dev --no-editable --no-install-project --compile-bytecode --no-progress --active --link-mode copy
COPY --from=builder-ort /ort-wheel /ort-wheel
# Uninstall the CPU onnxruntime installed by uv sync before putting in our
# custom GPU wheel. onnxruntime and onnxruntime-gpu are different package
# names — --reinstall alone leaves the CPU .so files in place, and Python
# loads the old unpatched cpython extension instead of the GPU one.
RUN uv pip uninstall onnxruntime onnxruntime-gpu 2>/dev/null || true && \
uv pip install --no-deps /ort-wheel/onnxruntime_gpu-*.whl && \
rm -rf /ort-wheel
# ---------------------------------------------------------------------------
# Stage 3: Minimal production image
# ---------------------------------------------------------------------------
FROM nvidia/cuda:13.0.2-cudnn-runtime-ubuntu24.04 AS prod
COPY --from=builder /usr/local/bin/python3 /usr/local/bin/python3
COPY --from=builder /usr/local/bin/python3.11 /usr/local/bin/python3.11
COPY --from=builder /usr/local/lib/python3.11 /usr/local/lib/python3.11
COPY --from=builder /usr/local/lib/libpython3.11.so* /usr/local/lib/
RUN ldconfig
ENV LD_PRELOAD=/usr/lib/libmimalloc.so.2 \
MACHINE_LEARNING_MODEL_ARENA=false
RUN apt-get update && \
apt-get install -y --no-install-recommends tini ccache libgl1 libglib2.0-0 libgomp1 libmimalloc2.0 && \
apt-get autoremove -yqq && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
RUN ln -s "/usr/lib/$(arch)-linux-gnu/libmimalloc.so.2" /usr/lib/libmimalloc.so.2
WORKDIR /usr/src
ENV TRANSFORMERS_CACHE=/cache \
PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PATH="/opt/venv/bin:$PATH" \
PYTHONPATH=/usr/src \
DEVICE=cuda \
VIRTUAL_ENV=/opt/venv \
MACHINE_LEARNING_CACHE_FOLDER=/cache
RUN echo "hard core 0" >> /etc/security/limits.conf && \
echo "fs.suid_dumpable 0" >> /etc/sysctl.conf && \
echo 'ulimit -S -c 0 > /dev/null 2>&1' >> /etc/profile
COPY --from=builder /opt/venv /opt/venv
COPY scripts/healthcheck.py .
COPY immich_ml immich_ml
ENTRYPOINT ["tini", "--"]
CMD ["python", "-m", "immich_ml"]
HEALTHCHECK CMD python3 healthcheck.py
docker-compose.yml (ML service on the DGX Spark)
# Immich machine-learning — DGX Spark (ARM64, GB10, CUDA 13.0)
#
# Prerequisites:
# 1. Build the image (run from immich/machine-learning/):
# docker build -f Dockerfile.dgx-spark -t immich-ml-dgx-spark:latest .
# First build ~60-90 min (compiles ORT from source).
# Subsequent rebuilds use cached Stage 1 and take ~2 min.
#
# 2. NVIDIA Container Toolkit installed and configured as default runtime:
# sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
# sudo systemctl restart docker
#
# 3. On the immich-server, set:
# MACHINE_LEARNING_URL: http://<dgx-spark-ip>:2284
services:
immich-machine-learning:
image: immich-ml-dgx-spark:latest
runtime: nvidia
container_name: immich-machine-learning
hostname: immich-machine-learning
ports:
- "2284:3003"
security_opt:
- no-new-privileges:true
volumes:
- immich-ml-cache:/cache
- immich-ml-config:/.config
environment:
NVIDIA_VISIBLE_DEVICES: all
NVIDIA_DRIVER_CAPABILITIES: compute,utility
MPLCONFIGDIR: /tmp/matplotlib
DEVICE: cuda
MACHINE_LEARNING_WORKERS: 1
MACHINE_LEARNING_WORKER_THREADS: 4
MACHINE_LEARNING_LOG_LEVEL: info
MACHINE_LEARNING_DEVICE_IDS: "0"
restart: unless-stopped
volumes:
immich-ml-cache:
immich-ml-config:
Upstream Fix
The two device_discovery.cc patches apply cleanly to ORT v1.24.4 — the pattern strings
are exact matches. They would need verification against other ORT versions.
The underlying ORT issue (no fallback for SoC/platform GPUs that lack DRM/PCI vendor entries)
should ideally be fixed upstream in
onnxruntime/core/platform/linux/device_discovery.cc.
The fix is straightforward: skip DRM cards that have no vendor file instead of aborting, and
add a CUDA runtime fallback for platform GPUs.
Tested on: NVIDIA DGX Spark GB10, Ubuntu 24.04 ARM64, CUDA 13.0, ORT v1.24.4, Immich v2.7.5
