Next Iteration: Full-Featured CUDA-X Data Science Environment on NVIDIA DGX with Heavyweight ML Libraries

Hello NVIDIA Developer Community,

If you’re just starting with GPU-accelerated data science on DGX, I strongly recommend running through the official CUDA-X Data Science playbook first (CUDA-X Data Science | DGX Spark). It shows the basics of zero-code-change acceleration with cuML, cuDF, UMAP, HDBSCAN, and pandas. Come back to this recipe once you’re ready for a richer stack that adds heavyweight probabilistic modeling and advanced time-series tools.

This next iteration extends the RAPIDS base with heavyweight machine learning and probabilistic modeling libraries, optimized for advanced GPU-accelerated ML workflows on ARM64 Grace Blackwell systems (e.g., DGX Spark or larger DGX stations). It includes PyMC for Bayesian modeling, HMM and probabilistic tools, backtesting frameworks for general time-series analysis, and enhanced monitoring utilities—perfect for researchers and data scientists moving into full-scale probabilistic ML, clustering, and large-scale simulations.

Note: This is a community-contributed recipe built on NVIDIA’s officially supported RAPIDS base images—it is not an officially supported NVIDIA image.

Prerequisites

  • NVIDIA DGX system (ARM64, Grace Blackwell architecture) with Docker and NVIDIA container toolkit installed.
  • Base image: rapidsai/base:26.02a-cuda12-py3.13 (the 26.02a tag is an example from recent RAPIDS releases; always check the latest tags on Docker Hub for the most current stable or nightly options that match your CUDA driver stack).

Step 1: Prepare Your requirements.txt

This includes heavyweight ML/probabilistic libs plus monitoring and utilities:

# Lightweight utilities & monitoring
aiohttp>=3.10.5
rich>=13.8.0
tqdm>=4.66.0
gpustat
psutil
nvitop>=1.3.2

# Backtesting frameworks (general time-series / strategy evaluation)
backtesting
vectorbt
backtrader

# Bayesian / probabilistic modeling (heavyweight ML)
pymc
hmmlearn
pomegranate

# Misc
typing-extensions>=4.12.0

These add powerful tools like Markov Chain Monte Carlo (PyMC), hidden Markov models, and probabilistic graphical models—ideal for uncertainty quantification, generative modeling, and advanced clustering on GPU.
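
As a quick smoke test for the probabilistic stack, a tiny PyMC model is enough to confirm that sampling runs inside the container (a minimal sketch with arbitrary data and sampler settings):

# bayes_smoke_test.py - minimal PyMC sanity check (illustrative settings)
import numpy as np
import pymc as pm

rng = np.random.default_rng(42)
data = rng.normal(loc=1.0, scale=2.0, size=500)

with pm.Model():
    mu = pm.Normal("mu", mu=0.0, sigma=10.0)
    sigma = pm.HalfNormal("sigma", sigma=5.0)
    pm.Normal("obs", mu=mu, sigma=sigma, observed=data)
    idata = pm.sample(draws=500, tune=500, chains=2, progressbar=False)

print("posterior mean of mu:", float(idata.posterior["mu"].mean()))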

⚠️ Note on Python 3.13 compatibility: Some packages (e.g., pomegranate) may have limited or delayed pre-built wheels for Python 3.13 on ARM64. If installation fails during the build, consider using a base image with Python 3.12 or pinning specific package versions.
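
If a build succeeds but you are unsure which packages actually made it in, a small import check can report what installed cleanly (a hypothetical helper, not part of the recipe itself):

# check_imports.py - hypothetical helper: report which optional packages import cleanly
import importlib

for name in ("pymc", "hmmlearn", "pomegranate", "vectorbt", "backtrader"):
    try:
        mod = importlib.import_module(name)
        print(f"{name}: OK ({getattr(mod, '__version__', 'unknown')})")
    except Exception as exc:  # surfaces missing wheels and ABI issues, not just ImportError
        print(f"{name}: FAILED ({exc})")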

Step 2: Build the Dockerfile

Extend the official RAPIDS base with Jupyter for exploration.

## RAPIDS base image for Grace Blackwell (ARM64) - CUDA 12.x + Python 3.13
FROM rapidsai/base:26.02a-cuda12-py3.13
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
USER root
# System packages
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        ca-certificates \
        git \
        curl && \
    rm -rf /var/lib/apt/lists/*
# Copy and install Python extras
COPY requirements.txt /tmp/requirements.txt
RUN python -m pip install --no-cache-dir -r /tmp/requirements.txt && \
    python -m pip install --no-cache-dir jupyterlab jupyterlab-lsp ipywidgets && \
    conda clean --all -y
# Optional: Bake in a custom Python library (uncomment if you want it permanently included in the image)
# COPY my_custom_lib /opt/my_custom_lib
# ENV PYTHONPATH="/opt/my_custom_lib:${PYTHONPATH}"
CMD ["bash"]

Build Instructions

  1. Save Dockerfile and requirements.txt in your build directory.
  2. Build: docker build -t cuda_x_ds_full .
  3. This image now includes full RAPIDS (cuDF, cuML, cuGraph, etc.) + heavyweight ML libs for end-to-end Bayesian/probabilistic workflows (a quick GPU sanity check follows below).
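
Before moving on, it is worth confirming that RAPIDS sees the GPU end to end. A minimal sketch (run it inside the container with the --gpus all flags from Step 3):

# verify_gpu_stack.py - quick end-to-end check that RAPIDS sees the GPU
import cudf
import cupy as cp
from cuml.linear_model import LinearRegression

print("GPUs visible:", cp.cuda.runtime.getDeviceCount())

# Tiny regression entirely on the GPU
X = cudf.DataFrame({"x": cp.arange(1000, dtype=cp.float32)})
y = X["x"] * 2.0 + 1.0
model = LinearRegression().fit(X, y)
print("coef:", model.coef_, "intercept:", model.intercept_)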

Step 3: Remote Execution Script for DGX Systems

For production batch runs on a headless DGX (no interactive notebook needed):

#!/bin/bash
# run_cuda_ds_script.sh — Launch advanced ML scripts on DGX (ideal for production batch jobs)

set -euo pipefail

DGX_HOST="192.168.5.100"  # Update to your DGX IP
HOST_PROJECT_ROOT="/home/youruser/project"  # Path to code on host
CONTAINER_PROJECT_ROOT="/project"
HOST_DATA_ROOT="/home/youruser/data"  # Path to datasets
CONTAINER_DATA_ROOT="/data"

working_dir="$1"
filename="$2"
shift 2 || true
extra_args="$*"  # any remaining args are forwarded to the Python script

echo "Launching on DGX: ${working_dir}/${filename}"

ssh youruser@"${DGX_HOST}" bash -s << EOF
echo "=== Launching container ==="
docker run --rm --init --gpus all \
  --privileged \
  --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
  -v ${HOST_PROJECT_ROOT}:${CONTAINER_PROJECT_ROOT} \
  -v ${HOST_DATA_ROOT}:${CONTAINER_DATA_ROOT} \
  -w ${CONTAINER_PROJECT_ROOT}/${working_dir} \
  cuda_x_ds_full \
  bash -c "
    echo '=== Running Python script ==='
    python ${filename} ${extra_args}
  "
EOF
  • Usage: ./run_cuda_ds_script.sh dir script.py [args]

Managing a headless DGX from macOS or Windows: This script is perfect for production workflows where you develop locally (on Mac/Windows) and execute remotely on the DGX via SSH—no Jupyter required. Sync code with rsync, git, or your preferred tool, then trigger batch runs.

For convenience on macOS, create a GUI launcher with Shortcuts app:

  1. Open Shortcuts → New Shortcut.
  2. Add “Get File” to select your .py script.
  3. Extract folder and filename.
  4. Run Shell Script calling ./run_cuda_ds_script.sh with those values.

On Windows, use a PowerShell GUI picker:

Add-Type -AssemblyName System.Windows.Forms
$dialog = New-Object System.Windows.Forms.OpenFileDialog
$dialog.Filter = "Python files (*.py)|*.py"
if ($dialog.ShowDialog() -eq "OK") {
    $dir = Split-Path $dialog.FileName
    $file = Split-Path $dialog.FileName -Leaf
    # Windows cannot execute .sh files directly; route through WSL (or Git Bash).
    # Note: $dir must be the project-relative path the script expects, so convert it first.
    wsl ./run_cuda_ds_script.sh $dir $file @args
}

Key advantage of this workflow: Your latest code and custom libraries are volume-mounted from the host into the container at runtime—no need to rebuild the image for every code change or library addition. This enables true “lazy loading”: edit scripts or add/modify modules on the host, and they are immediately available inside the container on the next run. For custom libraries, simply add another mount and set PYTHONPATH as needed.
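
For example, if you mount a custom library into the container (here /project/libs, matching the mount style above), a script can make it importable at runtime without any image rebuild (my_custom_lib is the hypothetical module from the Dockerfile comment):

# top of your entry script: make a volume-mounted library importable
import sys
sys.path.insert(0, "/project/libs")  # example mount point; match your -v flag

import my_custom_lib  # hypothetical module living under /project/libs
print(my_custom_lib.__file__)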

Why This is the Next Iteration

  • Starts from the official CUDA-X Data Science playbook’s zero-code-change accelerations.
  • Adds heavyweight probabilistic ML (PyMC for Bayesian inference at scale, pomegranate for PGMs).
  • Includes general backtesting frameworks for time-series validation.
  • Enhanced monitoring (nvitop, gpustat) for long-running Bayesian simulations.

Typical use cases (showing the broad applicability of this stack; a minimal clustering sketch follows the list):

  • Large-scale customer segmentation & churn modeling: cuDF/cuML for feature engineering and HDBSCAN/UMAP clustering on hundreds of millions of transaction or clickstream rows; PyMC for probabilistic churn/uplift models with full posterior uncertainty.
  • Predictive maintenance on IoT/telemetry data: Multivariate sensor streams processed with cuDF; HMMs for regime detection and PyMC for Bayesian remaining-useful-life estimates.
  • Anomaly detection in logs or industrial processes: GPU-accelerated preprocessing and clustering (HDBSCAN/GMM via cuML), followed by PyMC for calibrated anomaly scoring with uncertainty.
  • Patient trajectory & healthcare modeling: Hidden Markov models for disease progression states; hierarchical Bayesian models (PyMC) for treatment effects across cohorts or hospitals.
  • Simulation-heavy operational risk & capacity planning: Monte Carlo simulations of queues, outages, or demand processes, with PyMC calibrating parameters and propagating uncertainty.
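
To give a flavor of the first use case, here is a minimal GPU clustering sketch on synthetic data (illustrative sizes and parameters; a real pipeline would load features with cudf.read_parquet or similar):

# cluster_sketch.py - synthetic stand-in for the segmentation use case
import cupy as cp
import cudf
from cuml.manifold import UMAP
from cuml.cluster import HDBSCAN

X = cp.random.random((100_000, 16), dtype=cp.float32)  # pretend feature matrix

embedding = UMAP(n_components=2).fit_transform(X)       # GPU UMAP
labels = HDBSCAN(min_cluster_size=50).fit_predict(embedding)

print(cudf.Series(labels).value_counts().head())        # cluster sizes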

Why nvitop for monitoring? Host-level tools like DCGM and the DGX Dashboard are excellent for system health, but nvitop provides lightweight, interactive, process-level GPU visibility directly inside the container—ideal for debugging long-running PyMC sampling or cuML jobs.
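
Beyond the interactive TUI, nvitop also ships a Python API that is handy for logging GPU state from inside a long-running job (a short sketch; check the nvitop docs for the exact API surface):

# gpu_log.py - periodic process-level GPU logging via nvitop's Python API
import time
from nvitop import Device

for _ in range(3):  # three samples, 5 s apart
    for dev in Device.all():
        print(f"GPU {dev.index}: util={dev.gpu_utilization()}% "
              f"mem={dev.memory_used_human()}/{dev.memory_total_human()}")
    time.sleep(5)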

Troubleshooting

  • If a package like pymc or pomegranate fails to install: Pin versions in requirements.txt or switch to a Python 3.12 base image.
  • For heavy PyMC usage, Python 3.11/3.12 RAPIDS images currently have the strongest ecosystem support.
  • CUDA errors or GPU not detected: Confirm that your host driver supports the CUDA version in the RAPIDS base image (see NVIDIA’s CUDA compatibility matrix) and that you launch with --gpus all.
  • Memory issues: Tune --shm-size or ulimits.

Tips

  • Monitor GPUs with nvitop during heavy sampling.
  • For multi-GPU: Use Dask-cuDF or cuML’s multi-node capabilities (see the sketch after this list).
  • Extend with your own libs via mounts (development) or baked-in COPY (production).
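
For the multi-GPU tip, a minimal Dask-cuDF sketch looks like this (assumes dask-cuda and dask-cudf are available in the RAPIDS base image; the parquet path and column name are hypothetical):

# multi_gpu_sketch.py - one Dask worker per visible GPU
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask_cudf

cluster = LocalCUDACluster()   # spawns one worker per GPU
client = Client(cluster)

df = dask_cudf.read_parquet("/data/events/*.parquet")  # hypothetical dataset
print(df.groupby("user_id").size().compute().head())   # "user_id" is illustrative

client.close()
cluster.close()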

This setup has powered my advanced ML experiments reliably. Give it a try for your next probabilistic project on DGX!

Feedback or variations welcome!

Best,
Mark