Thanks for the help. The setup below is working for me, and I have written it up as a full walkthrough so others can reproduce it.
Background: The GB10 (Blackwell architecture) isn’t officially supported in standard PyTorch containers yet, so I had to build a custom environment from scratch using NVIDIA’s CUDA base container.
Here’s what I did:
Initial Setup
Since the standard PyTorch containers weren’t recognizing our GPU, I started with NVIDIA’s basic CUDA container:
sudo docker run --gpus all -it \
--name gb10-pytorch \
--ipc=host \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
-v "$PWD":/workspace \
nvcr.io/nvidia/cuda:12.9.0-devel-ubuntu22.04
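Before starting the long-lived container, it can help to confirm that Docker can pass the GPU into a container at all. This is a minimal sketch, assuming the NVIDIA Container Toolkit is already installed on the host. The helper name gpu_visible_in_docker is hypothetical and not part of the setup described here.

```shell
# Hypothetical helper: run nvidia-smi in a throwaway container (--rm) to
# confirm that Docker can hand the GPU to containers.
gpu_visible_in_docker() {
  # Defaults to the same CUDA base image used in this post.
  local image="${1:-nvcr.io/nvidia/cuda:12.9.0-devel-ubuntu22.04}"
  sudo docker run --rm --gpus all "$image" nvidia-smi
}
```

Calling gpu_visible_in_docker should print the usual nvidia-smi table if the GPU is reachable. If it fails, the problem is most likely on the host side (driver or container toolkit) rather than in PyTorch.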
Building the Environment
Once inside the container, I installed everything manually:
# Basic Python setup
apt update
apt install -y python3 python3-pip git build-essential cmake ninja-build
# PyTorch nightly build (has CUDA 12.9 support for newer GPUs)
pip3 install --pre torch torchvision torchaudio \
--index-url https://download.pytorch.org/whl/nightly/cu129
# Core ML libraries
pip3 install numpy scipy pandas matplotlib seaborn scikit-learn
# Transformers stack for LLM work
pip3 install transformers datasets accelerate peft trl bitsandbytes sentencepiece protobuf
# Unsloth for efficient training
pip3 install unsloth
I verified everything was working with:
python3 -c "import torch; print('CUDA available:', torch.cuda.is_available()); print('GPU:', torch.cuda.get_device_name(0))"
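The one-liner above only shows that PyTorch can see the GPU. A slightly fuller check, sketched below, also prints the compute capability and the architectures the installed build was compiled for, then runs a small matmul so that a real kernel is launched. The helper name torch_gpu_report is an assumption of mine. The torch calls are standard torch.cuda functions, but I am not asserting what values they return on the GB10.

```shell
# Hypothetical helper: print a short PyTorch/CUDA report.
# Returns 2 if the interpreter is missing, 1 if torch or CUDA is unavailable.
torch_gpu_report() {
  local py="${1:-python3}"
  command -v "$py" >/dev/null 2>&1 || { echo "interpreter not found: $py" >&2; return 2; }
  "$py" - <<'EOF'
import sys

try:
    import torch
except ImportError:
    sys.exit("torch is not installed")

print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
if not torch.cuda.is_available():
    sys.exit("CUDA is not available to PyTorch")

print("GPU:", torch.cuda.get_device_name(0))
print("compute capability:", torch.cuda.get_device_capability(0))
print("architectures in this build:", torch.cuda.get_arch_list())

# A tiny matmul forces a real kernel launch, which catches builds that
# detect the GPU but ship no usable kernels for it.
x = torch.randn(256, 256, device="cuda")
print("matmul ok:", bool(torch.isfinite(x @ x).all()))
EOF
}
```

Run torch_gpu_report inside the container. If the GPU is detected but the matmul step fails, the installed wheel probably lacks kernels for this GPU.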
Making it Persistent
After getting everything working, I needed to save this setup so we don’t have to rebuild every time:
# Exit container first
exit
# Save the container as an image
sudo docker commit gb10-pytorch my-gb10-ml:latest
# Remove the temporary container
sudo docker stop gb10-pytorch
sudo docker rm gb10-pytorch
# Create a persistent container that auto-starts on reboot
sudo docker run -d \
--name gb10-pytorch \
--restart unless-stopped \
--gpus all \
--ipc=host \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
-v "$PWD":/workspace \
my-gb10-ml:latest \
sleep infinity
Important Notes:
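- The container now runs in the background.
- The directory you ran docker run from ($PWD; ~/home/project in my case) is mounted to /workspace inside the container. This is where all our project files are accessible.
- It will restart automatically after server reboots, unless it was stopped manually with docker stop (that is the effect of --restart unless-stopped).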
Daily Usage
I’ve set up some aliases to make things easier. Add these to your ~/.bashrc:
# -w /workspace starts each command in the mounted project folder,
# so relative paths such as script.py resolve there
alias ml-env='sudo docker exec -it -w /workspace gb10-pytorch bash'
alias ml-python='sudo docker exec -it -w /workspace gb10-pytorch python3'
alias ml-pip='sudo docker exec -it gb10-pytorch pip3'
Then just run source ~/.bashrc to activate them.
Now you can:
- Jump into the environment: ml-env
- Run Python directly: ml-python script.py (script.py is looked up in /workspace inside the container)
- Install packages: ml-pip install package-name
Updating the Image
Whenever you install new packages and want to save them:
# 1. Install whatever you need (ml-env opens a shell inside the container; type the next two lines there)
ml-env
pip3 install new-package
exit
# 2. Save the updated state
sudo docker commit gb10-pytorch my-gb10-ml:latest
# 3. Restart container with updated image
sudo docker stop gb10-pytorch
sudo docker rm gb10-pytorch
sudo docker run -d \
--name gb10-pytorch \
--restart unless-stopped \
--gpus all \
--ipc=host \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
-v "$PWD":/workspace \
my-gb10-ml:latest \
sleep infinity
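The commit, stop, rm, and run sequence is always the same, so it can be wrapped in one function to avoid retyping the flags. The following is only a sketch: the function names and the GB10_* variables are made up for illustration, and the flags are copied from the commands above. Setting GB10_DRY_RUN=1 prints the commands instead of running them.

```shell
# Hypothetical helper: save the running container as an image and
# recreate the container from that image with the same flags as above.
GB10_NAME="${GB10_NAME:-gb10-pytorch}"
GB10_IMAGE="${GB10_IMAGE:-my-gb10-ml:latest}"
# Host directory to mount; defaults to the directory this file is sourced from.
GB10_WORKDIR="${GB10_WORKDIR:-$PWD}"

# Run a command, or just print it when GB10_DRY_RUN=1.
# Printed commands are for reading only: quoting is not preserved.
_gb10_run() {
  if [ "${GB10_DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

refresh_container() {
  _gb10_run sudo docker commit "$GB10_NAME" "$GB10_IMAGE" &&
  _gb10_run sudo docker stop "$GB10_NAME" &&
  _gb10_run sudo docker rm "$GB10_NAME" &&
  _gb10_run sudo docker run -d \
    --name "$GB10_NAME" \
    --restart unless-stopped \
    --gpus all \
    --ipc=host \
    --ulimit memlock=-1 \
    --ulimit stack=67108864 \
    -v "$GB10_WORKDIR":/workspace \
    "$GB10_IMAGE" \
    sleep infinity
}
```

Try GB10_DRY_RUN=1 refresh_container first to review the four commands, then run refresh_container for real. Because each step is chained with &&, a failed commit stops the sequence before the old container is removed.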
Current Status
The container is now running and ready to use. You can check it with:
sudo docker ps
You should see gb10-pytorch in the NAMES column, with a STATUS that starts with Up.
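For use in scripts, a yes/no check is easier to work with than reading the docker ps table. Here is a small sketch that reads the container state with docker inspect. The helper name container_running is hypothetical.

```shell
# Hypothetical helper: succeed only if the named container exists and is running.
container_running() {
  local name="${1:-gb10-pytorch}"
  local state
  # docker inspect fails if the container does not exist.
  state="$(sudo docker inspect -f '{{.State.Running}}' "$name" 2>/dev/null)" || return 1
  [ "$state" = "true" ]
}
```

For example: container_running && echo "gb10-pytorch is up".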