Effective PyTorch and CUDA

I attached the Dockerfile. I am pulling the latest NVIDIA PyTorch container, but I am still getting issues with sm_121.

```dockerfile
FROM nvcr.io/nvidia/pytorch:25.12-py3

# Just to be explicit
RUN pip install --upgrade pip

# Install your Python deps
RUN pip install \
    torchao \
    torchtune \
    datasets \
    bitsandbytes \
    tensorboard \
    matplotlib \
    mlflow \
    python-dotenv \
    loguru \
    nvidia-ml-py

# Default workdir inside the container (can be overridden with -w at runtime)
WORKDIR /workspace
```

Environment info from inside the container:

```
Python 3.12.3
PyTorch installed version: 2.10.0a0+b4e4ee81d3.nv25.12
PyTorch CUDA version: 13.1
PyTorch CUDA is available: True
PyTorch CUDA device count: 1
PyTorch NCCL version: (2, 28, 9)
```
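For reference, this is the kind of quick diagnostic I run to compare the GPU's compute capability against the arch list the wheel was built with (the helper name is mine; it only assumes `torch` may be installed and degrades gracefully otherwise):

```python
def describe_cuda() -> str:
    """Report the GPU's sm_XY arch next to the build's supported arch list,
    so a mismatch like sm_121 vs sm_120 is visible at a glance."""
    try:
        import torch
    except ImportError:
        return "PyTorch not installed"
    if not torch.cuda.is_available():
        return "CUDA not available"
    major, minor = torch.cuda.get_device_capability(0)
    return (f"device arch: sm_{major}{minor}; "
            f"build archs: {torch.cuda.get_arch_list()}")

print(describe_cuda())
```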

```
Traceback (most recent call last):
  File "/workspace/torch_tune/src/full_test_dist.py", line 55, in <module>
    raise RuntimeError(
RuntimeError: Unsupported GPU for this PyTorch build. Detected device arch sm_121, but this PyTorch install only supports: compute_120, sm_100, sm_110, sm_120, sm_80, sm_86, sm_90. Install a PyTorch build that includes your GPU architecture (or use a different GPU).
E0112 14:33:34.279000 211 torch/distributed/elastic/multiprocessing/api.py:978] failed (exitcode: 1) local_rank: 0 (pid: 236) of binary: /usr/bin/python
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 7, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 362, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/distributed/run.py", line 982, in main
    run(args)
  File "/usr/local/lib/python3.12/dist-packages/torch/distributed/run.py", line 973, in run
    elastic_launch(
  File "/usr/local/lib/python3.12/dist-packages/torch/distributed/launcher/api.py", line 165, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/distributed/launcher/api.py", line 313, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
```
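The guard that raises at line 55 of my script can be sketched in isolation; the function name and structure below are illustrative assumptions, not the actual code from full_test_dist.py:

```python
# Hypothetical reconstruction of the arch guard (name and structure are
# assumptions for illustration, not the actual script).
def check_arch_supported(device_arch: str, build_archs: list[str]) -> None:
    """Raise if the detected GPU arch is absent from the build's arch list."""
    if device_arch not in build_archs:
        raise RuntimeError(
            f"Unsupported GPU for this PyTorch build. Detected device arch "
            f"{device_arch}, but this PyTorch install only supports: "
            f"{', '.join(sorted(build_archs))}."
        )

# The arch list reported by this container's wheel (taken from the error above):
build_archs = ["compute_120", "sm_100", "sm_110", "sm_120",
               "sm_80", "sm_86", "sm_90"]
check_arch_supported("sm_120", build_archs)   # passes silently
# check_arch_supported("sm_121", build_archs)  # raises RuntimeError
```

The point is that the check is an exact string match: sm_120 is in the list but sm_121 is not, so the guard fires even though the two architectures are in the same family.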