Just use the latest NGC PyTorch container. It has all the required optimizations, and you won't run into any of these hassles. It's also much easier to distribute and maintain across a cluster of Sparks.
I attached the Dockerfile. I am pulling the latest NVIDIA torch image, but I am still getting issues with sm_121:
FROM nvcr.io/nvidia/pytorch:25.12-py3

# Just to be explicit
RUN pip install --upgrade pip

# Install your Python deps
RUN pip install \
    torchao \
    torchtune \
    datasets \
    bitsandbytes \
    tensorboard \
    matplotlib \
    mlflow \
    python-dotenv \
    loguru \
    nvidia-ml-py

# Default workdir inside container (will be overridden by -w at runtime if you want)
WORKDIR /workspace

Python 3.12.3
Pytorch Installed Version: 2.10.0a0+b4e4ee81d3.nv25.12
Pytorch CUDA Version: 13.1
Pytorch CUDA is available: True
Pytorch CUDA device count: 1
Pytorch NCCL Version: (2, 28, 9)
Traceback (most recent call last):
  File "/workspace/torch_tune/src/full_test_dist.py", line 55, in <module>
    raise RuntimeError(
RuntimeError: Unsupported GPU for this PyTorch build. Detected device arch sm_121, but this PyTorch install only supports: compute_120, sm_100, sm_110, sm_120, sm_80, sm_86, sm_90. Install a PyTorch build that includes your GPU architecture (or use a different GPU).
E0112 14:33:34.279000 211 torch/distributed/elastic/multiprocessing/api.py:978] failed (exitcode: 1) local_rank: 0 (pid: 236) of binary: /usr/bin/python
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 7, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 362, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/distributed/run.py", line 982, in main
    run(args)
  File "/usr/local/lib/python3.12/dist-packages/torch/distributed/run.py", line 973, in run
    elastic_launch(
  File "/usr/local/lib/python3.12/dist-packages/torch/distributed/launcher/api.py", line 165, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/distributed/launcher/api.py", line 313, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
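For reference, the environment banner above can be reproduced with plain torch calls; this is a sketch, not the actual script from the traceback:

import platform
import torch

# Print the same fields as the banner above
print(f"Python {platform.python_version()}")
print(f"Pytorch Installed Version: {torch.__version__}")
print(f"Pytorch CUDA Version: {torch.version.cuda}")
print(f"Pytorch CUDA is available: {torch.cuda.is_available()}")
print(f"Pytorch CUDA device count: {torch.cuda.device_count()}")
print(f"Pytorch NCCL Version: {torch.cuda.nccl.version()}")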
@fg121 the December 2025 container is built with PyTorch version 2.10.0a0+b4e4ee8.
However, there's no sm_121 support yet:
>>> import torch
>>> print(torch.cuda.get_arch_list())
['sm_80', 'sm_86', 'sm_90', 'sm_100', 'sm_110', 'sm_120', 'compute_120']
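You can confirm the mismatch directly on the box by comparing the device's compute capability against the compiled arch list. A minimal sketch, presumably the kind of guard full_test_dist.py raises on at line 55:

import torch

# Compute capability of device 0; an sm_121 part reports (12, 1)
major, minor = torch.cuda.get_device_capability(0)
detected = f"sm_{major}{minor}"

# Architectures this PyTorch build ships kernels for
supported = torch.cuda.get_arch_list()
print(f"Detected {detected}; build supports: {supported}")

if detected not in supported:
    raise RuntimeError(f"Unsupported GPU for this PyTorch build: {detected}")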
@elsaco are you aware of when it will be supported?