Effective PyTorch and CUDA

Just use the latest NGC PyTorch container. It has all the required optimizations and you won't experience any of the hassles. It's also much easier to distribute and maintain across a cluster of Sparks.

I attached the Dockerfile. I am pulling the latest NVIDIA Torch image, but I am still getting issues with sm_121:

FROM nvcr.io/nvidia/pytorch:25.12-py3

# Just to be explicit
RUN pip install --upgrade pip

# Install your Python deps
RUN pip install \
    torchao \
    torchtune \
    datasets \
    bitsandbytes \
    tensorboard \
    matplotlib \
    mlflow \
    python-dotenv \
    loguru \
    nvidia-ml-py

# Default workdir inside container (will be overridden by -w at runtime if you want)
WORKDIR /workspace

Python 3.12.3
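As a usage sketch (the image tag and the mount of the current directory are my assumptions, not from the thread), the container above would be built and run along these lines:

```shell
# Build the image from the Dockerfile in the current directory (tag is arbitrary)
docker build -t my-torch:25.12 .

# Run with GPU access, mounting the current directory as the workspace
# and overriding the default workdir with -w
docker run --gpus all -it --rm \
    -v "$PWD":/workspace -w /workspace \
    my-torch:25.12
```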

Pytorch Installed Version: 2.10.0a0+b4e4ee81d3.nv25.12
Pytorch CUDA Version: 13.1
Pytorch CUDA is available: True
Pytorch CUDA device count: 1
Pytorch NCCL Version: (2, 28, 9)

Traceback (most recent call last):
File "/workspace/torch_tune/src/full_test_dist.py", line 55, in <module>
raise RuntimeError(
RuntimeError: Unsupported GPU for this PyTorch build. Detected device arch sm_121, but this PyTorch install only supports: compute_120, sm_100, sm_110, sm_120, sm_80, sm_86, sm_90. Install a PyTorch build that includes your GPU architecture (or use a different GPU).
E0112 14:33:34.279000 211 torch/distributed/elastic/multiprocessing/api.py:978] failed (exitcode: 1) local_rank: 0 (pid: 236) of binary: /usr/bin/python
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 7, in <module>
sys.exit(main())
^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 362, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/distributed/run.py", line 982, in main
run(args)
File "/usr/local/lib/python3.12/dist-packages/torch/distributed/run.py", line 973, in run
elastic_launch(
File "/usr/local/lib/python3.12/dist-packages/torch/distributed/launcher/api.py", line 165, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/distributed/launcher/api.py", line 313, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

@fg121 the December 2025 container is built with PyTorch version 2.10.0a0+b4e4ee8.

However, there is no sm_121 support yet:

>>> import torch
>>> print(torch.cuda.get_arch_list())
['sm_80', 'sm_86', 'sm_90', 'sm_100', 'sm_110', 'sm_120', 'compute_120']
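The check that full_test_dist.py performs isn't shown in the thread, but an equivalent guard can be sketched in plain Python (the `check_arch` name is hypothetical; the error text mirrors the one above). Note that `compute_120` in the list is a PTX target, yet the script still rejects sm_121, so the sketch below requires an exact `sm_XY` entry, matching the observed behavior:

```python
def check_arch(device_arch: str, arch_list: list[str]) -> None:
    """Raise if the detected device arch has no matching entry in the
    build's arch list (e.g. 'sm_121' vs. the output of
    torch.cuda.get_arch_list())."""
    if device_arch not in arch_list:
        raise RuntimeError(
            f"Unsupported GPU for this PyTorch build. Detected device arch "
            f"{device_arch}, but this PyTorch install only supports: "
            f"{', '.join(sorted(arch_list))}."
        )


# With a GPU present, the inputs would come from torch, e.g.:
#   major, minor = torch.cuda.get_device_capability(0)
#   check_arch(f"sm_{major}{minor}", torch.cuda.get_arch_list())

# Reproducing the failure with the arch list printed above:
arch_list = ["sm_80", "sm_86", "sm_90", "sm_100", "sm_110", "sm_120", "compute_120"]
try:
    check_arch("sm_121", arch_list)
except RuntimeError as e:
    print(e)
```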

@elsaco do you know when it will be supported?