Faulty Unsloth instruction/playbook?

I am following this manual (the Unsloth playbook from NVIDIA/dgx-spark-playbooks), but encounter several issues.

1st:
curl -O https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/unsloth/assets/test_unsloth.py → gives me the HTML page, but not the raw .py file.

curl -O https://raw.githubusercontent.com/NVIDIA/dgx-spark-playbooks/main/nvidia/unsloth/assets/test_unsloth.py → works fine
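
The first URL is the GitHub blob page (an HTML view of the file), while curl needs the raw file URL. As a small sketch of the mapping between the two (illustrative only, using the URLs above):

```python
# Sketch: mapping a GitHub blob-page URL to the raw-file URL that curl needs.
blob_url = "https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/unsloth/assets/test_unsloth.py"
raw_url = blob_url.replace("github.com", "raw.githubusercontent.com").replace("/blob/", "/")
print(raw_url)
# -> https://raw.githubusercontent.com/NVIDIA/dgx-spark-playbooks/main/nvidia/unsloth/assets/test_unsloth.py
```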

2nd:
When I specify an online model from the list (unsloth/Llama-3.2-1B-bnb-4bit) in test_unsloth.py, I get the following error:

```
/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
Traceback (most recent call last):
  File "/workspace/test_unsloth.py", line 48, in <module>
    model, tokenizer = FastModel.from_pretrained(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/unsloth/models/loader.py", line 731, in from_pretrained
    model_types = get_transformers_model_type(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/unsloth_zoo/hf_utils.py", line 112, in get_transformers_model_type
    raise RuntimeError(
RuntimeError: Unsloth: No config file found - are you sure the model_name is correct?
If you're using a model on your local device, confirm if the folder location exists.
If you're using a HuggingFace online model, check if it exists.
```
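
One way to narrow this down is to check whether the repo is reachable from inside the container at all; if the lookup below also fails, the problem is connectivity or proxy settings rather than the model name. A minimal sketch using huggingface_hub (already installed per the pip list below):

```python
# Sketch: verify the Hub repo is reachable from inside the container.
from huggingface_hub import model_info

try:
    info = model_info("unsloth/Llama-3.2-1B-bnb-4bit")
    print("Repo found; first files:", [s.rfilename for s in info.siblings][:5])
except Exception as e:
    print("Could not fetch repo info:", e)
```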

3rd:

When I specify an offline model (../../AIEngine/megemma-27b-it) in test_unsloth.py, it finds the model, but I get the following error:

```
...
  File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 1297, in _hf_hub_download_to_local_dir
    _download_to_tmp_and_move(
  File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 1735, in _download_to_tmp_and_move
    http_get(
  File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 401, in http_get
    raise ValueError(
ValueError: Fast download using 'hf_transfer' is enabled (HF_HUB_ENABLE_HF_TRANSFER=1) but 'hf_transfer' package is not available in your environment. Try `pip install hf_transfer`.
```
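
The message itself names the two possible fixes: install hf_transfer inside the container, or disable the fast-download flag. A minimal sketch of the latter (the variable is read when huggingface_hub is imported, so it must be set first, e.g. at the very top of test_unsloth.py):

```python
# Sketch: disable hf_transfer before anything imports huggingface_hub,
# since the container sets HF_HUB_ENABLE_HF_TRANSFER=1 without shipping the package.
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "0"

from unsloth import FastModel  # import only after the flag is cleared
```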

Is there a working manual on how to fine-tune LLMs with Unsloth?

Additional information:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Aug_20_01:57:39_PM_PDT_2025
Cuda compilation tools, release 13.0, V13.0.88
Build cuda_13.0.r13.0/compiler.36424714_0

nvidia-smi
NVIDIA-SMI 580.95.05    Driver Version: 580.95.05    CUDA Version: 13.0

docker run --gpus all --ulimit memlock=-1 -it --ulimit stack=67108864 --entrypoint /usr/bin/bash -v $(pwd):/home nvcr.io/nvidia/pytorch:25.09-py3

pip list (from within the docker container)
Package Version


absl-py 2.3.1
accelerate 1.11.0
aiohappyeyeballs 2.6.1
aiohttp 3.13.2
aiosignal 1.4.0
annotated-types 0.7.0
anyio 4.10.0
apex 0.1
argon2-cffi 25.1.0
argon2-cffi-bindings 25.1.0
arrow 1.3.0
asttokens 3.0.0
astunparse 1.6.3
async-lru 2.0.5
attrs 25.3.0
audioread 3.0.1
babel 2.17.0
beautifulsoup4 4.13.5
bitsandbytes 0.48.2
black 25.1.0
bleach 6.2.0
build 1.3.0
certifi 2025.8.3
cffi 1.17.1
charset-normalizer 3.4.3
click 8.2.1
cmake 3.31.6
comm 0.2.3
contourpy 1.3.3
cycler 0.12.1
Cython 3.1.3
datasets 4.3.0
debugpy 1.8.16
decorator 5.2.1
defusedxml 0.7.1
dill 0.4.0
dllist 2.0.0
dm-tree 0.1.9
einops 0.8.1
execnet 2.1.1
executing 2.2.1
expecttest 0.3.0
fastjsonschema 2.21.2
filelock 3.19.1
flash_attn 2.7.4.post1
fonttools 4.60.0
fqdn 1.5.1
frozenlist 1.8.0
fsspec 2025.9.0
gast 0.6.0
grpcio 1.74.0
h11 0.16.0
hf-xet 1.2.0
httpcore 1.0.9
httpx 0.28.1
huggingface-hub 0.36.0
hypothesis 6.130.8
idna 3.10
importlib_metadata 8.7.0
iniconfig 2.1.0
ipykernel 6.30.1
ipython 9.5.0
ipython_pygments_lexers 1.1.1
isoduration 20.11.0
isort 6.0.1
jedi 0.19.2
Jinja2 3.1.6
joblib 1.5.2
json5 0.12.1
jsonpointer 3.0.0
jsonschema 4.25.1
jsonschema-specifications 2025.4.1
jupyter_client 8.6.3
jupyter_core 5.8.1
jupyter-events 0.12.0
jupyter-lsp 2.3.0
jupyter_server 2.17.0
jupyter_server_terminals 0.5.3
jupyterlab 4.4.7
jupyterlab_code_formatter 3.0.2
jupyterlab_pygments 0.3.0
jupyterlab_server 2.27.3
jupyterlab_tensorboard_pro 4.0.0
jupytext 1.17.3
kiwisolver 1.4.9
lark 1.2.2
lazy_loader 0.4
librosa 0.11.0
lightning-thunder 0.2.5.dev0
lightning-utilities 0.15.2
lintrunner 0.12.7
llvmlite 0.44.0
looseversion 1.3.0
Markdown 3.9
markdown-it-py 4.0.0
MarkupSafe 3.0.2
matplotlib 3.10.6
matplotlib-inline 0.1.7
mdit-py-plugins 0.5.0
mdurl 0.1.2
mistune 3.1.4
ml_dtypes 0.5.3
mock 5.2.0
mpmath 1.3.0
msgpack 1.1.1
multidict 6.7.0
multiprocess 0.70.16
mypy_extensions 1.1.0
nbclient 0.10.2
nbconvert 7.16.6
nbformat 5.10.4
nest-asyncio 1.6.0
networkx 3.5
ninja 1.13.0
notebook 7.4.5
notebook_shim 0.2.4
numba 0.61.2
numpy 2.1.0
nvfuser 0.2.29+gita71c674
nvidia-cudnn-frontend 1.14.0
nvidia-dali-cuda130 1.51.2
nvidia-ml-py 13.580.82
nvidia-modelopt 0.33.0
nvidia-modelopt-core 0.33.0
nvidia-nvcomp-cu13 5.0.0.6
nvidia-nvimgcodec-cu13 0.6.0.32
nvidia-nvjpeg-cu13 0.0.0a0
nvidia-nvjpeg2k-cu13 0.9.0.43
nvidia-nvtiff-cu13 0.5.1.75
nvidia-resiliency-ext 0.4.1+cuda13
onnx 1.18.0
onnx-ir 0.1.9
onnxscript 0.3.1
opt_einsum 3.4.0
optree 0.17.0
packaging 25.0
pandas 2.3.3
pandocfilters 1.5.1
parso 0.8.5
pathspec 0.12.1
peft 0.17.1
pexpect 4.9.0
pillow 11.3.0
pip 25.2
platformdirs 4.4.0
pluggy 1.6.0
polygraphy 0.49.26
pooch 1.8.2
prometheus_client 0.22.1
prompt_toolkit 3.0.52
propcache 0.4.1
protobuf 6.32.0
psutil 7.0.0
ptyprocess 0.7.0
PuLP 3.2.2
pure_eval 0.2.3
pyarrow 22.0.0
pybind11 3.0.1
pybind11-global 3.0.1
pycocotools 2.0+nv0.8.1
pycparser 2.22
pydantic 2.11.9
pydantic_core 2.33.2
Pygments 2.19.2
pynvml 13.0.1
pyparsing 3.2.4
pyproject_hooks 1.2.0
pytest 8.1.1
pytest-flakefinder 1.1.0
pytest-rerunfailures 16.0.1
pytest-shard 0.1.2
pytest-xdist 3.8.0
python-dateutil 2.9.0.post0
python_hostlist 2.3.0
python-json-logger 3.3.0
pytorch-triton 3.4.0+gitc817b9b6
pytz 2025.2
PyYAML 6.0.2
pyzmq 27.0.2
referencing 0.36.2
regex 2025.9.1
requests 2.32.5
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rfc3987-syntax 1.1.0
rich 14.1.0
rpds-py 0.27.1
safetensors 0.6.2
scikit-learn 1.7.1
scipy 1.16.1
Send2Trash 1.8.3
setuptools 79.0.1
six 1.16.0
sniffio 1.3.1
sortedcontainers 2.4.0
soundfile 0.13.1
soupsieve 2.8
soxr 0.5.0.post1
stack-data 0.6.3
sympy 1.14.0
tabulate 0.9.0
tensorboard 2.20.0
tensorboard-data-server 0.7.2
tensorrt 10.13.3.9
terminado 0.18.1
threadpoolctl 3.6.0
tinycss2 1.4.0
tokenizers 0.22.1
torch 2.9.0a0+50eac811a6.nv25.9
torch_tensorrt 2.9.0a0
torchao 0.13.0+git
torchprofile 0.0.4
torchvision 0.24.0a0+98f8b375
tornado 6.5.2
tqdm 4.67.1
traitlets 5.14.3
transformer_engine 2.7.0+fedd9dd
transformers 4.57.1
trl 0.19.1
types-python-dateutil 2.9.0.20250822
typing_extensions 4.15.0
typing-inspection 0.4.1
tzdata 2025.2
unsloth 2025.11.1
unsloth_zoo 2025.11.1
uri-template 1.3.0
urllib3 2.5.0
uv 0.8.17
wcwidth 0.2.13
webcolors 24.11.1
webencodings 0.5.1
websocket-client 1.8.0
Werkzeug 3.1.3
wheel 0.45.1
wrapt 1.17.3
xdoctest 1.0.2
xxhash 3.6.0
yarl 1.22.0
zipp 3.23.0

Hi, we are working on improving our Unsloth playbook and I will get back to you when it's updated.

+1 on this issue. It seems like a lot of the DGX Spark playbooks are broken. I've been going through them one by one and have only gotten a few to work without modifications. There are also issues with vLLM and many other examples in the playbooks. This makes most of the DGX Spark unusable for learning.


For whatever reason, this specific sequence of commands resolved my issues. I hope this helps others who might be facing a similar problem.

docker run --gpus all --ulimit memlock=-1 -it --ulimit stack=67108864 --entrypoint /usr/bin/bash --rm nvcr.io/nvidia/pytorch:25.10-py3

pip install transformers peft datasets
pip install --no-deps unsloth unsloth_zoo

pip install --no-deps bitsandbytes

pip install --upgrade torchao

pip install --upgrade unsloth unsloth-zoo transformers
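
After that sequence, a short sanity check (a sketch; the expected version strings are taken from the outputs quoted further down) confirms the image's CUDA build of torch is still the one in use:

```python
# Sketch: post-install sanity check inside the container.
import torch

print(torch.__version__)          # expect an NGC CUDA build like 2.9.0a0+...nv25.10, not 2.9.0+cpu
print(torch.cuda.is_available())  # expect True when started with --gpus all

from unsloth import FastModel     # should print the Unsloth patch banner without errors
```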

This is what I learned from debugging with Gemini 2.5 Pro:

This is one of the most common and confusing issues people face when using Docker, and you've perfectly demonstrated it.

The short answer is: The container image is identical. The environment inside the container is being contaminated by the files you are mounting from your host machine in the first command.

Let’s break down the evidence.
The Evidence: Two Different Pythons

Look closely at the Python startup messages:

Command 1 (The Broken One):

```
Python 3.13.9 | packaged by Anaconda, Inc. | (main, Oct 21 2025, 19:17:31) [GCC 11.2.0] on linux
print(torch.__version__)
2.9.0+cpu
```

Command 2 (The Correct One):

```
Python 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0] on linux
print(torch.__version__)
2.9.0a0+145a3a7bda.nv25.10
```
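
A quick way to see which environment is actually active inside the container (a minimal sketch, nothing Unsloth-specific):

```python
# Sketch: identify which interpreter and which torch install are being picked up.
# An Anaconda path here means the mounted home directory is shadowing the image.
import sys
import torch

print(sys.executable)     # image interpreter vs. a mounted Anaconda one
print(torch.__version__)  # CUDA build (+...nv25.10) vs. the host's +cpu wheel
print(torch.__file__)     # which site-packages torch was imported from
```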

I effectively mess up the original Python version (3.12.3) and PyTorch version (2.9.0a0+145a3a7bda.nv25.10) by mounting my home directory with -v $(pwd):/home, which is dangerous when run from your home directory, because this introduces unwanted changes to the container's environment.

By only mounting your project folders, you get access to your code without the risk of your host’s configuration files and software installations interfering with the container’s carefully prepared environment.
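
For example (the project path here is illustrative), mount just the directory that holds your code instead of the whole home directory:

docker run --gpus all --ulimit memlock=-1 --ulimit stack=67108864 -it --rm --entrypoint /usr/bin/bash -v $(pwd)/my-project:/workspace nvcr.io/nvidia/pytorch:25.10-py3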