Unsloth SFTTrainer failed due to triton assertion error on AGX Orin when batch size > 1

Hello guys,

I’m currently trying to deploy and run an Unsloth environment on my AGX Orin 64 GB devkit.

I managed to install Unsloth and all its dependencies, and everything seemed to work until I followed an Unsloth tutorial to check that it was running properly (here is the link: Alpaca_+_Mistral_7b_full_example.ipynb).

When I launched the SFTTrainer, I got an AssertionError on my 12th epoch, regardless of which model I chose.
I tried 500M-, 7B-, and 14B-parameter models, and they all ended with the same error.

BackendCompilerFailed: backend='inductor' raised:
SubprocException: An exception occurred in a subprocess:

Traceback (most recent call last):
  File "/home/aienv/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 278, in do_job
    result = job()
  File "/home/aienv/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
    load_kernel().precompile(warm_cache_only=True)
  File "/home/aienv/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 234, in precompile
    compiled_binary, launcher = self._precompile_config(
  File "/home/aienv/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 365, in _precompile_config
    ASTSource(
  File "/home/aienv/lib/python3.10/site-packages/triton/compiler/compiler.py", line 63, in __init__
    assert isinstance(k, tuple)
AssertionError

I tried changing per_device_train_batch_size from 2 to 5, and it triggered the error instantly.
When I changed it to 1, training ran correctly (although I’m not convinced that the model was actually trained, but that is another topic).
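One rough way to check whether a run with batch size 1 actually trained is to compare the first and last logged loss values. The sketch below assumes the log entries look like those in `trainer.state.log_history` from the Hugging Face Trainer (a list of dicts, some of which carry a "loss" key). The helper name `loss_trend` and the sample values are made up for illustration.

```python
def loss_trend(log_history):
    """Return (first_loss, last_loss) from Trainer-style log entries.

    Returns None when fewer than two entries carry a "loss" value,
    since no trend can be read from a single point.
    """
    losses = [entry["loss"] for entry in log_history if "loss" in entry]
    if len(losses) < 2:
        return None
    return losses[0], losses[-1]


# Made-up sample entries, only to show the expected shape of the input.
sample = [
    {"loss": 2.1, "step": 1},
    {"loss": 1.4, "step": 2},
    {"train_runtime": 12.0},  # summary entry without a "loss" key
]
print(loss_trend(sample))
```

After `trainer.train()`, calling `loss_trend(trainer.state.log_history)` and seeing a last value clearly below the first is a hint, not proof, that the weights were updated.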

After some research, I found that it could be an iGPU handling problem, but I did not find any details on that.

I’m using CUDA 12.6, Unsloth 2025.2.15, torch 2.5.0a0+872d972e41.nv24.8, and xformers 0.0.28.post3.
I already tried reinstalling Unsloth and recompiling Triton, but it is still not working.

Torch found my CUDA device successfully, and Unsloth is (supposedly) allocating my memory correctly. My nvidia-smi doesn’t find any process when I’m running the script.
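For anyone trying to reproduce this, a short environment report makes the versions easier to compare. This is a minimal sketch: the helper names `installed_versions` and `cuda_summary` are hypothetical, and the package list is just the set mentioned in this post.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def cuda_summary():
    """Report what torch sees; returns None when torch cannot be imported."""
    try:
        import torch
    except ImportError:
        return None
    available = torch.cuda.is_available()
    return {
        "torch_cuda_build": torch.version.cuda,
        "cuda_available": available,
        "device_name": torch.cuda.get_device_name(0) if available else None,
    }


# Packages named in this post; adjust the list as needed.
print(installed_versions(["torch", "triton", "xformers", "unsloth", "bitsandbytes"]))
print(cuda_summary())
```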



I’m clueless here. Do you guys have any idea?

Here is the full package list, just in case:

Package                   Version
------------------------- -------------------------
accelerate                1.4.0
acres                     0.2.0
aiofiles                  24.1.0
aiohappyeyeballs          2.4.6
aiohttp                   3.11.12
aiosignal                 1.3.2
annotated-types           0.7.0
anyio                     4.8.0
argon2-cffi               23.1.0
argon2-cffi-bindings      21.2.0
arrow                     1.3.0
asttokens                 3.0.0
async-lru                 2.0.4
async-timeout             4.0.3
attrs                     25.1.0
autocommand               2.2.2
babel                     2.17.0
backports.tarfile         1.2.0
beautifulsoup4            4.13.3
bitsandbytes              0.45.3.dev0
bleach                    6.2.0
blis                      1.2.0
catalogue                 2.0.10
certifi                   2025.1.31
cffi                      1.17.1
charset-normalizer        3.4.1
ci-info                   0.3.0
click                     8.1.8
cloudpathlib              0.20.0
cmake                     3.31.4
comm                      0.2.2
confection                0.1.5
configobj                 5.0.9
configparser              7.1.0
contourpy                 1.3.1
cupy-cuda12x              12.3.0
cut-cross-entropy         25.1.1
cycler                    0.12.1
cymem                     2.0.11
datasets                  3.3.2
debugpy                   1.8.12
decorator                 5.2.0
defusedxml                0.7.1
dill                      0.3.8
distro                    1.9.0
docstring_parser          0.16
etelemetry                0.3.1
exceptiongroup            1.2.2
executing                 2.2.0
fastjsonschema            2.21.1
fastrlock                 0.8.3
filelock                  3.17.0
fitz                      0.0.1.dev2
fonttools                 4.56.0
fqdn                      1.5.1
frontend                  0.0.3
frozenlist                1.5.0
fsspec                    2024.12.0
greenlet                  3.1.1
h11                       0.14.0
hf_transfer               0.1.9
httpcore                  1.0.7
httplib2                  0.22.0
httpx                     0.28.1
huggingface-hub           0.29.1
idna                      3.10
importlib_metadata        8.0.0
importlib_resources       6.5.2
inflect                   7.3.1
ipykernel                 6.29.5
ipython                   8.32.0
ipywidgets                8.1.5
isodate                   0.6.1
isoduration               20.11.0
itsdangerous              2.2.0
jaraco.collections        5.1.0
jaraco.context            5.3.0
jaraco.functools          4.0.1
jaraco.text               3.12.1
jedi                      0.19.2
jetson-stats              4.3.1
Jinja2                    3.1.5
joblib                    1.4.2
json5                     0.10.0
jsonpatch                 1.33
jsonpointer               3.0.0
jsonschema                4.23.0
jsonschema-specifications 2024.10.1
jupyter                   1.1.1
jupyter_client            8.6.3
jupyter-console           6.6.3
jupyter_core              5.7.2
jupyter-events            0.12.0
jupyter-lsp               2.2.5
jupyter_server            2.15.0
jupyter_server_terminals  0.5.3
jupyterlab                4.3.5
jupyterlab_pygments       0.3.0
jupyterlab_server         2.27.3
jupyterlab_widgets        3.0.13
kiwisolver                1.4.8
langchain                 0.3.19
langchain-core            0.3.37
langchain-text-splitters  0.3.6
langcodes                 3.5.0
langsmith                 0.3.10
language                  0.6
language_data             1.3.0
looseversion              1.3.0
lxml                      5.3.1
marisa-trie               1.2.1
markdown-it-py            3.0.0
MarkupSafe                3.0.2
matplotlib                3.10.0
matplotlib-inline         0.1.7
mdurl                     0.1.2
mistune                   3.1.2
more-itertools            10.3.0
mpmath                    1.3.0
multidict                 6.1.0
multiprocess              0.70.16
murmurhash                1.0.12
nbclient                  0.10.2
nbconvert                 7.16.6
nbformat                  5.10.4
nest-asyncio              1.6.0
networkx                  3.4.2
nibabel                   5.3.2
ninja                     1.11.1.3
nipype                    1.9.2
notebook                  7.3.2
notebook_shim             0.2.4
numpy                     1.26.4
orjson                    3.10.15
overrides                 7.7.0
packaging                 24.2
pandas                    2.2.3
pandocfilters             1.5.1
parso                     0.8.4
pathlib                   1.0.1
peft                      0.14.0
pexpect                   4.9.0
pillow                    11.0.0
pip                       25.0.1
platformdirs              4.3.6
preshed                   3.0.9
prometheus_client         0.21.1
prompt_toolkit            3.0.50
propcache                 0.3.0
protobuf                  3.20.3
prov                      2.0.1
psutil                    7.0.0
ptyprocess                0.7.0
pure_eval                 0.2.3
puremagic                 1.28
pyarrow                   19.0.1
pybind11                  2.13.6
pycparser                 2.22
pydantic                  2.10.6
pydantic_core             2.27.2
pydot                     3.0.4
Pygments                  2.19.1
PyMuPDF                   1.25.3
pyparsing                 3.2.1
python-dateutil           2.9.0.post0
python-json-logger        3.2.1
python-rapidjson          1.20
pytz                      2025.1
pyxnat                    1.6.3
PyYAML                    6.0.2
pyzmq                     26.2.1
rdflib                    6.3.2
referencing               0.36.2
regex                     2024.11.6
requests                  2.32.3
requests-toolbelt         1.0.0
rfc3339-validator         0.1.4
rfc3986-validator         0.1.1
rich                      13.9.4
rpds-py                   0.23.1
safetensors               0.5.2
scikit-learn              1.6.1
scipy                     1.15.2
Send2Trash                1.8.3
sentence-transformers     3.4.1
sentencepiece             0.2.0
setuptools                75.8.0
shellingham               1.5.4
shtab                     1.7.1
simplejson                3.20.1
six                       1.17.0
smart-open                7.1.0
smbus2                    0.5.0
sniffio                   1.3.1
soupsieve                 2.6
spacy                     3.8.3
spacy-legacy              3.0.12
spacy-loggers             1.0.5
SQLAlchemy                2.0.38
srsly                     2.5.1
stack-data                0.6.3
starlette                 0.45.3
sympy                     1.13.1
tenacity                  9.0.0
terminado                 0.18.1
thinc                     8.3.4
threadpoolctl             3.5.0
tinycss2                  1.4.0
tokenizers                0.21.0
tomli                     2.2.1
torch                     2.5.0a0+872d972e41.nv24.8
tornado                   6.4.2
tqdm                      4.67.1
traitlets                 5.14.3
traits                    7.0.2
transformers              4.49.0
triton                    3.2.0
trl                       0.15.1
typeguard                 4.4.2
typer                     0.15.1
types-python-dateutil     2.9.0.20241206
typing_extensions         4.12.2
tyro                      0.9.16
tzdata                    2025.1
unsloth                   2025.2.15
unsloth_zoo               2025.2.7
uri-template              1.3.0
urllib3                   2.3.0
uvicorn                   0.34.0
wasabi                    1.1.3
wcwidth                   0.2.13
weasel                    0.4.1
webcolors                 24.11.1
webencodings              0.5.1
websocket-client          1.8.0
wheel                     0.45.1
widgetsnbextension        4.0.13
wrapt                     1.17.2
xformers                  0.0.28.post3
xxhash                    3.5.0
yarl                      1.18.3
zipp                      3.19.2
zstandard                 0.23.0

Hi,

If the training only works with small batch sizes, this issue might be related to running out of memory (OOM).
Could you monitor the device during the training and check if it’s running out of memory when the crash happens?

$ sudo tegrastats

If the memory isn’t fully occupied when the crash happens, please share the step-by-step command to reproduce this issue so we can try it locally.
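To make the memory check less manual, the RAM field of each tegrastats line can be parsed with a few lines of Python. This is only a sketch: it assumes the line contains a field of the form `RAM <used>/<total>MB`, the helper name `ram_usage` is hypothetical, and the sample line is made up for illustration rather than taken from a real log.

```python
import re

# Assumed field layout: "RAM <used>/<total>MB".
RAM_PATTERN = re.compile(r"RAM (\d+)/(\d+)MB")


def ram_usage(line):
    """Return (used_mb, total_mb, fraction_used), or None if no RAM field is found."""
    match = RAM_PATTERN.search(line)
    if match is None:
        return None
    used, total = int(match.group(1)), int(match.group(2))
    return used, total, used / total


# Made-up sample line in the assumed format.
sample = "RAM 58210/62841MB (lfb 2x4MB) SWAP 0/31421MB CPU [12%@2201,off]"
print(ram_usage(sample))
```

Feeding the captured tegrastats output through `ram_usage` line by line and looking at the last values before the crash shows whether memory was close to the limit.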

Thanks.

Hi there,

Thank you for your reply!

I somehow managed to find a workaround.

I reflashed the board to get a clean install with JP2 and updated CUDA to 12.8, with all the necessary libraries from the jp6/cu128 index.

I suspect the bug occurred either because the torch library did not handle this architecture well with CUDA 12.6, or because of a conflict between the libraries I had installed.

Anyway, it is working now with:

  • torch 2.6.0
  • xformers 0.0.30+037cc1a.d202509
  • bitsandbytes 0.45.2
  • unsloth 2025.2.15

Hi,

Thanks for the update.
Good to know it works well now.
