[SUPPORT] Workbench Example Project: Llama 3 Finetune

Hi! This is the support thread for the Llama 3 8B Finetuning Example Project on GitHub. Any major updates we push to the project will be announced here. Further, feel free to discuss, raise issues, and ask for assistance in this thread.

Please keep discussion in this thread project-related. Any issues with the Workbench application should be raised as a standalone thread. Thanks!

(8/26/2024) Updated readme with deeplinking

(10/02) Updated deep link landing page

I have been working on an innovative AI agent that interacts with conceptual realities in its internal framework, treating abstract ideas as real within its own system. This approach allows the AI to engage in recursive self-reasoning and handle complex conceptual modeling in a way that goes beyond traditional AI systems. I don't know who needs to confirm it, but I feel like I need someone to see it now, and I have no idea who to talk to.

Hi, is there any explanation of the Host Mount Configuration? What should the Source Directory be for a Windows host? I think it is explained in the documentation, but rather ambiguously.

E.g. when cloning or creating the project, the source directory (for a local device) is initialized to something like /host/workbench/nvidia-workbench/…/.
Why not set this as the default in the Environment configuration? Right now it has to be remembered and copy-pasted by hand for some reason.

The reason we prompt the user to configure a host mount is to ensure the saved, finetuned model can live on the host machine the project is running on.

These models are often quite large, taking up several GB of space, so keeping them inside the project container can be impractical. Progress is lost, for example, when the container is stopped.

Once mounted to the underlying host machine, however, the notebook auto-saves outputs to the host and it becomes easy to access the results of your finetuning workflow even after your project container is shut down.

This is a runtime configuration, and since every system (and user) is different, we prompt the user for their desired location to save the finetuned model files. Ultimately, this is the design choice we made when building this example, but you can also delete the mount from the Environment tab if you would like.

As for messaging, I’ve updated the mount description with examples to help make the desired path clearer for the user. This information already exists in the project README, but agreed, it should be surfaced to the user while working in AIWB.
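For reference, on a Windows host AI Workbench runs inside WSL2, so the Windows C: drive is visible from the Linux side under /mnt/c. A Source Directory under there keeps saved models in your normal Windows user profile. The paths below are illustrative, not the exact values the project prompts for:

```shell
# On a Windows host, Workbench runs in WSL2, so the C: drive appears under /mnt/c.
# A host-mount Source Directory there might look like /mnt/c/Users/<user>/models.
# This just checks whether the WSL-style mount point is present on this machine:
ls -d /mnt/c/Users 2>/dev/null || echo "not a WSL/Windows host"
```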

Does it require the /mnt/c/Users/[user] folder to be created by the user on the host machine, or will the mount create the folder at build time?

The user needs to create it, which means you need to hop onto the instance.

Workbench sets up an SSH alias once it is installed, so you can SSH into the host and create that directory manually.
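Concretely, something like the following (the alias name "my-remote" and the target folder are placeholders, not values from the project; substitute your own):

```shell
# Create the mount target on the host before the project build, e.g.:
#   ssh my-remote 'mkdir -p /mnt/c/Users/<user>/finetuned-models'
# mkdir -p is idempotent, so it is safe to run even if the folder already exists.
# Local demonstration of the same command:
mkdir -p /tmp/finetune-mount-demo
test -d /tmp/finetune-mount-demo && echo "mount target exists"
```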

Thank you so much, I will try this

I’m trying to run this using AI Workbench on the DGX Spark. I updated PyTorch in the container using the Workbench “Update” button, and it is now the PyTorch 2.6 base with CUDA 12.6.3.

First, there’s this warning. Does it matter?

/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:235: UserWarning: 
NVIDIA GB10 with CUDA capability sm_121 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_80 sm_86 sm_90 compute_90.
If you want to use the NVIDIA GB10 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

Second, in the cell defining the trainer there is this warning:

/home/ubuntu/.local/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:246: UserWarning: You didn't pass a `max_seq_length` argument to the SFTTrainer, this will default to 1024

Third, in the trainer.train() cell there are two warnings:

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py:632: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.5 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
  return fn(*args, **kwargs)
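For what it's worth, the use_reentrant warning itself looks benign: it can be reproduced and silenced in isolation by passing the argument explicitly (a minimal sketch with a made-up toy layer, independent of the notebook; in the real project the TRL SFTTrainer drives checkpointing internally):

```python
# Sketch: pass use_reentrant explicitly to torch.utils.checkpoint to avoid
# the deprecation warning. The tiny Linear layer here is for illustration only.
import torch
from torch.utils import checkpoint

layer = torch.nn.Linear(8, 8)
x = torch.randn(2, 8, requires_grad=True)

# use_reentrant=False is the recommended variant going forward.
out = checkpoint.checkpoint(layer, x, use_reentrant=False)
out.sum().backward()
print(x.grad.shape)
```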

I provide this background because the real problem is that the Jupyter notebook kernel fails after 5-10 minutes and restarts itself, so I can’t run the training.

Do you have any advice? This is the second Example Project I’ve tried to run and both fail (for different reasons).