My DGX Spark keeps freezing and crashing when I run this code, no matter which LLM I use

Hello.

I have written code that is supposed to fine-tune an LLM to write reports. I have provided it with 18 or so reports to train on, nothing major. However, every time I run the code, no matter which LLM I use, NVIDIA AI Workbench freezes and then the entire Spark crashes and restarts. Is it even possible to fine-tune an LLM on a DGX Spark? I have included my code below. Please tell me what I am doing wrong.

import json
import torch
from transformers import AutoModelForCausalLM, AutoProcessor, AutoTokenizer, TrainingArguments, Trainer
from peft import LoraConfig, get_peft_model
from dataset_llama32_vision import LlamaVisionDataset

MODEL_NAME = "Meta-LLaMA-3-2-11B-Vision"

# Load model
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    load_in_4bit=True,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
processor = AutoProcessor.from_pretrained(MODEL_NAME)

# LoRA config
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)

# Dataset
train_dataset = LlamaVisionDataset("dataset.jsonl", tokenizer, processor)

# Training arguments
args = TrainingArguments(
    output_dir="lora_output",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=False,
    bf16=True,
    logging_steps=10,
    save_steps=500,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
)

trainer.train()

model.save_pretrained("lora_output/final_lora")