Getting the following error:
I’m trying to profile Llama 3.1 8B model training in a Docker container (nvcr.io/nvidia/pytorch:24.12-py3, run with root enabled) using llama-cookbook.
This is the command I’m running:
nsys profile -t cuda,opengl,nvtx,osrt -w true -x true -o out/llama_8b_h100_singlegpu --sample=none --cpuctxsw=none --gpu-metrics-devices=all --cuda-memory-usage=true -y 60 --duration 600 --wait all python -m finetuning --use_peft --peft_method lora --quantization 4bit --quantization_config.quant_type nf4 --batch_size_training 1 --context_length 2048 --dataset samsum_dataset --model_name "meta-llama/Llama-3.1-8B" --output_dir out/
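For readability, here is the same command split across lines (this is just a reformatting of the command above, not a fix; note the straight ASCII quotes around the model name — the curly quotes in the original post would be passed to the script verbatim and could cause a model-not-found error on their own):

```shell
nsys profile \
  -t cuda,opengl,nvtx,osrt \
  -w true \
  -x true \
  -o out/llama_8b_h100_singlegpu \
  --sample=none \
  --cpuctxsw=none \
  --gpu-metrics-devices=all \
  --cuda-memory-usage=true \
  -y 60 \
  --duration 600 \
  --wait all \
  python -m finetuning \
    --use_peft --peft_method lora \
    --quantization 4bit --quantization_config.quant_type nf4 \
    --batch_size_training 1 --context_length 2048 \
    --dataset samsum_dataset \
    --model_name "meta-llama/Llama-3.1-8B" \
    --output_dir out/
```

For reference, `-y 60` delays the start of collection by 60 seconds, `--duration 600` caps collection at 600 seconds, `-x true` stops collection when the target application exits, and `--sample=none --cpuctxsw=none` disable CPU sampling and context-switch tracing.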
Can anyone please help me fix this issue?