Segmentation fault

I tried NVIDIA-Nsight-Compute-2024.1 to trace the DRAM workload while Llama is launched, but ran into a segmentation fault:

on the first console:

root@orin1:/data/NVIDIA-Nsight-Compute-2024.1# ls
docs EULA.txt extras host ncu ncu-ui sections target
root@orin1:/data/NVIDIA-Nsight-Compute-2024.1# ./ncu --mode=launch python3 -m nano_llm.chat --api=mlc --model princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT
==PROF== Waiting for profiler to attach on ports 49152-49215.
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.18) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
Fetching 14 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 126280.12it/s]
14:02:15 | INFO | loading /data/models/huggingface/models--princeton-nlp--Sheared-LLaMA-2.7B-ShareGPT/snapshots/802be8903ec44f49a883915882868b479ecdcc3b with MLC
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in ⚠️⚠️[`T5Tokenize`] Fix T5 family tokenizers⚠️⚠️ by ArthurZucker · Pull Request #24565 · huggingface/transformers · GitHub

14:02:48 | INFO | device=cuda(0), name=Orin, compute=8.7, max_clocks=1300000, multiprocessors=16, max_thread_dims=[1024, 1024, 64], api_version=11040, driver_version=None
14:02:48 | INFO | loading Sheared-LLaMA-2.7B-ShareGPT from /data/models/mlc/dist/Sheared-LLaMA-2.7B-ShareGPT-ctx4096/Sheared-LLaMA-2.7B-ShareGPT-q4f16_ft/Sheared-LLaMA-2.7B-ShareGPT-q4f16_ft-cuda.so
Fatal Python error: Segmentation fault

Thread 0x0000fffed09c9f40 (most recent call first):
File "/usr/lib/python3.8/threading.py", line 306 in wait
File "/usr/lib/python3.8/threading.py", line 558 in wait
File "/usr/local/lib/python3.8/dist-packages/tqdm/_monitor.py", line 60 in run
File "/usr/lib/python3.8/threading.py", line 932 in _bootstrap_inner
File "/usr/lib/python3.8/threading.py", line 890 in _bootstrap

Current thread 0x0000ffff9dde5980 (most recent call first):
File "/opt/NanoLLM/nano_llm/models/mlc.py", line 110 in __init__
File "/opt/NanoLLM/nano_llm/nano_llm.py", line 71 in from_pretrained
File "/opt/NanoLLM/nano_llm/chat/main.py", line 29 in <module>
File "/usr/lib/python3.8/runpy.py", line 87 in _run_code
File "/usr/lib/python3.8/runpy.py", line 194 in _run_module_as_main
==ERROR== The application returned an error code (11).
root@orin1:/data/NVIDIA-Nsight-Compute-2024.1#

In the second console:

root@orin1:/data/NVIDIA-Nsight-Compute-2024.1# ./ncu --mode=attach -f --section=MemoryWorkloadAnalysis --section=MemoryWorkloadAnalysis_Chart --section=MemoryWorkloadAnalysis_Tables -o report --hostname 127.0.0.1
==PROF== Finding attachable processes on host 127.0.0.1.
==PROF== Attaching to process '/usr/bin/python3.8' (308) on port 49152.
==PROF== Connected to process 308 (/usr/bin/python3.8)
==WARNING== No kernels were profiled.
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option.
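
For reference, the last warning indicates that kernels launched by child processes were not captured. One possible (untested here) adjustment, following that warning, would be to add --target-processes all on the launch side, keeping the same model and paths as above:

root@orin1:/data/NVIDIA-Nsight-Compute-2024.1# ./ncu --mode=launch --target-processes all python3 -m nano_llm.chat --api=mlc --model princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT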

I previously used ncu 2022 and there was no segmentation fault, but ncu 2022 suspended the Llama response; I created a topic about that here.

Based on @veraj's suggestion, I should use the latest version of ncu.

@veraj, would you please help check this internally?

Hi, @MicroSW

I would like to help check this.
Does the segmentation fault also happen with a simple CUDA sample?
If not, can you guide me through setting up the application?
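
(For example, assuming the CUDA samples are installed and built under /usr/local/cuda/samples, which depends on your setup, a simple sample such as vectorAdd could be run through the same launch/attach flow:

root@orin1:/data/NVIDIA-Nsight-Compute-2024.1# ./ncu --mode=launch /usr/local/cuda/samples/0_Simple/vectorAdd/vectorAdd
root@orin1:/data/NVIDIA-Nsight-Compute-2024.1# ./ncu --mode=attach -f --section=MemoryWorkloadAnalysis -o vectoradd_report --hostname 127.0.0.1

If that works without a segmentation fault, the issue is more likely specific to the Llama application.)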

@veraj
It is this application: Small LLM (SLM) - NVIDIA Jetson AI Lab

The segmentation fault is related to Nsight Compute; if I only run the application on its own, there is no issue at all. Also, with ncu 2022 there is no segmentation fault.

Thanks! I will check internally and let you know if there is any update.

@veraj do you have any update on this case? If you need further information from my end for your investigation, let me know. Thanks very much for your support!

Sorry, no update yet. I will try to give you a response this week.

Also, can you tell me which CUDA version is installed on your device?
When you use Nsight 2024.1, did you install any compat package?
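
(For reference, one way to check this on the device might be something like the following; exact paths and package names depend on your JetPack/CUDA installation:

/usr/local/cuda/bin/nvcc --version
dpkg -l | grep -i cuda-compat

The nvcc output shows the installed toolkit version, and the dpkg query lists any CUDA compatibility packages, if present.)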