How to measure memory workload with Nsight Compute when an SLM is running on Orin

Hi,
I watched this video and it is very cool. I want to know how to run this tool and measure the memory workload. I see that Orin has nsys-ui installed and I can launch it, but it looks different from the one in the video. Do you have a website?

Launch SLM command: jetson-containers run $(autotag nano_llm) python3 -m nano_llm.chat --api=mlc --model princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT

Hi, @MicroSW

The tool is Nsight Compute. Its binary is ncu-ui. Do you have it on your Orin?
You can refer to Getting Started with Nsight Compute | NVIDIA Developer for downloads and documentation.

hi @veraj

Thanks. I installed this tool, ncu-ui. Do you have any idea how to trace the memory workload? I used this command to trace:

sudo nsys profile --trace=cuda --cudabacktrace memory --cuda-memory-usage true --gpu-metrics-device help jetson-containers run $(autotag nano_llm) python3 -m nano_llm.chat --api=mlc --model princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT

This cannot capture any GPU memory info. Do you know how to modify the above command to trace the memory workload?

The second question: this SLM is running in a container. Do you know whether ncu or nsys can trace the SLM inside the container?

I tried ncu:

It always shows:
Preparing to launch…

Launched process: jetson-containers (pid: 6101)

/mnt/jetson-containers/jetson-containers run $(autotag nano_llm) python3 -m nano_llm.chat --api=mlc --model princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT

Attempting to automatically connect…

Searching for attachable process 6101 on local socket…


Please enter your container and execute:

sudo ncu --set full -o report python3 -m nano_llm.chat --api=mlc --model princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT

Then open the generated report in NCU-UI; you can see the memory details on the “Details” page.
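If running the GUI on the device is inconvenient, the same memory tables can also be printed on the command line by importing the saved report. This is a sketch assuming the report produced by the command above is named report.ncu-rep:

```shell
# Print the "Details" page (which includes Memory Workload Analysis)
# from a saved report, without the GUI
ncu --import report.ncu-rep --page details
```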

Thanks so much for your support. Now the last question: do you know how to let it run in the background without interrupting my input? With this command I cannot type my question into the SLM.

It keeps showing this:

Hi, @MicroSW

This can be achieved using the --mode option. You can launch the application from one terminal and attach to it from another; this way you can give input in the launch terminal while the profiling logs appear in the attach terminal.

Pasting from ./ncu --help:
Launch an application for later attach:
ncu --mode=launch MyApp
Attach to a previously launched application:
ncu --mode=attach --hostname 127.0.0.1

hi @veraj

Thanks so much again. I launched the app in one terminal and then attached from a second terminal; the results are below:

launch command: /opt/nvidia/nsight-compute/2022.2.1/ncu --mode=launch python3 -m nano_llm.chat --api=mlc --model princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT

attach command: ncu --mode=attach --hostname 127.0.0.1

The input terminal was stuck there forever and didn’t show a response, while the second terminal kept showing the log:

wait…

I left it running for half an hour, and then it showed: “I am,”

It seems that ncu pauses the response. When I stop ncu in the second terminal, the response shows smoothly. Do you know how to make ncu not pause the response?

Hi, @MicroSW

Please add more filter options to your ncu --mode=attach command line to reduce the profiling workload. See “4. Nsight Compute CLI” in the Nsight Compute 12.4 documentation, under “Customizing data collection”.
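As a sketch of what such filtering could look like (the skip/count values below are placeholders to tune for your workload), limiting how many kernel launches are profiled usually reduces the stall the most:

```shell
# Attach, but only profile a small window of kernel launches:
#   --launch-skip   skip the first N launches (e.g. model warm-up)
#   --launch-count  stop profiling after N launches
#   --section       collect only the memory workload tables
ncu --mode=attach --hostname 127.0.0.1 \
    --section MemoryWorkloadAnalysis \
    --launch-skip 100 --launch-count 10 \
    -f -o report
```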

Hi @veraj

I really appreciate your support. I tried several options, but they failed. I only want to measure and collect memory data; do you know which option I should add?

I tried:

1. ./ncu --mode=attach -f --section=MemoryWorkloadAnalysis -o report --hostname 127.0.0.1
2. ./ncu --mode=attach -f --section=MemoryWorkloadAnalysis --section=MemoryWorkloadAnalysis_Chart --section=MemoryWorkloadAnalysis_Tables -o report --hostname 127.0.0.1

What do you mean by “failed”?
Do you see any error printed, or does ncu still seem to pause the response?

@veraj

Sorry for the confusing statement. I meant it’s the same as with no filter options: ncu still blocks the response speed. Would you please check and tell me how I can achieve this:

1. let ncu collect in the background, without blocking input and output
2. have ncu collect only the memory workload, bandwidth, and throughput

Thanks! I will check internally.

Plus, is it possible to provide a mini-repro that shows the blocking issue?

Hi @veraj

Do you mean the log or a screenshot like this:

1. launch the app, and attach

2. type text into llama; then, as you can see, the output/response from llama is blocked:

Hi, @MicroSW

I noticed you are using a very old version, 2022.2.1. We have added many new features and fixes since then.
Can you update to the latest version and check?

In the meantime, we’ll try to repro this internally.

@veraj I noticed the latest ncu doesn’t have a separate package that allows us to install ncu alone; it says:

Nsight Compute is available in the CUDA Toolkit bundled in the JetPack SDK.

Do you have any idea which stable version has a tar package, or can be installed via apt?

I tried to install it via SDK Manager, but ncu is not in the developer tools list:

Yes, this is by design. SDK Manager recently removed the standalone Nsight Compute entry, as it is already bundled in the CUDA Toolkit.