'cuda HW' field is missing

Hi,
I’ve been using Nsight Systems to profile TensorRT and ONNX Python scripts in offline cases. It worked fine and showed “CUDA HW” and all its subfields.

But when I use the CLI to profile a vLLM Python script (in no-eager mode, where models should be captured as CUDA graphs) against the OpenAI-API backend, the output file looks like this: the “CUDA HW” row and all its subfields are missing. Also, under the “Threads” field there is no “CUDA API” subfield.

The offline and online tests use exactly the same nsys CLI command and parameters:

    sudo nsys profile \
        --gpu-metrics-devices=0 \
        --trace="cuda,nvtx" \
        --cuda-graph-trace="graph" \
        --cuda-memory-usage="true" \
        --output="path/to/myProfile" \
        --force-overwrite true \
        path/to/my/python/execution \
        "path/to/my/python/script.py"

The target info is: Rocky Linux | NVIDIA A100-SXM4-80GB | CUDA driver version >= 545 | CUDA version >= 12.4 | Nsight Systems 2024.7.1.

Thank you for your help in advance!

@liuyis can you respond to this.

Hi @0-0, was the application exiting gracefully on Linux? The CUDA trace feature holds a buffer within the application’s process(es), and if the application was forcibly killed, the buffer might not be flushed and CUDA trace data can be missing.

One thing you can try is adding the --duration=&lt;seconds&gt; option, set a little shorter than the application’s execution time. That lets the collection finish earlier and ensures the buffer is flushed.

Hi, thank you for your reply. I’ve tried the --duration= option, but the missing fields still don’t show.

My Python script (the script path in the nsys command) is as follows:

    import os
    from multiprocessing import Process

    def run_stress_test_client(port, script_path):
        python_executable = "path/to/my/python"
        client_command = [
            # script_path points to ONLINE_SCRIPT_PATH (see below)
            python_executable, script_path,
            "--backend", "openai-chat",
            "--base-url", f"http://localhost:{port}",
            "--endpoint", "/v1/chat/completions",
            ...(other params)
        ]
        # execv replaces the current (forked) process image with the client
        os.execv(python_executable, client_command)
        
    if __name__ == "__main__":
        processes = []
        # Each CUDA device corresponds to an online port; we send requests
        # to all of these ports at the same time
        for port in ["8000", "8001", "8002"...]:
            print(f"\n\t Running process for port: {port}\n")
            process = Process(
                target=run_stress_test_client,
                args=(port, script_path),
            )
            process.start()  
            processes.append(process)
        
        # wait until all processes have finished
        for process in processes:
            process.join()
        
        print("All processes completed.")
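
One detail worth noting in the script above: `os.execv` replaces the forked multiprocessing child with the client process, which skips multiprocessing’s normal cleanup on exit. A minimal alternative sketch using `subprocess` instead, which keeps the child alive as a supervising parent (the `-c` one-liner stands in for the real benchmark invocation):

```python
import subprocess
import sys

def run_client(port: int) -> str:
    # subprocess.run keeps this process alive as the client's parent,
    # so the client's exit status propagates back through Process.join().
    result = subprocess.run(
        [sys.executable, "-c", f"print('client on port {port}')"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Whether this affects the trace-buffer flushing is an open question here; it is simply a variant that preserves the normal parent/child exit path.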

The core content of “ONLINE_SCRIPT_PATH” in the above run_stress_test_client() function is as follows:

    import asyncio
    from typing import List

    async def send_requests(
        backend, url, port, model_id, input_requests, max_concurrency...
    ):
        tasks: List[asyncio.Task] = []
        async for request in request_list(...):
            request_func_input = RequestFuncInput(model, prompt, url, port...)
            tasks.append(
                asyncio.create_task(_request_func(...))
            )
        outputs: List[RequestFuncOutput] = await asyncio.gather(*tasks)
        # process outputs...

    if __name__ == "__main__":
        benchmark_result = asyncio.run(
            send_requests(...)
        )
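
For reference, the fan-out pattern above boils down to something like this self-contained sketch, where `send_one` stands in for the real per-request coroutine (`_request_func`):

```python
import asyncio

async def send_one(request_id: int) -> str:
    # Stand-in for the real HTTP request coroutine
    await asyncio.sleep(0)
    return f"response-{request_id}"

async def send_requests(n: int) -> list:
    # Create all tasks up front, then wait for every response;
    # gather preserves the order in which tasks were created
    tasks = [asyncio.create_task(send_one(i)) for i in range(n)]
    return await asyncio.gather(*tasks)

results = asyncio.run(send_requests(3))
```

Note that the client process itself only does network I/O here; it never touches CUDA.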

I’ve found that if I don’t use asyncio, and instead modify the script called via the ‘multiprocessing’ library into a totally offline script that uses a local model, the problem is solved and all data fields show fine.

But I want to use Nsight on online serving cases. Could you give some suggestions on why this happens, and how to use Nsight correctly for online serving?

(I think my previous post may have caused some misunderstanding, since I wasn’t very clear about the source of the problem when I first wrote it, so I’ve edited the original post. Sorry for the trouble.)

Thank you again for your help.

By “online serving cases”, do you mean the model is running in a remote system or process? If that’s the case, Nsys cannot get CUDA activities from the remote system or process. The CUDA activities have to be generated by the target application you launched through Nsys, or any child process of it, in order for Nsys to collect them.

The suggestion is to use Nsys to profile the actual process (whether it is on a remote system or a different process on the same system) that runs the model and therefore makes the CUDA calls.
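
For example, assuming the server is started with vLLM’s OpenAI-compatible entrypoint, the launch could be wrapped roughly like this (the module path, flags, and output path are illustrative assumptions; check your vLLM version):

```python
import subprocess

# Hypothetical sketch: wrap the vLLM server launch in nsys so that the
# process actually issuing CUDA calls is the one being traced.
nsys_cmd = [
    "sudo", "nsys", "profile",
    "--trace=cuda,nvtx",
    "--cuda-graph-trace=graph",
    "--output=path/to/serverProfile",
    "python", "-m", "vllm.entrypoints.openai.api_server",
    "--port", "8000",
    # ... other server args ...
]
# subprocess.run(nsys_cmd, check=True)  # requires nsys and vLLM installed
```

The benchmark clients can then run unprofiled; they only generate HTTP traffic, while all CUDA activity lives in the server process tree.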

Hi, the model is indeed running in a separate process, on the GPU specified by --gpu-metrics-devices, within the same system. I’ll use Nsight to profile the server process. Thank you!