CUPTI activity API and child processes

pearson · September 20, 2018, 5:24pm

Hello,

I am trying to develop an application that activates CUPTI’s Activity API in the main thread and then creates a child thread in which CUDA code is run. The child thread is an external application invoked through Boost’s system call. The CUPTI Activity API is not picking up any GPU activity in the child process. Is there a way to configure CUPTI to pick up GPU activity from the child thread?

Thank you!

mjain · September 28, 2018, 8:22am

Activating CUPTI’s activity APIs in the main thread enables profiling of the main thread as well as all of its child threads. If the main thread instead creates a new process (for example, using fork), the new process is not profiled. Does the latter apply to your use case?

To pick up GPU activity in the new process, you can either call the CUPTI APIs from the application code itself or inject a CUPTI-based profiling library into the application process.
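As a rough illustration of the injection option on Linux, the parent can launch the external application with an extra environment variable set, so that the dynamic loader pulls a profiling library into the new process. The sketch below is only that: `run_child_with_env` is a hypothetical helper name, and using `LD_PRELOAD` as the injection mechanism is an assumption about the platform, not something CUPTI does for you.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] as a child process with name=value
 * added to its environment. Returns the child's exit status, or -1 if
 * the child could not be created or did not exit normally.
 */
int run_child_with_env(const char *name, const char *value, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: set the variable only here so the parent stays unaffected. */
        if (setenv(name, value, 1) != 0)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127); /* reached only if execvp failed */
    }

    /* Parent: wait for the child and report how it exited. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A call such as `run_child_with_env("LD_PRELOAD", "/path/to/libmyprofiler.so", argv)` would then start the CUDA application with the library preloaded; the library path here is a placeholder, and the library itself still has to initialize CUPTI inside the child.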

chenglei.wang · January 2, 2019, 9:20am

Is there any sample code that shows how to use the CUPTI API to profile a new process (created using fork)?

rbischof · January 14, 2019, 3:57am

Unfortunately, NVIDIA does not provide any CUPTI example code for child processes.

chenglei.wang · January 14, 2019, 4:41am

Does nvprof use CUPTI? I found that nvprof’s log file format uses CUPTI, but I don’t think it uses CUPTI directly.

mjain · January 14, 2019, 5:58am

Correct, nvprof uses CUPTI under the hood. However, CUPTI itself doesn’t provide support for child process profiling. It’s the responsibility of the CUPTI client to implement this support.

chenglei.wang · January 15, 2019, 2:43am

Thanks for your quick update. Based on that information, I am curious how to use CUPTI to profile GPU events for a CUDA executable. For example, when I profile a CUDA executable with nvprof as shown below, I get event information for both kernels. I know how to implement this when the CUPTI client is part of the application, but I don’t know how to implement a standalone tool that uses CUPTI and can collect GPU events for an arbitrary CUDA executable…

nvprof -e fb_subp0_read_sectors ./concurrentKernels
[./concurrentKernels] - Starting…
==8361== NVPROF is profiling process 8361, command: ./concurrentKernels
GPU Device 0: “GeForce GTX 1050” with compute capability 6.1

Detected Compute SM 6.1 hardware with 5 multi-processors
Expected time for serial execution of 8 kernels = 0.080s
Expected time for concurrent execution of 8 kernels = 0.010s
Measured time for sample = 0.113s
Test passed
==8361== Profiling application: ./concurrentKernels
==8361== Profiling result:
==8361== Event result:
Invocations                Event Name       Min       Max       Avg     Total
Device “GeForce GTX 1050 (0)”
    Kernel: sum(long*, int)
          1     fb_subp0_read_sectors       100       100       100       100
    Kernel: clock_block(long*, long)
          8     fb_subp0_read_sectors    112177    179004    153147   1225177

mjain · January 18, 2019, 3:58am

You can develop a profiling tool like nvprof by writing a CUPTI-based shared library. This library needs to enable the appropriate CUPTI activities using cuptiActivityEnable() for tracing information, or call the event/metric APIs to profile GPU performance characteristics. For more control over the profiling session, you can use the CUPTI Callback API to register a callback into your code. Your callback will be invoked when the application being profiled calls a CUDA runtime or driver function, or when certain events occur in the CUDA driver. Refer to the CUPTI samples callback_event and callback_metric for usage of the CUPTI event and metric APIs, respectively.

To profile both the main and child processes, you can inject the shared library into the target application, e.g. using LD_PRELOAD or Detours, or modify the target application itself if applicable. From that library, you can initialize CUPTI for the whole target process. Refer to the links below for more information:
ld.so(8) - Linux manual page (for LD_PRELOAD)
https://github.com/Microsoft/Detours (for Detours)
https://docs.nvidia.com/cuda/cupti/index.html#r_initialization (for CUPTI initialization)
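
To make the shape of such an injected library a little more concrete, here is a minimal sketch, assuming a GCC/Clang toolchain on Linux. It uses a constructor function so that initialization runs in every process the library is loaded into. The `WITH_CUPTI` macro, the `profiler_*` names, and the buffer size are all made up for this example; only the `cupti*` calls and callback signatures come from the CUPTI Activity API. Whether it is safe to initialize CUPTI this early in a given application is something you would need to verify.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#ifdef WITH_CUPTI /* hypothetical build flag for this sketch, not a CUPTI macro */
#include <cupti.h>

#define BUF_SIZE (32 * 1024) /* arbitrary size chosen for the example */

/* CUPTI asks for an empty buffer to fill with activity records. */
static void CUPTIAPI buffer_requested(uint8_t **buffer, size_t *size,
                                      size_t *max_num_records)
{
    *buffer = (uint8_t *)malloc(BUF_SIZE);
    *size = (*buffer != NULL) ? BUF_SIZE : 0;
    *max_num_records = 0; /* 0 = put as many records as fit in the buffer */
}

/* CUPTI hands back a filled buffer; walk the records, then release it. */
static void CUPTIAPI buffer_completed(CUcontext ctx, uint32_t stream_id,
                                      uint8_t *buffer, size_t size,
                                      size_t valid_size)
{
    CUpti_Activity *record = NULL;
    (void)ctx; (void)stream_id; (void)size;
    while (cuptiActivityGetNextRecord(buffer, valid_size, &record) == CUPTI_SUCCESS)
        fprintf(stderr, "[inject %d] activity kind %d\n",
                (int)getpid(), (int)record->kind);
    free(buffer);
}
#endif /* WITH_CUPTI */

static int g_profiler_ready = 0;

/* Runs at normal process exit: flush any records CUPTI still holds. */
static void profiler_fini(void)
{
#ifdef WITH_CUPTI
    cuptiActivityFlushAll(0);
#endif
    g_profiler_ready = 0;
}

/* Runs when the library is loaded, before the target's main(). */
__attribute__((constructor))
static void profiler_init(void)
{
#ifdef WITH_CUPTI
    cuptiActivityRegisterCallbacks(buffer_requested, buffer_completed);
    cuptiActivityEnable(CUPTI_ACTIVITY_KIND_KERNEL);
#endif
    g_profiler_ready = 1;
    atexit(profiler_fini);
    fprintf(stderr, "[inject] profiler initialized in pid %d\n", (int)getpid());
}

/* Hypothetical accessor so other code can check that init has run. */
int profiler_is_ready(void)
{
    return g_profiler_ready;
}
```

Built as a shared library with `WITH_CUPTI` defined and linked against libcupti, and then preloaded into the target, each process that loads it would run `profiler_init` on its own, which is what lets a child process be traced independently of its parent. Without `WITH_CUPTI` the file still compiles and only exercises the load-time hook.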

ammarwa · January 3, 2020, 12:30pm

For the linker to work and to read events, it needs the context handler, so how can I get the context handler to provide to the linker?
