Profiling CPU and GPU of multiple real-time tasks


I am working on a project which runs multiple real-time applications on the GPU.
I am wondering what tool I need to use to get trace information for both CPU and GPU for the multiple tasks.

Here are some details about the applications and the test environment.

  • Test platform : PC + GeForce 1070 or Nvidia TX1 with Ubuntu
  • Test applications : 5 to 7 periodic applications.

    Each task has both CPU and GPU execution.
    The GPU execution is implemented using CUDA.
    All tasks operate periodically.
    The tasks are assigned a real-time priority using schedtool and are pinned to a specific CPU core.

The applications are launched as follows:
sudo taskset 0x1 schedtool -F -p 60 -e ./application_1 &
sudo taskset 0x1 schedtool -F -p 58 -e ./application_2 &
sudo taskset 0x1 schedtool -F -p 58 -e ./application_3

I need to analyze the time traces of both GPU and CPU for all tasks during their operation.

I guess that Nsight can provide a CPU+GPU trace.
Can Nsight work with multiple tasks at the same time?
Can the tool work with real-time priority tasks?
If yes, can you recommend which links and documents I should refer to?
If not, what would be a good way to visualize and analyze GPU and CPU traces on an Nvidia GPU?



You can use the Visual Profiler for profiling. The tool is included in the toolkit package.
For usage details, please refer to its documentation.

The multi-process profiling options are:
Profile child processes - If selected, profile all processes launched by the specified application.
Profile all processes - If selected, profile every CUDA process launched on the same system by the same user who launched nvprof. In this mode the Visual Profiler launches nvprof, and the user needs to run the application in another terminal outside the Visual Profiler. The user can exit this mode by pressing the "Cancel" button on the progress dialog in the Visual Profiler, which then loads the profile data.
Profile current process only - If selected, only profile specified application.
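
For the launch commands in the question, the "Profile all processes" mode can also be driven from the command line. The following is only a rough sketch, not a verified recipe: the helper names (run, start_profiler, launch_task) and the output prefix "trace" are made up for illustration. Because the tasks are started with sudo, nvprof is started with sudo as well, so that both run as the same user. Setting DRY_RUN=1 only prints the commands.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Terminal 1: start nvprof in "profile all processes" mode.
# %p in the output name is replaced by the PID of each profiled process,
# so every task gets its own file. "trace" is a hypothetical prefix.
start_profiler() {
    run sudo nvprof --profile-all-processes -o "${1:-trace}.%p.nvvp"
}

# Terminal 2: launch one task pinned to a core with SCHED_FIFO priority.
# $1 = CPU mask, $2 = priority, $3 = application binary
launch_task() {
    run sudo taskset "$1" schedtool -F -p "$2" -e "$3"
}
```

Usage would be start_profiler in one terminal, then launch_task 0x1 60 ./application_1 (and so on for the other tasks) in a second terminal. After the tasks finish and nvprof is stopped, the per-process files should be importable into the Visual Profiler.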

CPU (host) options:

Profile execution on the CPU - If selected the CPU threads are sampled and data collected about the CPU performance is shown in the CPU Details View.
Enable OpenACC profiling - If selected and an OpenACC application is profiled, OpenACC activities will be recorded and displayed on a new OpenACC timeline. Collection of this data is only supported on Linux and PGI 15.7+. See the description of the OpenACC timeline in Timeline View for more information.
Enable CPU thread tracing - If enabled, selected CPU thread API calls will be recorded and displayed on a new thread API timeline. This currently includes the Pthread API, mutexes and condition variables. For performance reasons, only those API calls that influence concurrent execution are recorded and collection of this data is not supported on Windows. See the description of the thread timeline in Timeline View for more information. This option should be selected for dependency analysis of applications with multiple CPU threads using CUDA.
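
On the nvprof command line, the first and third options above correspond to --cpu-profiling and --cpu-thread-tracing. As a small sketch for a single task (the function name cpu_trace_cmd and the output file name are hypothetical), the helper below only prints the command line so it can be inspected first. Whether these options can be combined with the multi-process modes should be checked against the nvprof documentation for your toolkit version.

```shell
#!/bin/sh
# Print (do not run) an nvprof command line that profiles one application
# with CPU sampling and CPU thread API tracing enabled.
# $1 = application binary, $2 = output file for the Visual Profiler
cpu_trace_cmd() {
    printf 'nvprof --cpu-profiling on --cpu-thread-tracing on -o %s %s\n' "$2" "$1"
}
```

For example, cpu_trace_cmd ./application_1 app1.nvvp prints the command, and piping that output to sh runs it.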

Hi, veraj,

Thank you for your comments.
I will take a look at the Visual Profiler and get back here soon.