Understanding CPU and GPU Behavior with NVIDIA Visual Profiler

I am trying to deepen my understanding of CPU and GPU behavior using the NVIDIA Visual Profiler.

There are a few things I would like to know about the NVIDIA Visual Profiler:

  1. What is the range of time supported by NVIDIA Visual Profiler? For example, is it in seconds, milliseconds, microseconds, or nanoseconds?

  2. The attached image is a screenshot from running a simple CUDA program.[1]

2-1. When horizontal bars for cudaMalloc or cudaMemcpy appear, is the CPU actually running, or is it just waiting?

2-2. Do MemCpy(HtoD) and MemCpy(DtoH) represent the actual data transfers, or something else?

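[1] The simple CUDA program I used is "mult.cu", found on the following web page: 第6回 GPU の仕組みと PyTorch 入門 / 真面目なプログラマのためのディープラーニング入門 (Part 6: How GPUs Work and an Introduction to PyTorch / Introduction to Deep Learning for Serious Programmers)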

  • Sorry, this web page is in Japanese.

What is the range of time supported by NVIDIA Visual Profiler? For example, is it in seconds, milliseconds, microseconds, or nanoseconds?

The time unit is shown wherever a time is displayed. For example, in the timeline it appears in the bar at the top. As you zoom in or out, the unit may change between s (seconds), ms (milliseconds), us (microseconds), and so on.

2-1. When horizontal bars for cudaMalloc or cudaMemcpy appear, is the CPU actually running, or is it just waiting?

For CUDA API calls, these bars represent the entire duration of the call, from when CUDA starts processing it to when it finishes. The CPU is not necessarily busy for that whole duration.
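
One way to check this for a particular call is to compare wall-clock time with CPU time around it. The following is only a minimal sketch and not something the profiler provides: `time_h2d_copy` and `CallTiming` are names made up for illustration, and it assumes a platform where `std::clock` reports process CPU time (as on Linux; on Windows it reports wall-clock time, so the comparison would not work there). Process CPU time also includes any other threads in the process, so treat the numbers as a rough indication.

```cuda
#include <chrono>
#include <cstddef>
#include <ctime>
#include <cuda_runtime.h>

// Wall-clock and CPU time (both in milliseconds) spent inside one blocking call.
struct CallTiming {
    double wall_ms;
    double cpu_ms;
};

// Hypothetical helper: times one synchronous host-to-device cudaMemcpy of
// `bytes` bytes. Returns {-1, -1} if any CUDA call fails (e.g. no device).
CallTiming time_h2d_copy(std::size_t bytes) {
    const CallTiming failed{-1.0, -1.0};
    char* host = new char[bytes]();  // zero-initialized source buffer
    void* dev = nullptr;

    // The first CUDA call also creates the context, so do it before timing.
    if (cudaMalloc(&dev, bytes) != cudaSuccess) {
        delete[] host;
        return failed;
    }

    auto wall0 = std::chrono::steady_clock::now();
    std::clock_t cpu0 = std::clock();  // assumption: process CPU time
    cudaError_t err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    std::clock_t cpu1 = std::clock();
    auto wall1 = std::chrono::steady_clock::now();

    cudaFree(dev);
    delete[] host;
    if (err != cudaSuccess) return failed;

    return {std::chrono::duration<double, std::milli>(wall1 - wall0).count(),
            1000.0 * static_cast<double>(cpu1 - cpu0) / CLOCKS_PER_SEC};
}
```

If `cpu_ms` comes out much smaller than `wall_ms`, the process was mostly waiting during the call; if the two are close, the CPU was busy for most of it (possibly just spinning while it waited for the GPU).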

2-2. Do MemCpy(HtoD) and MemCpy(DtoH) represent the actual data transfers, or something else?

Yes. The activities shown under the CUDA device and context are those executed on the CUDA device. The MemCpy rows represent the data transfers.
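
To see where those rows come from, here is a minimal sketch of the usual copy-in, compute, copy-out pattern. It is not the "mult.cu" program from the question; the kernel `scale` and the helper `scale_on_gpu` are hypothetical names used only for illustration. I would expect the first `cudaMemcpy` to show up as MemCpy(HtoD), the kernel launch as a compute bar, and the second `cudaMemcpy` as MemCpy(DtoH).

```cuda
#include <cstddef>
#include <cuda_runtime.h>
#include <vector>

// Multiplies each element by `factor` on the device.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: copies `values` to the device, scales them there,
// and copies the result back. Returns true if every CUDA call succeeded.
bool scale_on_gpu(std::vector<float>& values, float factor) {
    const int n = static_cast<int>(values.size());
    if (n == 0) return true;  // nothing to transfer or compute
    const std::size_t bytes = static_cast<std::size_t>(n) * sizeof(float);

    float* dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;

    // Host -> device transfer.
    bool ok = cudaMemcpy(dev, values.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        scale<<<blocks, threads>>>(dev, n, factor);
        ok = cudaGetLastError() == cudaSuccess;
    }
    // Device -> host transfer; this blocking copy also waits for the kernel.
    if (ok) {
        ok = cudaMemcpy(values.data(), dev, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);
    return ok;
}
```

Calling `scale_on_gpu` from a small `main`, building with `nvcc`, and opening the executable in the Visual Profiler should produce a timeline with one bar for each of the three steps.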
