Nvidia CUDA profiler is not able to profile certain code

Following this NVIDIA blog, it seems that the nvprof profiler is unable to profile unoptimized_cuda.cpp.

Any advice?

Hi feiphung,

Can you please provide the nvprof command you tried and the console output it produced? Are you able to profile other CUDA samples on the same setup?

Please also provide details about the setup: CUDA toolkit version, GPU, etc.
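
If it helps, the setup details can be gathered in one go. The following is only a minimal sketch: the helper names `run_tool` and `collect_setup_info` are made up for illustration, and it assumes that `nvidia-smi`, `nvcc` and `nvprof` are on the PATH where they are installed.

```python
import shutil
import subprocess
import sys


def run_tool(cmd):
    """Return the combined stdout/stderr of cmd, or None if the tool is not found."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return result.stdout.strip()


def collect_setup_info():
    """Collect version strings that are useful in a profiler bug report."""
    tools = {
        "GPU and driver": ["nvidia-smi"],
        "CUDA toolkit": ["nvcc", "--version"],
        "nvprof": ["nvprof", "--version"],
        "Python": [sys.executable, "--version"],
    }
    info = {}
    for label, cmd in tools.items():
        output = run_tool(cmd)
        # A missing tool is reported explicitly instead of raising an error.
        info[label] = output if output is not None else "not found on PATH"
    return info
```

Printing each entry of `collect_setup_info()` gives text that can be pasted into the thread.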

All the commands were run in this Colab notebook: https://colab.research.google.com/drive/1V94-WC6Jf8M4Tj8iX3x5CuW93qSfuG6A?usp=sharing

Tesla P100, CUDA 10.1

nvprof --print-gpu-trace python train.py


Can you please try with nvprof options --profile-child-processes or --profile-all-processes?

$ nvprof --print-gpu-trace --profile-all-processes python train.py

======== Error: nvprof doesn't accept application argument when "--profile-all-processes" is specified.
======== Use "nvprof --help" to get more information.

Using --profile-child-processes does not produce any runtime error, but I only see the script's normal output and nothing from the profiler.

Sorry, I forgot to mention that with the --profile-all-processes option, the application should be launched from another terminal.

Sample commands:
terminal 1:
$ nvprof --profile-all-processes -o output.%p

terminal 2:
$ python train.py
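
If a second terminal is not convenient (for example inside a notebook), the two-terminal pattern above can be approximated from a single Python process. This is a rough sketch, not a tested recipe. The function names, the `startup_wait` parameter and the fixed wait are assumptions for illustration, and it assumes that sending SIGINT stops nvprof the same way Ctrl-C does in terminal 1.

```python
import signal
import subprocess
import time


def build_nvprof_daemon_cmd(output_pattern="output.%p"):
    """Command for 'terminal 1': profile every CUDA process started by this user."""
    # nvprof replaces %p with the process ID of each profiled application.
    return ["nvprof", "--profile-all-processes", "-o", output_pattern]


def profile_all_processes(app_cmd, output_pattern="output.%p", startup_wait=3.0):
    """Start nvprof in the background, run app_cmd, then stop nvprof."""
    daemon = subprocess.Popen(build_nvprof_daemon_cmd(output_pattern))
    try:
        # Assumption: a short fixed wait is enough for nvprof to start listening.
        time.sleep(startup_wait)
        # 'Terminal 2': run the application itself, without any nvprof prefix.
        return subprocess.run(app_cmd).returncode
    finally:
        # Equivalent of pressing Ctrl-C in terminal 1.
        daemon.send_signal(signal.SIGINT)
        try:
            daemon.wait(timeout=30)
        except subprocess.TimeoutExpired:
            # Fall back to a hard kill if nvprof does not exit on its own.
            daemon.kill()
            daemon.wait()
```

Calling `profile_all_processes(["python", "train.py"])` would then correspond to the two sample commands, with one output file written per profiled process.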