Any advice?
Can you please provide the nvprof command you tried and the console output it produced? Are you able to profile other CUDA samples on the same setup?
Please also provide details about the setup: CUDA toolkit version, GPU, etc.
All the commands were executed in this Colab notebook: https://colab.research.google.com/drive/1V94-WC6Jf8M4Tj8iX3x5CuW93qSfuG6A?usp=sharing
Tesla P100, CUDA 10.1
nvprof --print-gpu-trace python train.py
Can you please try the nvprof options --profile-child-processes or --profile-all-processes?
$ nvprof --print-gpu-trace --profile-all-processes python train.py
======== Error: nvprof doesn't accept application argument when "--profile-all-processes" is specified.
======== Use "nvprof --help" to get more information.
--profile-child-processes does not give any runtime error; however, it only prints the script's normal output, with nothing from the profiler.
Sorry, I forgot to mention that with the --profile-all-processes option, nvprof is started without an application argument and the application should be launched from another terminal:
$ nvprof --profile-all-processes -o output.%p
$ python train.py
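Since a Colab notebook does not offer a second terminal in the usual sense, the two steps above could also be driven from a single Python cell. The following is only a rough sketch and has not been verified on Colab: the helper names `profiler_command` and `profile_in_one_session` and the 5-second startup delay are assumptions made for illustration, and it presumes that stopping nvprof with SIGINT behaves like pressing Ctrl-C in its terminal.

```python
import shutil
import signal
import subprocess
import time


def profiler_command(output_pattern="output.%p"):
    """Build the nvprof command quoted above (hypothetical helper).

    nvprof replaces %p in the output name with the profiled process ID.
    """
    return ["nvprof", "--profile-all-processes", "-o", output_pattern]


def profile_in_one_session(app_command, startup_delay=5.0):
    """Start nvprof in the background, run the application, then stop nvprof.

    startup_delay is an assumed value that gives nvprof time to start
    listening before the application launches; adjust as needed.
    """
    if shutil.which("nvprof") is None:
        raise RuntimeError("nvprof was not found on PATH")

    # Step 1: nvprof runs without an application argument and waits.
    profiler = subprocess.Popen(profiler_command())
    try:
        time.sleep(startup_delay)
        # Step 2: launch the application as a separate process.
        result = subprocess.run(app_command)
        return result.returncode
    finally:
        # Assumed equivalent of pressing Ctrl-C in the nvprof terminal.
        profiler.send_signal(signal.SIGINT)
        profiler.wait()


# Example call (not executed here):
# profile_in_one_session(["python", "train.py"])
```

If this works on your setup, one output file per profiled process should appear in the working directory, named after the `output.%p` pattern.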