I am trying to profile an application that asynchronously launches CUDA kernels on the GPU, but profiling fails with the following error:
==PROF== Profiling "potrf_alg2_set_info" - 1: 0%
==WARNING== Backing up device memory in system memory. Kernel replay might be slow. Consider using "--replay-mode application" to avoid memory save-and-restore.
==WARNING== Backing up device memory in system memory. Kernel replay might be slow. Consider using "--replay-mode application" to avoid memory save-and-restore.
…50%…100% - 73 passes
==PROF== Profiling "potrf_alg2_cta_upper" - 2: 0%…50%…100% - 71 passes
==ERROR== LaunchFailed
==ERROR== LaunchFailed
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==PROF== Report: /home/mannaparambil/dplasma/build/profile.ncu-rep
Hello Joseph,
Thank you for your question about Nsight, and I'm sorry you ran into this problem. Could you clarify which Nsight product you are using? Is it Nsight Graphics, or a different Nsight product such as Nsight Systems or Nsight Compute?
Regards,
Nsight Compute saves and restores the kernel's device memory state so that it can replay the kernel multiple times. That can double the memory footprint. To avoid this, you can switch to application replay with "--replay-mode application", which removes the need for the memory save-and-restore. Let me know if that solves your issue.
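For reference, here is a minimal sketch of such an invocation. It assumes the application is started as `./your_app` (a placeholder, not taken from the original post) and that the report should again be written to `profile.ncu-rep`:

```shell
# Application replay: ncu reruns the whole application once per pass
# instead of saving and restoring device memory around each kernel.
# ./your_app is a placeholder for the actual executable and its arguments.
ncu --replay-mode application -o profile ./your_app
```

Because the application is rerun for every pass, this mode generally expects the kernel launches to be deterministic from run to run. That is worth checking for an application that launches kernels asynchronously.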