No GPU events: "Failed to connect to the application"

I am having trouble getting Nsight Systems to trace CUDA calls. I can launch the program and it runs, and CPU-side events are collected, but nothing is collected from the GPU. The diagnostics summary page shows:

"Failed to connect to the application. Has it been run with the Injection library?"

I set up a long-running execution and then examined the environment of the running process under /proc/<pid>. LD_PRELOAD is being populated, which I assume is what sets up the injection:

cat /proc/69520/environ
GRAPHICS_ROOT=/usr/libnvidia/screen
QUADD_INJECTION_PROXY=OSRT,CUDA
XAUTHORITY="/home/UNIXHOME/bbyington"/.Xauthority
LD_PRELOAD=/home/UNIXHOME/bbyington/NsightSystems/NsightSystems-linux-public-2019.1.1.57-77caa23/Target-x86_64/x86_64/libToolsInjectionProxy64.so
QUADD_CUDA_CONFIG=/tmp/injection_config_884859d3
DISPLAY=:0
LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2017.4.196/linux/ipp/tools/intel64/perfsys:/opt/intel/compilers_and_libraries_2017.4.196/linux/debugger/ipt/intel64/lib:/opt/intel/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64:/opt/intel/mic/coi/host-linux-release/lib:/opt/intel/mic/myo/lib:/opt/intel/compilers_and_libraries_2017.4.196/linux/mpirt/lib/intel64:/opt/intel/compilers_and_libraries_2017.4.196/linux/ipp/lib/intel64:/opt/intel/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64:/opt/intel/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64/gcc4.4:/opt/rh/devtoolset-2/root/usr/lib64:/opt/intel/compilers_and_libraries_2017.4.196/linux/bin/intel64_mic:/opt/intel/compilers_and_libraries_2017.4.196/linux/bin/intel64:/opt/rh/devtoolset-6/root/usr/lib64:/opt/rh/devtoolset-6/root/usr/lib
PATH=/mnt/software/p/patchelf/0.10/bin:/mnt/software/c/cmake/3.9.0/bin:/opt/intel/compilers_and_libraries_2017.4.196/linux/bin/intel64:/opt/intel/compilers_and_libraries_2017.4.196/linux/mpirt/bin/intel64:/opt/intel/compilers_and_libraries_2017.4.196/linux/debugger/gdb/intel64_mic/bin:/opt/intel/compilers_and_libraries_2017.4.196/linux/debugger/gdb/intel64/bin:/opt/intel/compilers_and_libraries_2017.4.196/linux/bin/intel64_mic:/opt/intel/compilers_and_libraries_2017.4.196/linux/debugger/gui/intel64:/pbi/dept/primary/modules/sw/OpenHT/OpenHT-2.1.23/bin:/home/UNIXHOME/bbyington/bin:/usr/local/cuda/bin:/opt/rh/devtoolset-6/root/usr/bin:/usr/lib64/qt-3.3/bin:/opt/micron/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
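
Entries in /proc/<pid>/environ are separated by NUL bytes, so plain cat prints them with no visible separator. A minimal Python sketch for reading the file into a dictionary is below. The helper names (parse_environ, read_process_environ, injection_looks_configured) are hypothetical, and the final check is only a heuristic based on the variable names visible in this post, not on documented Nsight Systems behavior. Note that these variables being set does not prove the injection actually connected.

```python
from pathlib import Path
from typing import Dict


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Split a NUL-separated environ blob into a name -> value mapping."""
    env: Dict[str, str] = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # skip the empty piece after the trailing NUL
        name, sep, value = entry.partition(b"=")
        if sep:  # ignore malformed entries that have no '='
            env[name.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_process_environ(pid: int) -> Dict[str, str]:
    """Read the environment of a running process (Linux only)."""
    return parse_environ(Path(f"/proc/{pid}/environ").read_bytes())


def injection_looks_configured(env: Dict[str, str]) -> bool:
    """Heuristic (assumption): the proxy library is preloaded and CUDA is requested."""
    preload = env.get("LD_PRELOAD", "")
    proxies = env.get("QUADD_INJECTION_PROXY", "").split(",")
    return "libToolsInjectionProxy64.so" in preload and "CUDA" in proxies
```

For the process above, this would be called as read_process_environ(69520), which requires permission to read that process's /proc entry.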

I have a Tesla V100 card installed in a CentOS 7.4 system with kernel 3.10.0-957 and NVIDIA driver 410.79. I have installed NVIDIA_Nsight_Systems_Linux_2019.1.1.57.

Unfortunately, attach to process is not supported for x86 Linux targets yet.

You can either launch the process from the GUI or CLI (with a delay if you want to skip startup time), or you can use the interactive CLI to launch the process and start and stop profiling only when you explicitly ask for it. If you need more information about using the interactive CLI, see https://docs.nvidia.com/nsight-systems/#nsight_systems/2019.1-x86/06-cli-profiling.htm%3FTocPath%3D_____6
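
As a rough illustration of the CLI launch-with-delay option, the Python sketch below only assembles an nsys command line. It assumes the --trace, --delay, and --output flags of nsys profile as found in recent releases, which may differ in 2019.1, so check nsys profile --help for your version. The function name build_nsys_command and the ./my_app path are hypothetical.

```python
from typing import List, Sequence


def build_nsys_command(
    app: Sequence[str],
    delay_seconds: int = 0,
    traces: Sequence[str] = ("cuda", "osrt"),
    output: str = "report",
) -> List[str]:
    """Assemble an `nsys profile` argv list (flag names are an assumption)."""
    cmd = ["nsys", "profile", f"--trace={','.join(traces)}", f"--output={output}"]
    if delay_seconds > 0:
        # Start collecting only after the application has been running this long.
        cmd.append(f"--delay={delay_seconds}")
    cmd.extend(app)  # the target binary and its arguments go last
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be checked first.
    print(" ".join(build_nsys_command(["./my_app"], delay_seconds=10)))
```

The list can be handed to subprocess.run once the flags have been confirmed against the installed version.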

Sorry, I may have left out details because I assumed I was following the simplest garden path. I was not in fact trying to attach to a running process; Nsight Systems itself launched the program. I was running the GUI on the local machine, with no remote analysis involved. All I did was:

  • Select the default device (there was a pre-configured entry with hostname of the local machine)
  • Set "Command line with arguments" to be the path to my binary (no arguments were needed)
  • Set working directory to be the directory containing the binary.
  • Select both "Collect OS runtime libraries trace" and "Collect CUDA trace"
  • Press the start button

When I do so, the application runs, and several samples are collected before I tell it to stop. I can look at the “Files” section to examine the stdout and stderr streams and verify that my application ran without any issue. The timeline, however, shows what the CPU was doing but nothing about what the GPU was doing. The only clue I have as to what went wrong is the diagnostics summary page, which has the “Failed to connect to the application” message.
Report.qdrep.zip (1.87 MB)

    Yup, that should have worked. Can you attach the .qdrep file, or is the work heavily proprietary? If so, send me a private message and we’ll figure out a way to get it to me.

    The .qdrep file is now attached to my previous post.

    Can you try running with OS runtime trace (OSRT) turned off?

    I think you are hitting a bug that we found with OSRT and CUDA on CentOS/RHEL. It is fixed in the next release, which should be out later this week, but you might be able to work around it in the meantime.

    Thanks so much for your quick responses on all this.

    I believe I did as you asked, but disabling the OS runtime trace did not allow me to gather any CUDA trace information. I’ll attach another .qdrep file to this message.
    Report2.qdrep.zip (1.27 MB)

    We’ve released a new version that should fix your issue. You can download it from:

    https://developer.nvidia.com/gameworksdownload#?dn=nsight-systems-2019-2

    Let me know if that doesn’t fix it.

    (It is supposed to be linked from the developer.nvidia.com/nsight-systems main page sometime soon, but it is already up and public, so I thought I would get you going.)

    Perfect, that version seems to work for me. Thank you so much for your help!