Hi,
I am wondering whether there are any command-line switches that will give me more detail on why ‘nsys’ dies when I run the following command:
$ nsys profile ./deviceQuery
Agent launcher failed.
$
Running ‘deviceQuery’ without ‘nsys’ works fine.
$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
<snip>
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.0, CUDA Runtime Version = 12.0, NumDevs = 1
Result = PASS
Running the following command works:
$ nsys status -e
Timestamp counter supported: Yes
CPU Profiling Environment Check
Root privilege: disabled
Linux Kernel Paranoid Level = 2
Linux Distribution = gentoo
Linux Kernel Version = 6.1.12-gentoo: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): Fail
See the product documentation at https://docs.nvidia.com/nsight-systems for more information,
including information on how to set the Linux Kernel Paranoid Level
Since the last command works, I suspect that the previous command is missing something, but I do not know what. I am therefore looking for a command-line option that will tell me what the actual problem is so that I can fix it.
regards,
Jonathan
@rknight can you take a look at this?
Hi jonathan.wells.research,
By default, the nsys CLI enables the following options if no options are provided:
--trace=cuda,nvtx,opengl,osrt --sample=process-tree --cpuctxsw=process-tree
Therefore,
nsys profile ./deviceQuery
is equivalent to
nsys profile --trace=cuda,nvtx,opengl,osrt --sample=process-tree --cpuctxsw=process-tree ./deviceQuery
To determine what is causing the issue, you could start by generating an empty profile and then add options back in. For example, try the following command first:
nsys profile --trace=none --sample=none --cpuctxsw=none ./deviceQuery
then add one data type at a time until the issue occurs.
nsys profile --trace=none --sample=none --cpuctxsw=process-tree ./deviceQuery
nsys profile --trace=none --sample=process-tree --cpuctxsw=process-tree ./deviceQuery
nsys profile --trace=cuda --sample=process-tree --cpuctxsw=process-tree ./deviceQuery
nsys profile --trace=cuda,nvtx --sample=process-tree --cpuctxsw=process-tree ./deviceQuery
etc.
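The add-one-option-at-a-time procedure above can be scripted. The following is a minimal sketch, not an nsys feature: the `NSYS` and `APP` variables and the `try_profile` and `bisect_options` function names are made up for illustration, and the option values are the ones quoted in this post.

```shell
#!/bin/sh
# Hypothetical variables: which nsys binary to use and which application to profile.
NSYS="${NSYS:-nsys}"
APP="${APP:-./deviceQuery}"

# Run one profiling attempt and report whether it succeeded.
# $1 = --trace value, $2 = --sample value, $3 = --cpuctxsw value
try_profile() {
    if "$NSYS" profile --trace="$1" --sample="$2" --cpuctxsw="$3" "$APP"; then
        echo "ok:   trace=$1 sample=$2 cpuctxsw=$3"
    else
        echo "FAIL: trace=$1 sample=$2 cpuctxsw=$3"
        return 1
    fi
}

# Start from an empty profile and enable one more data source per step.
# The && chain stops at the first combination that fails.
bisect_options() {
    try_profile none none none &&
    try_profile none none process-tree &&
    try_profile none process-tree process-tree &&
    try_profile cuda process-tree process-tree &&
    try_profile cuda,nvtx process-tree process-tree &&
    try_profile cuda,nvtx,opengl process-tree process-tree &&
    try_profile cuda,nvtx,opengl,osrt process-tree process-tree
}
```

Calling `bisect_options` prints one line per combination and stops at the first one that fails. If even the first (empty) combination fails, the problem is unlikely to be tied to any particular data source.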
Hi rknight,
Thank you for your suggestion.
This is the output of the first command:
$ nsys profile --trace=none --sample=none --cpuctxsw=none ./deviceQuery
Agent launcher failed.
$
Also, I tried the following: I ran ‘nsys-ui’ and selected ‘local connection’, and then I got ‘Tool libraries installation failed …’. Clicking on ‘More info …’ gave me the following:
AnalysisInternalError (4001) {
OriginalExceptionClass: N5boost10wrapexceptINS_16exception_detail39current_exception_std_exception_wrapperISt13runtime_errorEEEE
}
This method didn’t give me any clues about what I am missing.
regards,
Jonathan
Hi Jonathan,
Can you check whether any nsys processes are still running, using this command?
pgrep nsys
If you find any, kill them, e.g.:
sudo pkill nsys
Then try this command again:
nsys profile --trace=none --sample=none --cpuctxsw=none ./deviceQuery
Hello Bob,
Here is the response to your request:
$ pgrep nsys
$ nsys profile --trace=none --sample=none --cpuctxsw=none ./deviceQuery
Agent launcher failed.
$ pgrep nsys
$
No other nsys processes were running before or after the nsys profile command.
regards,
Jonathan
If you use an absolute path for nsys, does it still fail the same way?
Can you run the following?
nsys profile sleep 1
Hello Bob,
Here is the response:
$ /opt/cuda/bin/nsys profile --trace=none --sample=none --cpuctxsw=none ./deviceQuery
Agent launcher failed.
$ nsys profile sleep 1
Agent launcher failed.
$ /opt/cuda/bin/nsys profile sleep 1
Agent launcher failed.
As a test, I deliberately misspelled ‘profile’:
$ nsys profiled sleep 1
Unknown command: profiled
usage: nsys [--version] [--help] <command> [<args>] [application] [<application args>]
<snip>
To run a basic profiling session: nsys profile ./my-application
For more details see "Profiling from the CLI" at https://docs.nvidia.com/nsight-systems
$
The executable itself is working (e.g., all the required libraries are present). It appears that nsys is looking for something, but it gives no clue as to what it is looking for.
regards,
Jonathan
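One generic way to see what a failing command is looking for is to trace its system calls. The sketch below assumes `strace` is installed; it is a general Linux technique rather than an nsys option, and the function names `failed_lookups` and `trace_missing_files` are hypothetical.

```shell
#!/bin/sh
# Read strace output on stdin and keep only the file lookups
# (openat, execve, access) that failed with ENOENT, i.e. "No such file or directory".
failed_lookups() {
    grep -E '(openat|execve|access)\(.*ENOENT'
}

# Trace a command and its child processes, then show the failed lookups.
# $@ = the command to inspect
# strace writes its log to stderr, so stderr is routed into the pipe and the
# traced program's normal stdout is discarded.
trace_missing_files() {
    strace -f -e trace=openat,execve,access "$@" 2>&1 >/dev/null | failed_lookups
}
```

For example, `trace_missing_files nsys profile sleep 1`. Many ENOENT results are normal (the dynamic loader probes several directories for each library), so the interesting entries are paths that never resolve in any location.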
What version of nsys are you using?
nsys --version
$ nsys --version
NVIDIA Nsight Systems version 2022.4.2.50-32196742v0
Can you try updating to the 2023.1 (latest) version to see if it has the same issue?
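For scripted setups, the version check can be automated. This is a sketch under the assumption, taken only from this thread, that 2023.1 is the version worth comparing against; `nsys_version` and `version_at_least` are hypothetical helper names, and the parsing relies on the `nsys --version` output format shown above.

```shell
#!/bin/sh
# Read `nsys --version` output on stdin and print only the dotted version,
# e.g. "2022.4.2.50" from "NVIDIA Nsight Systems version 2022.4.2.50-32196742v0".
nsys_version() {
    sed -n 's/^NVIDIA Nsight Systems version \([0-9][0-9.]*\).*/\1/p'
}

# Succeed if version $1 is at least version $2, comparing only the first two
# fields (year and release). Both arguments must contain at least "YEAR.RELEASE".
version_at_least() {
    have_year=${1%%.*}; rest=${1#*.}; have_rel=${rest%%.*}
    want_year=${2%%.*}; rest=${2#*.}; want_rel=${rest%%.*}
    [ "$have_year" -gt "$want_year" ] ||
        { [ "$have_year" -eq "$want_year" ] && [ "$have_rel" -ge "$want_rel" ]; }
}

# Usage (requires nsys on PATH):
#   v=$(nsys --version | nsys_version)
#   version_at_least "$v" 2023.1 || echo "nsys $v is older than 2023.1"
```

Only the year and release fields are compared because those are the only fields mentioned in the update suggestion; the remaining fields are ignored.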
Ok, I will see if I can get that done today.
I forgot to mention that I had the same problem with the CUDA 11.8 series as well. I upgraded from the 11.8 series to the 12.0.1 series within the past few weeks.
regards,
Jonathan
Hello Bob,
I have run out of time to sort out the installation of the new version today. It could be early next week before I am able to get back to this project (I work on a few different projects, of which the GPU project is one).
regards,
Jonathan
OK, no problem. I sincerely appreciate your help in sorting out this issue.
Ok, I have it working using the latest version.
Updated drivers to 530.30.02 from 525.89.02.
Updated CUDA to 12.1.0 from 12.0.1.
The drivers were updated automatically. However, for CUDA, I had to install the files manually from cuda_12.1.0_530.30.02_linux.run (i.e., extract and copy the files by hand; there were installer problems, but that is for another time …). The previous two CUDA updates were done automatically.
Test run:
$ nsys --version
NVIDIA Nsight Systems version 2023.1.2.43-32377213v0
$
$ nsys profile sleep 1
Generating '/tmp/nsys-report-ccea.qdstrm'
[1/1] [========================100%] report2.nsys-rep
Generated:
/<snip>/report2.nsys-rep
‘nsys stats’ also works. I can compile ‘deviceQuery’ and run nsys profile on it using the latest version. As one further test, I rebuilt my own projects and profiled them with nsys, and that works without any problems (at the moment!).
At this point, I haven’t had a chance to find out exactly why it failed with the previous two updates. I have a busy week, so I am not sure when I can find out exactly where the problem is. It would be very helpful if nsys actually reported what the problem is instead of printing ‘Agent launcher failed’, as that tells me absolutely nothing about the cause (e.g., I might not have installed a library that it depends on).
regards,
Jonathan