Nsight Compute not reporting/profiling all kernels profiled by Nsight Systems

Hi, I have been trying to profile a couple of applications built on top of libraries such as AMReX and SUNDIALS. When I profile the application with Nsight Systems, I get a timeline view with all the relevant kernels, including those launched from the application code. When I profile the same application with Nsight Compute for a more detailed understanding, I notice that only the lowest-level submodule kernels have been profiled, i.e., application kernels calling the submodule kernels don’t show up in Nsight Compute. It is almost as if the parent calls are not being profiled.

I have set the profiling options to ncu --set full --kernel-id :::2 --replay-mode application --target-processes all --print-kernel-base mangled. I set --print-kernel-base to mangled in case the profiler was only reporting on lambda templates, but that didn’t help either. I would expect every kernel captured by Nsight Systems to also be profiled by Nsight Compute. What could I be missing?

Hi, @asterix_obelix

Can you please get the kernel name from Nsight Systems and then pass that name to --kernel-name on the Nsight Compute command line?
Please refer to ncu --help for detailed usage.

I tried that, and I end up with no kernels profiled:
==PROF== Creating report from application replay data: 0%.
==WARNING== No kernels were profiled.

Thanks for the info. Could you provide us with a mini-repro?
I will also check internally with the Nsight Systems and Nsight Compute developers to see whether there is a specific reason that can cause this.

I can share details about how to reproduce it. The code is open source, but I am not a core developer, so I am not sure I can write a mini-app that reproduces it. There is a regression test that runs on a single A100 GPU, but it requires compiling the entire code, which is straightforward using cmake. Please let me know if you would like me to share the details.

Yes, please. Reproducing it internally will help us understand the issue better.

// Check out the code from GitHub
git clone --recursive https://github.com/AMReX-Combustion/PeleC.git
cd PeleC/Submodules/PelePhysics
git submodule init
git submodule update

// Set chemistry model and compile it
cd PeleC
mkdir build
Set the chemistry model in PeleC/Exec/RegTests/PMF/CMakeLists.txt

Configure with cmake from the build directory, adding these flags:
-DCMAKE_CXX_FLAGS="-g --save-temps" \
-DCMAKE_CUDA_FLAGS="-g --save-temps -lineinfo" ..

make PeleC-PMF -j

// Run the regression test. Note the long run command; please copy it as is, changing only the path to the executable
cd PeleC/Exec/RegTests/PMF
Run command on Slurm:
srun -N1 -n1 -c1 --threads-per-core=1 ncu --target-processes all --replay-mode application -f -o pmf_a100 --print-kernel-base mangled <path to PeleC>/build/Exec/RegTests/PMF/PeleC-PMF pmf-dodecane.inp geometry.prob_lo=0.0 0.0 0.0 geometry.prob_hi=3.2 3.2 1.6 amr.n_cell=128 128 64 prob.L=0.4 0.4 1.6 amr.plot_files_output=0 amr.plot_int=10 amr.checkpoint_files_output=0 amrex.abort_on_out_of_gpu_memory=1 pelec.cfl=0.1 pelec.init_shrink=1.0 pelec.change_max=1.0 amrex.the_arena_is_managed=0 pelec.chem_integrator=ReactorCvode cvode.solve_type=GMRES amr.blocking_factor=16 amr.max_grid_size=64 pelec.use_typ_vals_chem=1 ode.rtol=1e-4 ode.atol=1e-5 pelec.typical_rhoY_val_min=1e-6

Please let me know if you run into any issues running the above. If you replace ncu with nsys, you should see the difference in the kernels profiled.

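@veraj I wanted to let you know that I figured out the issue. When using --kernel-id, kernels are matched by name, and by default that name is the bare function name, without template arguments or parameters. The libraries I was using launch all of their ParallelFor loops through the same function, so they were all treated as the same kernel. Using --kernel-name-base demangled fixed the issue for me.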
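
To illustrate why the name basis matters, here is a minimal Python sketch of "pick the N-th invocation of each kernel" under two different notions of kernel identity. It is only a toy model of the behaviour described in this thread, not Nsight Compute's implementation. The kernel names (launch_global, HydroLambda, ChemistryLambda) and the helpers function_base and nth_invocation are made up for the example.

```python
import re
from collections import Counter


def function_base(demangled):
    """Crude approximation of a bare function name: drop the return type
    and everything from the first template or parameter bracket onward."""
    head = re.split(r"[<(]", demangled, maxsplit=1)[0]
    return head.split()[-1]


def nth_invocation(launches, n, key=lambda name: name):
    """Return the launches that are the n-th invocation of their kernel,
    where two launches count as the same kernel if key(name) matches."""
    seen = Counter()
    picked = []
    for name in launches:
        k = key(name)
        seen[k] += 1
        if seen[k] == n:
            picked.append(name)
    return picked


# Hypothetical launch sequence: one generic launcher, two different lambdas.
launches = [
    "void launch_global<HydroLambda>(HydroLambda)",
    "void launch_global<ChemistryLambda>(ChemistryLambda)",
    "void launch_global<HydroLambda>(HydroLambda)",
    "void launch_global<ChemistryLambda>(ChemistryLambda)",
]

# Identity = bare function name: every launch is "launch_global",
# so only the second launch overall is selected.
by_function = nth_invocation(launches, 2, key=function_base)

# Identity = full demangled name: each instantiation is counted separately,
# so the second launch of each lambda is selected.
by_demangled = nth_invocation(launches, 2)

print(len(by_function), len(by_demangled))
```

In this model, counting by bare function name selects a single launch, while counting by demangled name selects one launch per distinct instantiation, which matches what I saw after changing the name basis.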

Thanks for letting me know this!
