I am trying to collect a roofline plot for a CUDA Fortran application.
I am using the following command:
srun --gres=gpu:4 --ntasks=2 --tasks-per-node=2 --hint=nomultithread ncu --target-processes all --set full --section SpeedOfLight_RooflineChart -f --csv -o profile_data.csv ./senga2_f2c_mpi_cuda -OPS_DIAGS=2 OPS_FORCE_DECOMP_X=2 OPS_FORCE_DECOMP_Y=1 2>&1 | tee log_256X256X256_2ranks_1node_cuda_profile.txt
This was taking a very long time, so I passed a kernel name to gather information for that specific kernel only, using the following option:
-k dfbydx_kernel_main
I have a function named “dfbydx_kernel_main”, but ncu then reports “No kernels were profiled”.
Can you please help me specify the kernel name correctly?
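One way to check the exact name ncu records is to list the kernel names it printed while profiling. The sketch below assumes the log captured by `tee` contains ncu progress lines of the form `==PROF== Profiling "<kernel name>" - <id>: ...`. CUDA Fortran kernel symbol names may be decorated by the compiler (for example with a module prefix or a trailing underscore), which could explain why an exact match on `dfbydx_kernel_main` finds nothing. In that case a substring regex such as `-k regex:dfbydx_kernel_main`, optionally together with `--kernel-name-base function`, may match the decorated name. This is a hedged diagnostic sketch, not a confirmed explanation.

```shell
#!/usr/bin/env bash
# Sketch: list the distinct kernel names ncu reported while profiling.
# Assumes the log contains ncu progress lines of the form
#   ==PROF== Profiling "<kernel name>" - <id>: ...
list_profiled_kernels() {
  local log="$1"
  # grep -o keeps only the matched text; sed strips everything but the name.
  grep -o '==PROF== Profiling "[^"]*"' "$log" \
    | sed 's/^==PROF== Profiling "\(.*\)"$/\1/' \
    | sort -u
}

# Usage (log file name taken from the original command):
#   list_profiled_kernels log_256X256X256_2ranks_1node_cuda_profile.txt
```

Whatever name this prints for the kernel of interest is what `-k` has to match, either exactly or through a `regex:` filter.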
Also, how can I provide multiple kernel names at once to gather information for a few kernels in the same run?
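ncu's `--kernel-name` (`-k`) option accepts a `regex:<expression>` form, so one possible approach is a single alternation such as `regex:nameA|nameB`. The sketch below builds that filter from a list of names and prints the resulting ncu command instead of running it. Only `dfbydx_kernel_main` comes from this post; `dfbydy_kernel_main` is a hypothetical second kernel, and the srun prefix, `--csv` and application arguments from the original command are omitted for brevity.

```shell
#!/usr/bin/env bash
# Sketch: combine several kernel names into one ncu --kernel-name (-k) filter.
# dfbydy_kernel_main is a hypothetical kernel name used for illustration.

# Join the arguments with '|' and prefix "regex:", giving e.g.
# "regex:dfbydx_kernel_main|dfbydy_kernel_main".
build_kernel_regex() {
  local IFS='|'
  printf 'regex:%s\n' "$*"
}

kernels=(dfbydx_kernel_main dfbydy_kernel_main)
filter=$(build_kernel_regex "${kernels[@]}")

# Print the ncu invocation (shell-quoted) instead of running it, so it can be checked first.
cmd=(ncu --target-processes all --set full
     --kernel-name-base function
     -k "$filter"
     -o profile_data
     ./senga2_f2c_mpi_cuda)
printf '%q ' "${cmd[@]}"
echo
```

When typing the filter directly on the command line, it needs quotes, e.g. `-k "regex:dfbydx_kernel_main|dfbydy_kernel_main"`; otherwise the shell treats `|` as a pipe.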