This used to be on the feature request list.
Today, I find SpeedOfLight_HierarchicalHalfRooflineChart in the list printed by ncu --list-sections.
When I add it to the ncu command line with --section, I get no roofline output at all, even though my kernels use fp16 exclusively.
$ sudo GIB_APIS="cu,cl" /usr/local/cuda/bin/ncu --section SpeedOfLight_HierarchicalHalfRooflineChart ./testgib
numboxes in aab file: 69
CUDA version: 12010
==PROF== Connected to process 4385 (/home/bram/src/GIB/viewer/testgib)
CUDA driver version: 12010
Device Number: 0
Device name: NVIDIA GeForce RTX 3070
Memory Clock Rate (KHz): 7001000
Memory Bus Width (bits): 256
Peak Memory Bandwidth (GB/s): 448.064000
Max threads per block: 1024
Warp size: 32
Picked device with 7970 MiB of memory.
Loading module: kernels-cu/raybox2m.ptx
c_aab = 7f8397200000 bytes=12288
c_mtr = 7f8397203000 bytes=4096
c_tof = 7f8397204000 bytes=24576
c_ltr = 7f839720a000 bytes=4096
c_lco = 7f839720b000 bytes=2048
Read 2097152 hemi dirs.
Read 2097152 parallel field positions.
Read 2097152 omni dirs.
WRN scene_p32 0x7f837e000000
WRN scene_p16 0x7f83b40dd040
==PROF== Profiling "gib_raytest" - 0: 0%....50%....100% - 4 passes
==PROF== Profiling "gib_bouncetest" - 1: 0%....50%....100% - 4 passes
==PROF== Profiling "gib_bin_photons" - 2: 0%....50%....100% - 4 passes
Cuda client has been shut down.
==PROF== Disconnected from process 4385
[4385] testgib@127.0.0.1
gib_raytest (16384, 1, 1)x(128, 1, 1), Context 1, Stream 13, Device 0, CC 8.6
gib_bouncetest (16384, 1, 1)x(128, 1, 1), Context 1, Stream 13, Device 0, CC 8.6
gib_bin_photons (32768, 1, 1)x(128, 1, 1), Context 1, Stream 13, Device 0, CC 8.6
$
I do not see the fp16 roofline in the graphical profiler (ncu-ui) either.
UPDATE: I can get it in the graphical profiler if I select the correct section.
So… the section made it into the list, but the functionality behind it did not?
Or am I using it incorrectly?