I’m chasing down a weird bug in my CUDA Fortran code that triggers an out-of-bounds memory access in places that access local memory that should be in registers. While hunting this down I’ve been using cuobjdump --dump-resource-usage to look for excessive register usage, but the output has a lot of other information and I’m not sure what is or isn’t reasonable for those other values.
The output from cuobjdump --dump-resource-usage is below. The shared memory usage is what I expect, but the register usage is lower than I would expect, and I have no idea how to interpret the rest. I’ve looked but can’t find any documentation on what the other values mean, what reasonable values are, etc. Any advice?
Fatbin elf code:
================
arch = sm_90
code version = [1,7]
host = linux
compile_size = 64bit
has debug info
compressed
identifier = ../gpu/interface_states.cuf
Resource usage:
Common:
GLOBAL:43
Function gpu_interface_states_magnitude_squared_:
REG:0 STACK:0 SHARED:0 LOCAL:0 TEXTURE:0 SURFACE:0 SAMPLER:0
Function gpu_interface_states_compute_pressure_:
REG:0 STACK:0 SHARED:0 LOCAL:0 TEXTURE:0 SURFACE:0 SAMPLER:0
Function __cuda_sm20_div_rn_f64_full:
REG:0 STACK:0 SHARED:0 LOCAL:0 TEXTURE:0 SURFACE:0 SAMPLER:0
Function gpu_interface_states_compute_energy_:
REG:0 STACK:0 SHARED:0 LOCAL:0 TEXTURE:0 SURFACE:0 SAMPLER:0
Function gpu_interface_states_sound_speed_:
REG:0 STACK:0 SHARED:0 LOCAL:0 TEXTURE:0 SURFACE:0 SAMPLER:0
Function __cuda_sm20_dsqrt_rn_f64_mediumpath_v1:
REG:0 STACK:0 SHARED:0 LOCAL:0 TEXTURE:0 SURFACE:0 SAMPLER:0
Function gpu_interface_states_slope_minmod_:
REG:0 STACK:0 SHARED:0 LOCAL:0 TEXTURE:0 SURFACE:0 SAMPLER:0
Function gpu_interface_states_conserved_2_primitive_:
REG:0 STACK:0 SHARED:0 LOCAL:0 TEXTURE:0 SURFACE:0 SAMPLER:0
Function gpu_interface_states_primitive_2_conserved_:
REG:0 STACK:0 SHARED:0 LOCAL:0 TEXTURE:0 SURFACE:0 SAMPLER:0
Function gpu_interface_states_index_1dto3d_:
REG:0 STACK:0 SHARED:0 LOCAL:0 TEXTURE:0 SURFACE:0 SAMPLER:0
Function gpu_interface_states_subgrid_conserved_2_primitive_:
REG:0 STACK:0 SHARED:0 LOCAL:0 TEXTURE:0 SURFACE:0 SAMPLER:0
Function gpu_interface_states_trace_3d_:
REG:0 STACK:0 SHARED:0 LOCAL:0 TEXTURE:0 SURFACE:0 SAMPLER:0
Function gpu_interface_states_hll_flux_:
REG:0 STACK:0 SHARED:0 LOCAL:0 TEXTURE:0 SURFACE:0 SAMPLER:0
Function gpu_interface_states_hll_fluxes_:
REG:0 STACK:0 SHARED:0 LOCAL:0 TEXTURE:0 SURFACE:0 SAMPLER:0
Function gpu_interface_states_riemann_driver_:
REG:0 STACK:0 SHARED:0 LOCAL:0 TEXTURE:0 SURFACE:0 SAMPLER:0
Function gpu_interface_states_conservative_update_:
REG:0 STACK:0 SHARED:0 LOCAL:0 TEXTURE:0 SURFACE:0 SAMPLER:0
Function gpu_interface_states_compute_interface_states_:
REG:64 STACK:1032 SHARED:40760 LOCAL:0 CONSTANT[0]:568 TEXTURE:0 SURFACE:0 SAMPLER:0
Fatbin ptx code:
================
arch = sm_90
code version = [8,4]
host = linux
compile_size = 64bit
has debug info
compressed
identifier = ../gpu/interface_states.cuf
ptxasOptions = -g --dont-merge-basicblocks --return-at-end
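Since most of the entries in the listing above are all zeros, a small filter can make it easier to scan. The following is only a minimal sketch that assumes the text layout shown in this post (a "Function name:" line followed by a line of NAME:VALUE fields); the helper names parse_resource_usage and nonzero_only and the sample string are hypothetical and not part of any NVIDIA tool.

```python
import re

def parse_resource_usage(text):
    """Map each function name to a dict of its resource counters.

    Assumes the layout shown in this post: a 'Function <name>:' line
    followed by one line of space-separated NAME:VALUE fields.
    """
    usage = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"Function (\S+):$", line)
        if match:
            current = match.group(1)
            continue
        if current is not None and line.startswith("REG:"):
            # Split on the last ':' so keys such as CONSTANT[0] stay intact.
            fields = (field.rsplit(":", 1) for field in line.split())
            usage[current] = {key: int(value) for key, value in fields}
            current = None
    return usage

def nonzero_only(usage):
    """Keep only functions with at least one non-zero counter."""
    result = {}
    for name, counters in usage.items():
        kept = {key: value for key, value in counters.items() if value != 0}
        if kept:
            result[name] = kept
    return result

# Two entries copied from the listing in this post.
sample = """
Function gpu_interface_states_hll_flux_:
REG:0 STACK:0 SHARED:0 LOCAL:0 TEXTURE:0 SURFACE:0 SAMPLER:0
Function gpu_interface_states_compute_interface_states_:
REG:64 STACK:1032 SHARED:40760 LOCAL:0 CONSTANT[0]:568 TEXTURE:0 SURFACE:0 SAMPLER:0
"""

usage = parse_resource_usage(sample)
interesting = nonzero_only(usage)
for name, counters in interesting.items():
    print(name, counters)
```

In practice the saved output of cuobjdump --dump-resource-usage would be passed in place of the sample string.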