RTX 5070 Ti nvgpucomp64.dll exception thrown when shader uses GL_NV_shader_sm_builtins

I have the following compute shader:

#version 460 core

#extension GL_NV_shader_sm_builtins : require

layout(set = 0, binding = 0) uniform writeonly image2D image;

layout(push_constant) uniform Settings
{
    layout(offset = 0) uint width;
    layout(offset = 4) uint height;
}
settings;

layout(local_size_x = 32, local_size_y = 32, local_size_z = 1) in;
void main()
{
    const uvec3 xyz = gl_GlobalInvocationID;

    const uint width  = settings.width;
    const uint height = settings.height;

    const uint x = xyz.x;
    const uint y = xyz.y;

    if (x >= width)
    {
        return;
    }

    if (y >= height)
    {
        return;
    }

    const uint smid = gl_SMIDNV;
    const uint sm_count = gl_SMCountNV;

    const uint warp_id = gl_WarpIDNV;
    const uint warps_per_sm = gl_WarpsPerSMNV;

    const float sm_vis   = float(smid) / float(sm_count - 1U);
    const float warp_vis = float(warp_id) / float(warps_per_sm - 1U);

    imageStore(image, ivec2(x, y), vec4(sm_vis, warp_vis, 0, 0));
}

When I run it on an RTX 5070 Ti, it sometimes runs and produces invalid values, but most of the time it crashes with this error:

Exception thrown at 0x00007FFECCF90769 (nvgpucomp64.dll) in app.exe: 0xC0000005: Access violation writing location 0x000002B5D418BDD4.

When it does not crash and produces wrong output, only the first image channel is affected (the one that sm_vis is written to). I managed to run the app under Nsight Graphics: when it does not crash, the second channel (the one with warp_vis) is filled entirely with zeros, while the first has values ranging from 0 to 1. When I change imageStore(image, ivec2(x, y), vec4(sm_vis, warp_vis, 0, 0)); to imageStore(image, ivec2(x, y), vec4(warp_vis, sm_vis, 0, 0)); only the second channel has correct values (greenish output) and the first one is filled with zeros.

When I use only one channel, the app never crashes and produces the correct result (visualizing either warps or SMs).
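
For reference, a minimal sketch of what the SM-only, single-channel variant could look like. It assumes that only the SM built-ins are read and that the warp built-ins are dropped entirely, and it merges the two bounds checks into one; the exact single-channel shader used in the app may differ in those details.

```glsl
#version 460 core

#extension GL_NV_shader_sm_builtins : require

layout(set = 0, binding = 0) uniform writeonly image2D image;

layout(push_constant) uniform Settings
{
    layout(offset = 0) uint width;
    layout(offset = 4) uint height;
}
settings;

layout(local_size_x = 32, local_size_y = 32, local_size_z = 1) in;
void main()
{
    const uvec2 xy = gl_GlobalInvocationID.xy;

    // Skip invocations that fall outside the image.
    if (xy.x >= settings.width || xy.y >= settings.height)
    {
        return;
    }

    // Assumption: only the SM built-ins are read here;
    // gl_WarpIDNV and gl_WarpsPerSMNV are not referenced at all.
    const float sm_vis = float(gl_SMIDNV) / float(gl_SMCountNV - 1U);

    // Write the SM visualization to the first channel only.
    imageStore(image, ivec2(xy), vec4(sm_vis, 0, 0, 0));
}
```

The warp-only variant would be the same sketch with gl_WarpIDNV and gl_WarpsPerSMNV in place of the two SM built-ins.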

The backtrace shows that the crash comes from vkCreateComputePipelines, and after that call the stack trace looks like:

The same shader works perfectly fine on an RTX 3060.