Large performance delta between DirectX and Vulkan shaders on Windows

Curious if anyone has any insights on a performance delta I’m seeing between identical rendering on Vulkan and DX12. Shaders are all HLSL, compiled (optimized) using the latest drop of the DXC compiler. DXC is run with the command-line options below on both DX12 and Vulkan; the SPIR-V options (-spirv and -fspv-target-env) are added for Vulkan only, and defines, target profile, etc. are left out here for clarity:

-O3 -Qstrip_debug -spirv -fspv-target-env=vulkan1.2

The DXC output is then passed to the SPIR-V optimizer, which is configured with the code below:

spvtools::SpirvTools core(SPV_ENV_UNIVERSAL_1_6);
spvtools::Optimizer opt(SPV_ENV_UNIVERSAL_1_6);

auto printMessageToStdErr = [](spv_message_level_t, const char*,
                              const spv_position_t&, const char* m) 
    {
        std::cerr << "error: " << m << std::endl;
    };

core.SetMessageConsumer(printMessageToStdErr);
opt.SetMessageConsumer(printMessageToStdErr);

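// Set the default value of the spec constant with SpecId 1 to 42, then freeze
// all spec constants into ordinary constants so later passes can fold them.
opt.RegisterPass(spvtools::CreateSetSpecConstantDefaultValuePass({ {1, "42"} }))
    .RegisterPass(spvtools::CreateFreezeSpecConstantValuePass())
    .RegisterPass(spvtools::CreateUnifyConstantPass())      // deduplicate constants
    .RegisterPass(spvtools::CreateStripDebugInfoPass())     // drop debug instructions
    .RegisterPass(spvtools::CreateStripReflectInfoPass());  // drop reflection info
// Standard performance pass set (the one spirv-opt -O uses).
opt.RegisterPerformancePasses();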
if (!opt.Run(spirv.data(), spirv.size(), &optimizedSpirV))
{
    return false;
}
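
One detail in the setup above that may be worth checking: DXC is told to target vulkan1.2, while the optimizer is constructed with SPV_ENV_UNIVERSAL_1_6. A minimal sketch of how the SPIR-V version a module declares could be read from its header follows, so the binary can be compared before and after the optimizer runs. ReadSpirvVersion and SpirvVersion are hypothetical names made up for this illustration and are not part of SPIRV-Tools; the sketch also assumes the words are already in host byte order.

```cpp
#include <cstddef>
#include <cstdint>

// Every SPIR-V module starts with a five-word header. Word 0 is the magic
// number; word 1 packs the version as 0x00MMmm00 (MM = major, mm = minor).
constexpr std::uint32_t kSpirvMagic = 0x07230203u;

struct SpirvVersion {
    bool valid;                   // false if the header is missing or malformed
    std::uint32_t major_version;
    std::uint32_t minor_version;
};

// Hypothetical helper (not a SPIRV-Tools API): decodes the version declared
// in the header of a module whose words are already in host byte order.
constexpr SpirvVersion ReadSpirvVersion(const std::uint32_t* words,
                                        std::size_t count) {
    return (words == nullptr || count < 5 || words[0] != kSpirvMagic)
        ? SpirvVersion{false, 0u, 0u}
        : SpirvVersion{true, (words[1] >> 16) & 0xFFu, (words[1] >> 8) & 0xFFu};
}
```

Assuming spirv and optimizedSpirV are std::vector<uint32_t> as in the snippet above, calling ReadSpirvVersion(spirv.data(), spirv.size()) before opt.Run and again on the optimized vector afterwards would show whether the module version stays the same through the optimizer. This only inspects the header; it does not validate the module.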

Profiling this application on an NVIDIA RTX 3090 with the latest drivers, I’m seeing 20-40% slower GPU performance on Vulkan (for example, the forward pass takes ~1.4-1.8 ms on Vulkan versus ~0.9-1.2 ms on DX12).

Running NVIDIA Nsight on an identical frame, I see big differences in instruction counts between Vulkan and DX12 for the same call (for example, DX12 has 440 floating-point math instructions vs. 639 on Vulkan), so this does point to shader efficiency as the primary culprit. So is this just a case of driver optimizations being different between Vulkan and DX12, or am I missing a step that would close the gap? Also worth noting that the instruction count differences apply to compute shaders as well, which rules out differences in shader generation based on render passes etc.
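
To make the size of the gap explicit, here is a small sketch of the arithmetic on the figures quoted above. Pairing the low end of one timing range with the low end of the other (and high with high) is an assumption about how the ranges line up, and PercentMore, PercentLess and PrintGap are hypothetical helper names used only for this illustration.

```cpp
#include <cstdio>

// How much larger `larger` is than `smaller`, as a percentage of `smaller`.
constexpr double PercentMore(double larger, double smaller) {
    return (larger / smaller - 1.0) * 100.0;
}

// How much smaller `smaller` is than `larger`, as a percentage of `larger`.
constexpr double PercentLess(double larger, double smaller) {
    return (1.0 - smaller / larger) * 100.0;
}

// Prints the gap for the forward-pass timings (ms) and the floating-point
// instruction counts quoted in the post.
void PrintGap() {
    std::printf("Vulkan extra time, low ends:  %.1f%%\n", PercentMore(1.4, 0.9));
    std::printf("Vulkan extra time, high ends: %.1f%%\n", PercentMore(1.8, 1.2));
    std::printf("DX12 time saved, low ends:    %.1f%%\n", PercentLess(1.4, 0.9));
    std::printf("DX12 time saved, high ends:   %.1f%%\n", PercentLess(1.8, 1.2));
    std::printf("Vulkan extra FP instructions: %.1f%%\n", PercentMore(639.0, 440.0));
}
```

Under that pairing, the Vulkan forward pass takes roughly 50-56% more time than DX12 (equivalently, DX12 takes roughly 33-36% less time than Vulkan), and the Vulkan shader executes roughly 45% more floating-point math instructions, so the two measurements are of a similar magnitude.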

So, the question: does anyone have any insights on why there’s such a wide gap in performance between identical workloads? Am I just missing an obvious optimization step here for Vulkan, or is this just a difference in how the driver optimizes DXIL versus SPIR-V?