GLSL compiler bug when passing an inout mat4 to a function in a compute shader

The following compute shader writes incorrect values (1, 2, 3) into the last column of the resulting matrix instead of the expected (2, 4, 6):

#version 450

layout(std140, binding = 4) restrict writeonly buffer Output { mat3x4 outMatrix[]; };

// Adds v to the translation part (xyz of the last column) of m.
void translate(inout mat4 m, in vec3 v)
{
	m[3].xyz += v;
}

mat4 run()
{
	vec3 d = vec3(1,2,3);
	mat4 result = mat4(1.0);

//	result[3].xyz += d;
	translate(result, d);

	while (true) {
		if (true) {
//			result[3].xyz += d;
			translate(result, d);
			break;
		}
	}

	return result;
}

layout(local_size_x = 1) in;

void main() {
	// Transpose so that each stored vec4 is one row of the result; keep the top three rows.
	mat4 outMat = transpose(run());
	outMatrix[gl_LocalInvocationID.x] = mat3x4(outMat[0], outMat[1], outMat[2]);
}

If translate(result, d); is replaced with the equivalent result[3].xyz += d; (the commented-out lines), the problem goes away and the last column is correctly set to (2, 4, 6).
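To make the expected value explicit, here is a small CPU reference model of run(). It is only a sketch: matrices are plain lists of four columns, mirroring GLSL's column-major mat4, and the helper names (identity4, translate, run) are hypothetical Python stand-ins for the shader functions. Since translate() is called once before the loop and once inside it, d is added twice.

```python
# CPU reference model of the shader's run() function (hypothetical helper names).
# A matrix is a list of 4 columns, matching GLSL's column-major mat4 (m[c] is column c).

def identity4():
    # Column c has 1.0 in row c and 0.0 elsewhere.
    return [[1.0 if r == c else 0.0 for r in range(4)] for c in range(4)]

def translate(m, v):
    """Mirror of GLSL translate(inout mat4 m, in vec3 v): m[3].xyz += v, modifying m in place."""
    for i in range(3):
        m[3][i] += v[i]

def run():
    d = (1.0, 2.0, 3.0)
    result = identity4()
    translate(result, d)          # first call, before the loop
    while True:
        if True:
            translate(result, d)  # second call, inside the loop
            break
    return result

print(*run()[3][:3])  # prints: 2.0 4.0 6.0
```

The NVIDIA result (1, 2, 3) is what you would get if only one of the two translate() calls took effect.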

This is a minimal program that reproduces the bug:

It prints the xyz components of the matrix's last column twice: the first line comes from the shader using translate(result, d); and the second from the one using result[3].xyz += d;. It builds easily with CMake on both Windows and Linux.

On a GeForce RTX 2060 with the 566.36 driver, on both Windows and Linux, it prints (last two lines):
1 2 3
2 4 6

I also ran it on Intel HD Graphics 2500, AMD Vega 7, AMD Vega 8 and Mesa llvmpipe. All of them print
2 4 6
2 4 6
as expected.

This looks very similar to this bug, but I could not reproduce that one. It may have been fixed for some cases, but not for the one I encountered.

Here is what appears to be the disassembly of the buggy shader, obtained with glGetProgramBinary on the GeForce RTX 2060:

OPTION NV_internal;
OPTION NV_shader_storage_buffer;
OPTION NV_bindless_texture;
GROUP_SIZE 1;
STORAGE sbo_buf0[] = { program.storage[0] };
TEMP R0;
TEMP T;
TEMP RC;
SHORT TEMP HC;
REP.S ;
SEQ.U.CC HC.x, {1, 0, 0, 0}, {0, 0, 0, 0};
BRK (NE.x);
MOV.U.CC RC.x, {1, 0, 0, 0};
BRK (NE.x);
ENDREP;
MUL.S R0.x, invocation.localid, {48, 0, 0, 0};
MOV.S R0.x, R0;
STB.F32X4 {1, 0, 0, 0}.xyyx, sbo_buf0[R0.x];
STB.F32X4 {0, 1, 2, 0}.xyxz, sbo_buf0[R0.x + 16];
STB.F32X4 {0, 1, 3, 0}.xxyz, sbo_buf0[R0.x + 32];
END

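It looks like the compiler precomputes the result and ends up storing incorrect constants in the output buffer: the loop contains only branch logic, and the immediate values in the three STB stores put (1, 2, 3) into the translation, as if only one of the two translate() calls had been applied.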
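The three stores can be decoded by hand. Under std140, a mat3x4 is three vec4 columns, so one array element is 48 bytes, which matches the `MUL.S R0.x, invocation.localid, {48, ...}` address computation and the offsets 0, 16 and 32. The sketch below assumes the usual assembly swizzle meaning (for example, `{1, 0, 0, 0}.xyyx` selects components x, y, y, x of the immediate vector). Because main() stores rows of the transposed result, the translation is the .w component of each stored vec4.

```python
# Decode the three STB.F32X4 stores from the NV assembly above.
COMPONENT = {"x": 0, "y": 1, "z": 2, "w": 3}

def swizzle(vec, pattern):
    # Pick components of vec in the order given by the swizzle string.
    return [vec[COMPONENT[ch]] for ch in pattern]

# (immediate vector, swizzle, byte offset within one 48-byte mat3x4 element)
stores = [
    ((1, 0, 0, 0), "xyyx", 0),
    ((0, 1, 2, 0), "xyxz", 16),
    ((0, 1, 3, 0), "xxyz", 32),
]

columns = [swizzle(v, p) for v, p, _ in stores]
for (_, _, offset), col in zip(stores, columns):
    print(offset, col)

# Each stored vec4 is one row of run()'s result (because of transpose()),
# so the translation is the .w component of each stored column.
translation = [col[3] for col in columns]
print("translation:", translation)  # prints: translation: [1, 2, 3]
```

The decoded rows are (1, 0, 0, 1), (0, 1, 0, 2) and (0, 0, 1, 3), i.e. an identity matrix with translation (1, 2, 3) rather than (2, 4, 6).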