Texture call within a for loop in a subroutine


I recently received code from a colleague that implements software PCF with a 5x5 kernel as part of a shadow algorithm demo. I copied that function and tried to incorporate it into an OpenGL data generator that produces image data with different rendering techniques for deep learning. For this, I implemented different subroutines in my shader for the specular and diffuse illumination models, as well as for the shadow techniques, so the 5x5 PCF is supposed to be just another implementation of the shadow technique subroutine. I have a sampler2D, and if I sample it inside the loop, as one would to implement software PCF, I end up with this image:
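For readers unfamiliar with software PCF, here is a minimal CPU-side sketch of what the 5x5 loop computes. It assumes nearest filtering, clamp-to-edge addressing and a depth map stored as a plain 2D list; `fetch_depth`, `pcf_5x5`, the bias value and the test map are hypothetical and not taken from my colleague's code. In the shader, each tap corresponds to one sample of the `sampler2D` inside the nested loop.

```python
# Minimal CPU-side sketch of 5x5 percentage-closer filtering (PCF).
# Assumptions: nearest filtering, clamp-to-edge addressing, and a depth map
# stored as a 2D list; all names here are hypothetical.

PCF_RADIUS = 2  # 5x5 kernel: offsets -2..+2 along x and y


def fetch_depth(shadow_map, x, y):
    """Clamp-to-edge texel fetch (comparable to GL_CLAMP_TO_EDGE + GL_NEAREST)."""
    height, width = len(shadow_map), len(shadow_map[0])
    x = min(max(x, 0), width - 1)
    y = min(max(y, 0), height - 1)
    return shadow_map[y][x]


def pcf_5x5(shadow_map, x, y, fragment_depth, bias=0.005):
    """Return the lit fraction in [0, 1]: 1.0 fully lit, 0.0 fully shadowed."""
    lit = 0
    for dy in range(-PCF_RADIUS, PCF_RADIUS + 1):
        for dx in range(-PCF_RADIUS, PCF_RADIUS + 1):
            occluder = fetch_depth(shadow_map, x + dx, y + dy)
            # One binary depth test per tap; averaging the 25 results is the "PCF".
            if fragment_depth - bias <= occluder:
                lit += 1
    return lit / (2 * PCF_RADIUS + 1) ** 2


# Test map: left half has an occluder at depth 0.3, right half is empty (1.0).
SIZE = 8
shadow_map = [[0.3 if x < SIZE // 2 else 1.0 for x in range(SIZE)] for _ in range(SIZE)]

print(pcf_5x5(shadow_map, 1, 4, 0.5))  # deep inside the shadow -> 0.0
print(pcf_5x5(shadow_map, 4, 4, 0.5))  # straddles the shadow edge -> 0.6
print(pcf_5x5(shadow_map, 7, 4, 0.5))  # fully lit -> 1.0
```

The point of averaging binary depth tests (rather than averaging depths and testing once) is that the result stays a meaningful visibility fraction at shadow edges, which gives the soft transition.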

So, despite the sanity check that should render everything unshaded, the entire region covered by the shadow map is shaded no matter what. This does not change if I actually implement the PCF behaviour either. However, if I perform another sanity check and break out of the loop right after the texture call (I went through all samples, and they produce similar images), I end up with this image:

So, in general, it appears to work unless I remove the break. If I unroll the loop manually, it works as expected, and I could implement PCF that way; however, even a 5x5 kernel is already quite cumbersome.
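Since the manually unrolled version works here, one way to avoid typing out all taps by hand is to generate them. The following Python sketch emits an unrolled PCF body as GLSL text. The helper and the shader variable names (`shadowMap`, `projCoords`, `currentDepth`, `bias`, `texelSize`) are hypothetical and would have to match the actual shader; this only reduces boilerplate and does not explain the loop behaviour.

```python
# Hypothetical helper that emits a manually unrolled GLSL PCF body, so the
# 25 taps do not have to be typed by hand. The GLSL names (shadowMap,
# projCoords, currentDepth, bias, texelSize) are assumptions, not the
# colleague's code. Convention here: shadow = 1.0 means fully in shadow.


def unrolled_pcf_glsl(radius=2):
    """Return GLSL statements for a (2*radius+1)^2 PCF kernel without a loop."""
    lines = ["float shadow = 0.0;"]
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # One texture tap per kernel offset, scaled to texel units.
            lines.append(
                "shadow += (currentDepth - bias > "
                f"texture(shadowMap, projCoords.xy + vec2({dx}.0, {dy}.0) * texelSize).r)"
                " ? 1.0 : 0.0;"
            )
    taps = (2 * radius + 1) ** 2
    lines.append(f"shadow /= {taps}.0;")  # average of the binary depth tests
    return "\n".join(lines)


print(unrolled_pcf_glsl())
```

The generated statements can be pasted into the subroutine body (or spliced into the shader source string before compilation), so changing the kernel size only means changing `radius`.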

Does anybody have an idea why this might fail?

I'm using an RTX 3090 with the latest drivers (516.94). My colleague's implementation without subroutines seems to work flawlessly; however, switching the architecture back would take a lot of work, and I can't think of any API-related reason why this shouldn't be possible this way (or at least I'm not aware of such a limitation).

Kind regards and thank you,