Strange performance across CUDA versions

Forgive me if this is the wrong place to ask this question. I have the same code compiled twice with the PGI compiler: once with -Mcuda=8.0, and once with plain -Mcuda and CUDA/10.1 loaded. However, the second build runs slower. Am I misunderstanding how those flags work? I was under the impression that plain -Mcuda would use the most recent available CUDA, which would be 10.1, and that this would presumably run faster. That isn’t the case, though: it’s much slower.

I don’t think you’re misunderstanding anything. Occasionally, moving from one CUDA version to another causes a speedup or slowdown for the same code. I can’t speculate about what is happening in your case.
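
If it helps to put numbers on the difference before digging further, here is a minimal timing sketch. It assumes the two builds exist as separate executables; the names `./app_cuda80` and `./app_cuda101` are hypothetical placeholders, not anything from your setup. It runs each build several times and reports the median wall-clock time.

```python
import statistics
import subprocess
import time


def median_runtime(cmd, repeats=5):
    """Run cmd `repeats` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the program exits with a non-zero status.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    # Hypothetical executable names; substitute the two real builds.
    builds = {
        "-Mcuda=8.0": ["./app_cuda80"],
        "-Mcuda (CUDA 10.1 loaded)": ["./app_cuda101"],
    }
    for label, cmd in builds.items():
        print(f"{label}: {median_runtime(cmd):.3f} s")
```

Note that this measures the whole process, including startup and CUDA context creation, so for short runs a profiler that times individual kernels would likely give a clearer picture.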

Oh, okay. I just wanted to make sure I hadn’t done something wrong. If that’s just something that can happen I’ll try and sort it out myself.