Could somebody help me analyze this result?

I ran an experiment on how an SM executes two instructions. I use a vector add as the test, with 512 threads in total. In the one-instruction case, all 512 threads do the same work. In the two-instruction case, the first 256 threads do the first piece of work and the other 256 threads do the second.
The code for the one-instruction case:
while (id < n) {
    // every thread repeats the same add 18000 times
    for (k = 0; k < 18000; k++)
        c[id] = a[id] + b[id];
    id += gridDim.x * blockDim.x;   // grid-stride step
}

The code for the two-instruction case is as follows:

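while (id < n) {
    for (k = 0; k < 18000; k++) {
        if (id < 256) {
            c[id] = a[id] + b[id];   // first 256 threads write to c
        } else {
            u[id] = a[id] + b[id];   // remaining 256 threads write to u
        }
    }
    id += gridDim.x * blockDim.x;    // grid-stride step, as in the one-instruction code
}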

As you can see, in the two-instruction case the two instructions are the same operation.
Then I used the CUDA profiler to measure the execution time of each kernel.
The two-instruction case takes 13 ms and the one-instruction case takes 11.8 ms.
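
A minimal, self-contained sketch of this setup might look like the following. The kernel names (addOne, addSplit), n = 512, the launch of a single block of 512 threads, the float element type, and the cudaEvent-based timing are all assumptions for illustration; the actual launch configuration and profiler settings are not shown in the snippets above, so the timings it prints may differ from the numbers quoted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int ITERS = 18000;  // loop count taken from the snippets above

// One-instruction case: every thread executes the same statement.
__global__ void addOne(const float *a, const float *b, float *c, int n) {
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    while (id < n) {
        for (int k = 0; k < ITERS; k++)
            c[id] = a[id] + b[id];
        id += gridDim.x * blockDim.x;
    }
}

// Two-instruction case: ids 0..255 write to c, the rest write to u.
__global__ void addSplit(const float *a, const float *b,
                         float *c, float *u, int n) {
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    while (id < n) {
        for (int k = 0; k < ITERS; k++) {
            if (id < 256) c[id] = a[id] + b[id];
            else          u[id] = a[id] + b[id];
        }
        id += gridDim.x * blockDim.x;
    }
}

int main() {
    const int n = 512;                       // assumed problem size
    const size_t bytes = n * sizeof(float);
    float h[n];
    for (int i = 0; i < n; i++) h[i] = (float)i;

    // Error checking is omitted to keep the sketch short.
    float *a, *b, *c, *u;
    cudaMalloc(&a, bytes); cudaMalloc(&b, bytes);
    cudaMalloc(&c, bytes); cudaMalloc(&u, bytes);
    cudaMemcpy(a, h, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(b, h, bytes, cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float msOne = 0.0f, msSplit = 0.0f;

    // Warm-up launch so one-time start-up cost is not included in the timings.
    addOne<<<1, 512>>>(a, b, c, n);
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    addOne<<<1, 512>>>(a, b, c, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&msOne, start, stop);

    cudaEventRecord(start);
    addSplit<<<1, 512>>>(a, b, c, u, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&msSplit, start, stop);

    printf("one instruction:  %.3f ms\n", msOne);
    printf("two instructions: %.3f ms\n", msSplit);

    cudaEventDestroy(start); cudaEventDestroy(stop);
    cudaFree(a); cudaFree(b); cudaFree(c); cudaFree(u);
    return 0;
}
```

This should build with something like nvcc -o sm_test sm_test.cu (the file name is hypothetical). Running each kernel several times and averaging would give steadier numbers than a single launch.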
I want to know why the one-instruction case takes less time. I assumed the two cases would have the same execution time, because in the two-instruction case the two instructions are the same.
I am so confused. Could anybody give me some help?