Why does the source code index++; get compiled into two instructions?

There is a for loop in my program that executes many times, and the loop body contains code similar to result_buf[index++] = followed by some calculation.

uint32_t *output;
uint32_t data[80];
uint32_t index = 0;
// Initialize the data array, then:
for (uint32_t i = 0; i < 80; i++) {
    if (data[i] != 0) {
        output[index++] = /* result of some calculation */;
    }
}

The corresponding SASS code for the source code is as follows:

00007f6b e729a000	@P4   IADD3 R89, R8, 0x1, RZ 
00007f6b e729a010	@P4   IMAD.IADD R7, R7, 0x1, R90 
00007f6b e729a020	@P0   LOP3.LUT R90, R102, 0xf, RZ, 0xc0, !PT 
00007f6b e729a030	@P3   IMAD R87, R9, 0x10, R104 
00007f6b e729a040	@P4   IMAD.MOV.U32 R8, RZ, RZ, R89 

The first and last SASS instructions correspond to the operation index++. Why is index++ not compiled into a single instruction that increments the source register in place? Instead, the incremented result is written to another register (R89) and then copied back to the source register (R8).

CUDA architecture: sm_86
CUDA toolkit version: Cuda compilation tools, release 12.3, V12.3.107

Could someone please answer my question?

You show only part of the code; in particular, the part where index is used is missing.

The registers are accessed through two ports: one for even and one for odd register numbers.
Both ports have a reuse cache for recently used registers.
Writing to a register incurs a fixed-latency delay before the result can be used by other instructions.
The loop may have been (partly) unrolled.

Taken together, these effects can make using two registers the better choice.

It could also simply be a suboptimal code-generation choice by ptxas.