After checking the generated PTX file, I see that the address of an array element is calculated using a MUL instruction. The sequence looks as follows:

mov.u32 $r0, (&A); # load array base

mul.lo.u32 $r2, $r1, 4; # calculate the index for unsigned int array

mul.lo.u32 $r2, $r1, 16; # calculate the index for uint4 array

add.u32 $r3, $r2, $r0;

ld.global.v1.u32 [$r3+0], $r4; # load unsigned int

ld.global.v4.u32 [$r3+0], {$r4, $r5, $r6, $r7}; # load uint4

The programming guide states that MUL takes 8 cycles, compared to 2 cycles for regular integer operations. Is it possible to direct the compiler to generate SHL instead of MUL to calculate the index?
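
To make the question concrete, here is a minimal sketch of the source-level form I have in mind: the byte offset is written with an explicit shift instead of relying on the implicit multiply by the element size. It is plain C++ so it can be checked on the host. UInt4, offset_mul, offset_shl and load_shifted are hypothetical names of mine, not CUDA API. I am assuming the same expression could be used inside a kernel, and I do not know whether the compiler would then actually emit SHL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for CUDA's 16-byte uint4, so the sketch builds on the host.
struct UInt4 { std::uint32_t x, y, z, w; };

// Byte offset of element i written with a multiply (what the PTX above does).
inline std::size_t offset_mul(std::uint32_t i, std::size_t elem_size) {
    return static_cast<std::size_t>(i) * elem_size;
}

// Same offset written with a shift; only valid when the element size is 2^log2_size.
inline std::size_t offset_shl(std::uint32_t i, unsigned log2_size) {
    return static_cast<std::size_t>(i) << log2_size;
}

// Load element i through an explicitly shifted byte offset.
template <typename T>
inline T load_shifted(const T* base, std::uint32_t i, unsigned log2_size) {
    // Guard the assumption that sizeof(T) is exactly 2^log2_size.
    assert((std::size_t{1} << log2_size) == sizeof(T));
    const unsigned char* p =
        reinterpret_cast<const unsigned char*>(base) + offset_shl(i, log2_size);
    return *reinterpret_cast<const T*>(p);
}
```

For the two cases above this would be load_shifted(A, i, 2) for the unsigned int array and load_shifted(A, i, 4) for the uint4 array.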

- DB