Say that I have two different arrays that will be sent to the device:
__global__ void kernel1(int *array1, int *array2)
And I access each array’s elements in the following way:
int a1 = array1[threadIdx.x];
int a2 = array2[threadIdx.x];
Are these accesses coalesced on a compute capability 1.1 device? I believe so, as each Kth thread accesses the Kth element and there is no misalignment while accessing each array…
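To make the condition concrete, here is a rough host-side sketch (plain C++, no GPU needed) of the rule as I understand it for compute capability 1.0/1.1 with 4-byte words: within a half-warp of 16 threads, the Kth thread must read the Kth word of a 64-byte-aligned segment. The helper names (`indexedAddresses`, `halfWarpCoalesced`) and the example base address are made up for illustration, and the model ignores everything else the hardware does, so treat it as a way to reason about the pattern rather than a definitive check.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr int kHalfWarp = 16;                                      // threads per half-warp
constexpr std::size_t kWordBytes = sizeof(std::int32_t);           // 4-byte int elements
constexpr std::size_t kSegmentBytes = kHalfWarp * kWordBytes;      // 64-byte segment

// Hypothetical helper: byte addresses touched by one half-warp when thread t
// reads array[t * stride], starting at thread index firstThread.
std::array<std::uintptr_t, kHalfWarp> indexedAddresses(std::uintptr_t base,
                                                       int firstThread,
                                                       int stride) {
    std::array<std::uintptr_t, kHalfWarp> addr{};
    for (int k = 0; k < kHalfWarp; ++k) {
        std::size_t index = static_cast<std::size_t>(firstThread + k) * stride;
        addr[k] = base + index * kWordBytes;
    }
    return addr;
}

// Hypothetical helper: simplified model of the 1.0/1.1 rule for 4-byte words.
// The first address must start a 64-byte segment, and thread k must access
// the k-th consecutive word after it.
bool halfWarpCoalesced(const std::array<std::uintptr_t, kHalfWarp>& addr) {
    if (addr[0] % kSegmentBytes != 0) {
        return false;  // half-warp does not start on a segment boundary
    }
    for (int k = 0; k < kHalfWarp; ++k) {
        if (addr[k] != addr[0] + k * kWordBytes) {
            return false;  // thread k is not reading the k-th word
        }
    }
    return true;
}
```

With a base address that is a multiple of 64 (pointers returned by cudaMalloc are aligned to at least 256 bytes), `halfWarpCoalesced(indexedAddresses(base, 0, 1))` holds for the `array[threadIdx.x]` pattern above, and it holds independently for `array1` and `array2`. Shifting the base by 4 bytes or using a stride of 2 makes the check fail in this model.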