Optimal use of memory

Let’s suppose I declared shared memory this way:
__shared__ MyType array[BLOCK_SIZE];
Is accessing the memory directly as array[idx] as efficient as first assigning MyType var = array[idx] and then using var?
I know that variables in registers are faster, but isn’t an element of shared memory pulled into a register anyway once it is used?
Can someone explain whether I am thinking about this correctly, and why?

Indeed I’d expect the compiler to optimize both variants to the same code.
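
One way to check this for yourself is to write both variants as separate kernels and compare them. The sketch below is only an illustration: MyType as a pair of floats, the kernel names useDirect and useLocalCopy, the helper maxDifference and the arithmetic are all made up for the example, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK_SIZE 128

// Hypothetical stand-in for the poster's MyType: a pair of floats.
struct MyType { float a; float b; };

// Variant 1: index the shared array every time the element is needed.
__global__ void useDirect(const MyType* in, float* out)
{
    __shared__ MyType array[BLOCK_SIZE];
    int idx = threadIdx.x;
    int gid = blockIdx.x * BLOCK_SIZE + idx;
    array[idx] = in[gid];
    __syncthreads();                    // neighbours' elements must be visible
    int j = (idx + 1) % BLOCK_SIZE;     // read a neighbour's element
    out[gid] = array[j].a * array[j].a + array[j].b;
}

// Variant 2: copy the element into a local variable first, then use the copy.
__global__ void useLocalCopy(const MyType* in, float* out)
{
    __shared__ MyType array[BLOCK_SIZE];
    int idx = threadIdx.x;
    int gid = blockIdx.x * BLOCK_SIZE + idx;
    array[idx] = in[gid];
    __syncthreads();
    int j = (idx + 1) % BLOCK_SIZE;
    MyType var = array[j];              // explicit local copy
    out[gid] = var.a * var.a + var.b;
}

// Runs both kernels on the same input and returns the largest absolute
// difference between their outputs.
float maxDifference()
{
    const int n = 4 * BLOCK_SIZE;
    std::vector<MyType> h_in(n);
    for (int i = 0; i < n; ++i) {
        h_in[i].a = float(i % 7);       // small integers keep the arithmetic exact
        h_in[i].b = float(i % 5);
    }

    MyType* d_in = nullptr;
    float* d_out1 = nullptr;
    float* d_out2 = nullptr;
    cudaMalloc(&d_in, n * sizeof(MyType));
    cudaMalloc(&d_out1, n * sizeof(float));
    cudaMalloc(&d_out2, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(MyType), cudaMemcpyHostToDevice);

    useDirect<<<n / BLOCK_SIZE, BLOCK_SIZE>>>(d_in, d_out1);
    useLocalCopy<<<n / BLOCK_SIZE, BLOCK_SIZE>>>(d_in, d_out2);

    // The blocking copies also wait for the kernels to finish.
    std::vector<float> h1(n), h2(n);
    cudaMemcpy(h1.data(), d_out1, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(h2.data(), d_out2, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out1);
    cudaFree(d_out2);

    float maxDiff = 0.0f;
    for (int i = 0; i < n; ++i)
        maxDiff = fmaxf(maxDiff, fabsf(h1[i] - h2[i]));
    return maxDiff;
}

int main()
{
    printf("max difference between variants: %g\n", maxDifference());
    return 0;
}
```

To see whether the compiler really emits the same instructions for both kernels, you can compile with nvcc -ptx or dump the machine code with cuobjdump -sass and compare the two kernel bodies.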

Assuming that MyType is, for example, a pair of floats: is it better for some reason to use two arrays of floats instead of one array of MyType?

Yes - depending on your access patterns. If your warps access consecutive array elements, a single array of float2s will induce a 2-way bank conflict, while two arrays of floats will not lead to bank conflicts.
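
The two layouts can be sketched as follows. This is only an illustration under assumptions: the kernel names sumAoS and sumSoA, the helper layoutsAgree and the computation are made up, and error checking is omitted. Whether the float2 version actually suffers a conflict depends on the GPU generation and on how the compiler issues the loads, so check the shared memory section of the programming guide for your compute capability.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK_SIZE 128

// Layout 1: one shared array of float2.
// With 4-byte banks, 32-bit word w maps to bank (w % number_of_banks).
// pairs[i].x is word 2*i and pairs[i].y is word 2*i + 1, so when a warp
// reads the .x fields with 32-bit loads, consecutive threads hit every
// second bank and two threads end up on each bank that is used.
__global__ void sumAoS(const float2* in, float* out)
{
    __shared__ float2 pairs[BLOCK_SIZE];
    int idx = threadIdx.x;
    int gid = blockIdx.x * BLOCK_SIZE + idx;
    pairs[idx] = in[gid];
    __syncthreads();
    int j = (idx + 1) % BLOCK_SIZE;     // neighbouring element
    out[gid] = pairs[j].x + pairs[j].y;
}

// Layout 2: two shared arrays of float.
// xs[i] is word i of its array, so consecutive threads hit consecutive banks.
__global__ void sumSoA(const float* inX, const float* inY, float* out)
{
    __shared__ float xs[BLOCK_SIZE];
    __shared__ float ys[BLOCK_SIZE];
    int idx = threadIdx.x;
    int gid = blockIdx.x * BLOCK_SIZE + idx;
    xs[idx] = inX[gid];
    ys[idx] = inY[gid];
    __syncthreads();
    int j = (idx + 1) % BLOCK_SIZE;
    out[gid] = xs[j] + ys[j];
}

// Runs both layouts on the same data and reports whether the results agree.
bool layoutsAgree()
{
    const int n = 4 * BLOCK_SIZE;
    std::vector<float2> h_pairs(n);
    std::vector<float> h_x(n), h_y(n);
    for (int i = 0; i < n; ++i) {
        h_x[i] = float(i % 7);
        h_y[i] = float(i % 5);
        h_pairs[i] = make_float2(h_x[i], h_y[i]);
    }

    float2* d_pairs = nullptr;
    float *d_x = nullptr, *d_y = nullptr, *d_out1 = nullptr, *d_out2 = nullptr;
    cudaMalloc(&d_pairs, n * sizeof(float2));
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMalloc(&d_out1, n * sizeof(float));
    cudaMalloc(&d_out2, n * sizeof(float));
    cudaMemcpy(d_pairs, h_pairs.data(), n * sizeof(float2), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, h_x.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, h_y.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    sumAoS<<<n / BLOCK_SIZE, BLOCK_SIZE>>>(d_pairs, d_out1);
    sumSoA<<<n / BLOCK_SIZE, BLOCK_SIZE>>>(d_x, d_y, d_out2);

    std::vector<float> h1(n), h2(n);
    cudaMemcpy(h1.data(), d_out1, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(h2.data(), d_out2, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_pairs); cudaFree(d_x); cudaFree(d_y);
    cudaFree(d_out1); cudaFree(d_out2);

    for (int i = 0; i < n; ++i)
        if (h1[i] != h2[i]) return false;
    return true;
}

int main()
{
    printf("layouts agree: %s\n", layoutsAgree() ? "yes" : "no");
    return 0;
}
```

Both kernels compute the same values; only the shared memory layout differs, so any timing difference between them on your hardware comes from the access pattern.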

I know that reading from shared memory by multiple threads can cause conflicts, but I don’t exactly understand why it happens when you read an array of float2s consecutively. Could you explain it?
