Are these coalesced accesses?

The CUDA documentation notes: “When a warp executes an instruction that accesses global memory, it coalesces the memory accesses of the threads within the warp into one or more of these memory transactions”.

But I have some questions about this.
1.

__global__ void add(double *a, double *b){
    // global thread index
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    i = 3 * i;  // each thread handles three consecutive elements
    b[i] = a[i] + a[i + 1] + a[i + 2];
}

Can the three accesses (a[i], a[i + 1], a[i + 2]) be executed with a single instruction? (I mean, is this coalesced access?)
Or does coalescing only happen across the different threads of a warp, and not within a single thread?
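
To make question 1 concrete, here is a rough host-side sketch (plain C++, which also compiles as CUDA host code) of how I would count the memory segments one warp touches for each of the three loads. The 32-thread warp, the 128-byte segment size, the aligned base address, and the helper name segments_per_load are my own assumptions for illustration, not something taken from the documentation quoted above.

```cpp
#include <cstddef>
#include <set>

// Assumptions for this sketch: 32 threads per warp, 128-byte segments,
// and an array whose base address is 128-byte aligned.
constexpr int kWarpSize = 32;
constexpr std::size_t kSegmentBytes = 128;

// Number of distinct 128-byte segments touched when every thread of the
// first warp executes one load of a[stride * thread + offset] (a is double*).
int segments_per_load(int stride, int offset) {
    std::set<std::size_t> segments;
    for (int thread = 0; thread < kWarpSize; ++thread) {
        std::size_t byte_address =
            static_cast<std::size_t>(stride * thread + offset) * sizeof(double);
        segments.insert(byte_address / kSegmentBytes);
    }
    return static_cast<int>(segments.size());
}
```

Under these assumptions, segments_per_load(3, 0), segments_per_load(3, 1) and segments_per_load(3, 2) each give 6, while a unit-stride load, segments_per_load(1, 0), gives 2. So if this model is right, each of the three loads in the kernel touches three times as many segments as a unit-stride load would.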

2.

__global__ void add(double *a, double *b){
    // global thread index
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    b[i] = a[i] + a[i + 10] + a[i + 12];  // assuming no out-of-bounds index
}

I suspect these accesses may be non-coalesced, so I changed the code to:

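__global__ void add(double *a, double *b){
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    // BLOCK_SIZE is assumed to be a compile-time constant equal to blockDim.x
    __shared__ double shareM[3 * BLOCK_SIZE];
    // each thread uses its own three slots so threads do not overwrite each other
    shareM[threadIdx.x] = a[i];
    shareM[threadIdx.x + BLOCK_SIZE] = a[i + 10];
    shareM[threadIdx.x + 2 * BLOCK_SIZE] = a[i + 12];
    b[i] = shareM[threadIdx.x] + shareM[threadIdx.x + BLOCK_SIZE] + shareM[threadIdx.x + 2 * BLOCK_SIZE];
}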

I write the data from global memory to shared memory, then read it back out. Can this avoid non-coalesced access and improve performance?
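
For reference, here is my rough count for the global loads a[i], a[i + 10] and a[i + 12], which are the same whether or not the values are staged in shared memory. Within each load, the 32 threads of a warp read 32 consecutive doubles, so the offset only shifts where that 256-byte range starts. This is a host-side sketch in plain C++ (it also compiles as CUDA host code) under my own assumptions: 32-thread warps, 8-byte doubles, 128-byte segments, and a 128-byte aligned array. segments_for_contiguous_warp is a hypothetical helper, not a CUDA API.

```cpp
#include <cstddef>

// Assumptions for this sketch, not values queried from a device.
constexpr std::size_t kWarpThreads = 32;
constexpr std::size_t kElemBytes = 8;    // size of a double
constexpr std::size_t kSegBytes = 128;   // assumed segment size

// Segments spanned when 32 consecutive threads read 32 consecutive doubles
// starting at element index first_index: last segment - first segment + 1.
constexpr std::size_t segments_for_contiguous_warp(std::size_t first_index) {
    return (first_index * kElemBytes + kWarpThreads * kElemBytes - 1) / kSegBytes
         - (first_index * kElemBytes) / kSegBytes + 1;
}

// Compile-time checks for the first warp of the kernel (i = 0..31).
static_assert(segments_for_contiguous_warp(0) == 2, "a[i]: aligned, 2 segments");
static_assert(segments_for_contiguous_warp(10) == 3, "a[i + 10]: shifted, 3 segments");
static_assert(segments_for_contiguous_warp(12) == 3, "a[i + 12]: shifted, 3 segments");
```

If this model is right, each of the three loads is already contiguous across the warp, and the offsets of 10 and 12 cost at most one extra segment per load because of misalignment.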

Thank you very much.