Are memory access and computation asynchronous?

Gimurk · November 16, 2008, 2:13pm

For example, take the following kernel, launched so that three warps run on one SM.

[codebox]__global__ void test(int *a, int *c)
{
    __shared__ int sdata[96];   // one element per thread: 3 warps x 32 threads
    int tid = threadIdx.x;

    // memory load from global mem to shared mem
    sdata[tid] = a[tid];
    __syncthreads();

    // computation
    for (int i = 0; i < 10; i++)
        c[tid] += sdata[tid] * i;
    ......
}[/codebox]

I have two assumptions that I would like to verify.

1. Whether or not there is a __syncthreads() between the memory load and the computation, within a single warp the computation cannot start until that warp's own load has completely finished. The __syncthreads() is there to make sure that no warp starts its computation until the loads of all warps have finished.

2. The memory-transfer hardware and the ALUs work concurrently whenever they can. That is, while one warp's computation is executing on the ALUs and the transfer channel is not occupied, another warp's memory load can be issued and performed.

Are these correct? I think so, but I am not sure.
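To make the first point concrete, here is a minimal sketch of a variant where the barrier actually matters. In the kernel above each thread only reads back the element it loaded itself, so it does not depend on other warps. In this variant each thread reads the element loaded by the thread 32 positions ahead, which belongs to a different warp, so the value is only guaranteed to be in shared memory after __syncthreads(). The names test_cross_warp and run_cross_warp_check, the input values, and the host-side check are my own assumptions for illustration and are not part of the original kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 96  // three warps of 32 threads, as in the question

// Hypothetical variant of the kernel: each thread consumes a value that was
// loaded into shared memory by a thread in a different warp.
__global__ void test_cross_warp(const int *a, int *c)
{
    __shared__ int sdata[N];
    int tid = threadIdx.x;

    // memory load from global mem to shared mem
    sdata[tid] = a[tid];

    // Needed here: sdata[other] below is written by another warp.
    __syncthreads();

    // computation on the neighbouring warp's element
    int other = (tid + 32) % N;
    int sum = 0;
    for (int i = 0; i < 10; i++)
        sum += sdata[other] * i;
    c[tid] = sum;
}

// Host helper (illustrative): runs the kernel once and returns the number of
// mismatching outputs, or -1 if a CUDA call fails.
int run_cross_warp_check()
{
    int h_a[N], h_c[N];
    for (int i = 0; i < N; i++)
        h_a[i] = i + 1;

    int *d_a = nullptr, *d_c = nullptr;
    if (cudaMalloc(&d_a, sizeof(h_a)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_c, sizeof(h_c)) != cudaSuccess) { cudaFree(d_a); return -1; }

    cudaMemcpy(d_a, h_a, sizeof(h_a), cudaMemcpyHostToDevice);
    test_cross_warp<<<1, N>>>(d_a, d_c);
    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_c, d_c, sizeof(h_c), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_c);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < N; i++) {
        int expected = h_a[(i + 32) % N] * 45;  // 0 + 1 + ... + 9 = 45
        if (h_c[i] != expected) mismatches++;
    }
    return mismatches;
}

int main()
{
    int mismatches = run_cross_warp_check();
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

With the barrier in place the host check should find no mismatches. Without it, the read of sdata[other] races with the other warp's store, so the result is undefined even if it happens to look right on a given run.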