Data streaming in 2D array, LBM

Hi, I have a problem I can't explain: the results at the edges of the array are broken (they print as 0), and I'm not sure what's wrong. The variable ‘a’ is an array filled with values between 0 and 1 by the host. The program runs many time steps, so the kernel is executed many times. Can somebody explain what might be wrong in the kernel function?

My array

|--------|
| i |  n |   i*2
|--------|
    |  n |   i*2 + 1

__global__ void GPU_CalculateTemp(float *a, float *b) {

	int i = blockIdx.x * blockDim.x + threadIdx.x;

	//Patterns

	//Main loop (N and omega are defined elsewhere)
	if ((i > 0) && (i < N)) {
		//Collision: cell i keeps its two values at a[i * 2] and a[i * 2 + 1]
		float fl = a[i * 2];
		float fr = a[i * 2 + 1];

		float rho = fl + fr;
		float feq = 0.5f * rho;

		//Streaming to the left and right of i
		b[(i - 1) * 2] = omega * feq + (1 - omega) * fl;
		b[(i + 1) * 2 + 1] = omega * feq + (1 - omega) * fr;
	}

}

Kernel launch

void compute(int rank, float **device_a, float **device_b) {
	int threadsperblock = N;  // static size: #define N 100
	int blockspergrid = 1;
	size_t size = N*sizeof(float);

	GPU_CalculateTemp<<<blockspergrid,threadsperblock>>>(*device_a, *device_b);
	cudaMemcpy(*device_a, *device_b, size, cudaMemcpyDeviceToDevice); // after this, device_a is copied to the host array and printed
}

The left side of my array; the second and fourth values are 0:

0.05	  0.00	  0.10	  0.00	  0.25	  0.11	  0.23	  0.14	  0.37

size_t size seems undersized, given that you’re accessing up to element

b[(i + 1) * 2 + 1] for i ranging from 1 through N-1

so the last element written to is b[2*N+1]

Are you sure the arrays are allocated (and copied) with a size of at least 2*N*sizeof(float)? Strictly speaking, covering index 2*N+1 takes (2*N+2)*sizeof(float), unless the guard in the kernel is tightened to i < N-1.

size_t size = N*sizeof(float); seems to suggest otherwise.

Christian

My dear Christian, this was very helpful.