Hi, I have an unexplained problem: the results at the edges of my array are broken (they print as 0), and I'm not sure what's wrong. The variable ‘a’ is an array filled with values between 0 and 1 on the host. The program runs many time steps, so the kernel is executed many times. Can somebody explain what might be wrong in the kernel function?
My array (two values per cell i):
|-----------|---------------|
| a[i * 2]  | a[i * 2 + 1]  |
|-----------|---------------|
__global__ void GPU_CalculateTemp(float *a, float *b) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    //Patterns
    //Main loop
    if ((i > 0) && (i < N)) {
        //Collision
        float fl = a[i * 2];
        float fr = a[i * 2 + 1];
        float rho = fl + fr;
        float feq = 0.5 * rho;
        //Streaming on the left and right from i
        b[(i - 1) * 2] = omega * feq + (1 - omega) * fl;
        b[(i + 1) * 2 + 1] = omega * feq + (1 - omega) * fr;
    }
}
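One way to see which elements of b the kernel ever touches is to replay its index arithmetic on the host. The following is only a minimal sketch, assuming the arrays hold 2 * N floats (two per cell); `WriteReport` and `replayKernelWrites` are hypothetical names introduced for illustration and are not part of the original program.

```cpp
#include <vector>

// Host-side replay of the kernel's index arithmetic (no GPU needed).
// Assumption: the arrays hold 2 * n floats, two per cell.
struct WriteReport {
    std::vector<int> neverWritten;  // valid slots of b that no thread writes
    std::vector<int> outOfBounds;   // indices written past the end of b
};

WriteReport replayKernelWrites(int n) {
    const int total = 2 * n;
    std::vector<bool> written(total, false);
    WriteReport report;
    // Same guard as the kernel: only threads 1 .. n-1 do any work.
    for (int i = 1; i < n; ++i) {
        const int left = (i - 1) * 2;       // b[(i - 1) * 2]
        const int right = (i + 1) * 2 + 1;  // b[(i + 1) * 2 + 1]
        for (int idx : {left, right}) {
            if (idx >= 0 && idx < total) {
                written[idx] = true;
            } else {
                report.outOfBounds.push_back(idx);
            }
        }
    }
    // Collect every valid slot that was never the target of a write.
    for (int idx = 0; idx < total; ++idx) {
        if (!written[idx]) report.neverWritten.push_back(idx);
    }
    return report;
}
```

With n = 100 this reports slots 1, 3 and 198 as never written, plus one write to index 201, which lies past the end of a 2 * N array. If b starts out zeroed, slots 1 and 3 would stay at 0, which would be consistent with the printed values.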
Kernel launch:
void compute(int rank, float **device_a, float **device_b) {
    int threadsperblock = N; // static size #define N 100
    int blockspergrid = 1;
    size_t size = N * sizeof(float);
    GPU_CalculateTemp<<<blockspergrid, threadsperblock>>>(*device_a, *device_b);
    // after this, device_a is copied to the host array and printed
    cudaMemcpy(*device_a, *device_b, size, cudaMemcpyDeviceToDevice);
}
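For checking the GPU output, a CPU reference of one time step can help. The sketch below is not the original program: it assumes periodic boundaries (the post does not state the intended boundary condition), takes omega as a parameter, and uses the hypothetical names `referenceStep` and `run`. It updates every cell from 0 to n - 1 and swaps the two buffers instead of copying b back into a. Note also that if each array really holds 2 * N floats, the N * sizeof(float) used in the copy above would cover only half of it.

```cpp
#include <utility>
#include <vector>

// CPU reference for one collide-and-stream step over ALL cells 0 .. n-1.
// Assumptions (not from the original post): periodic boundaries, and
// omega passed in as a parameter rather than read from a global.
void referenceStep(const std::vector<float>& a, std::vector<float>& b, float omega) {
    const int n = static_cast<int>(a.size() / 2);  // two values per cell
    for (int i = 0; i < n; ++i) {
        // Collision
        const float fl = a[i * 2];
        const float fr = a[i * 2 + 1];
        const float feq = 0.5f * (fl + fr);
        // Streaming, wrapping around at both ends of the domain
        const int leftCell = (i - 1 + n) % n;
        const int rightCell = (i + 1) % n;
        b[leftCell * 2] = omega * feq + (1.0f - omega) * fl;
        b[rightCell * 2 + 1] = omega * feq + (1.0f - omega) * fr;
    }
}

// Runs `steps` time steps, swapping the buffers instead of copying b into a.
std::vector<float> run(std::vector<float> a, float omega, int steps) {
    std::vector<float> b(a.size(), 0.0f);
    for (int s = 0; s < steps; ++s) {
        referenceStep(a, b, omega);
        std::swap(a, b);  // the freshly written buffer becomes the next input
    }
    return a;
}
```

Under these assumptions every slot of b is written on each step, and the sum of all values stays constant over time, which gives a simple sanity check to compare against the GPU result.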
The left side of my array; the second and fourth values are 0:
0.05 0.00 0.10 0.00 0.25 0.11 0.23 0.14 0.37