FDTD boundary condition loses effect with shared memory, help

Hi, is anyone familiar with EM simulation on CUDA?

I implemented a 1-D FDTD in CUDA, and it runs well.
But when I try to use shared memory, the simple boundary
conditions E(0)=0, E(Nz)=0 seem to lose their effect of reflecting the wave back.
(main kernel below)

// s_E and s_H are the shared-memory copies of E and H
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < Nz) Eout[idx + 1] = s_E[idx + 1] - (s_H[idx + 1] - s_H[idx]);
// boundary condition
Eout[1] = 0;
Eout[Nz] = 0;

You also need to update the H field and repeat the process for many time steps.
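For reference, here is a minimal sketch of that leapfrog loop using plain global memory. The array layout (E on Nz+1 nodes, H on Nz cells), the update coefficients normalized to 1, the kernel names stepH and stepE, and the Gaussian pulse are all my assumptions, not taken from the posted code.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Assumed layout (not from the posted code): E[0..Nz], H[0..Nz-1],
// update coefficients normalized to 1.
__global__ void stepH(float *H, const float *E, int Nz)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k < Nz) H[k] -= E[k + 1] - E[k];
}

__global__ void stepE(float *E, const float *H, int Nz)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    // Interior nodes only: E[0] and E[Nz] are never written,
    // so they keep the value 0 they were initialized with.
    if (k > 0 && k < Nz) E[k] -= H[k] - H[k - 1];
}

int main()
{
    const int Nz = 1000, steps = 3000, threads = 128;
    const int blocks = (Nz + threads) / threads;   // enough threads for Nz+1 nodes
    float *E, *H;
    cudaMallocManaged(&E, (Nz + 1) * sizeof(float));
    cudaMallocManaged(&H, Nz * sizeof(float));

    for (int k = 0; k <= Nz; ++k) {
        float x = (k - Nz / 2) / 20.0f;
        E[k] = expf(-x * x);                       // Gaussian pulse in the middle
    }
    E[0] = E[Nz] = 0.0f;                           // boundary values
    for (int k = 0; k < Nz; ++k) H[k] = 0.0f;

    for (int n = 0; n < steps; ++n) {
        // Launches on the same stream run in order, so the E update
        // only starts after every block has finished the H update.
        stepH<<<blocks, threads>>>(H, E, Nz);
        stepE<<<blocks, threads>>>(E, H, Nz);
    }
    cudaDeviceSynchronize();

    printf("E[0]=%g E[Nz]=%g\n", E[0], E[Nz]);
    cudaFree(E);
    cudaFree(H);
    return 0;
}
```

Using one launch per half step is a design choice: the launch boundary is the only place where all blocks are guaranteed to see each other's writes.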

Yes, my code also includes the H field update and the time loop.

And the code runs well in the global memory version.

The boundary problem appears only when shared memory is used.

So I'm just confused about when to set the boundary condition:

before or after the data is transferred to shared memory?
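
One way to approach this, as a sketch under stated assumptions rather than a diagnosis of the posted kernel: set the boundary values in the global array that the next time step reads, and let only the thread that owns a boundary node write it. A shared-memory tile is private to one block and is discarded when the kernel ends, so a boundary value set only there never reaches the other blocks. The layout (E[0..Nz], H[0..Nz-1]), the normalized coefficients, the BLOCK size, the in-place E update and all names below are hypothetical.

```cuda
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK 128  // threads per block (hypothetical choice)

// Assumed layout: E[0..Nz], H[0..Nz-1], update coefficients normalized to 1.
__global__ void stepH(float *H, const float *E, int Nz)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k < Nz) H[k] -= E[k + 1] - E[k];   // kept in global memory for brevity
}

__global__ void stepE_shared(float *E, const float *H, int Nz)
{
    __shared__ float s_H[BLOCK + 1];       // slot 0 holds the left halo
    int t = threadIdx.x;                   // local index: shared memory only
    int k = blockIdx.x * blockDim.x + t;   // global index: global memory only

    // Load phase. No early return, so every thread reaches __syncthreads().
    if (k < Nz) s_H[t + 1] = H[k];
    if (t == 0 && k > 0 && k <= Nz) s_H[0] = H[k - 1];
    __syncthreads();

    // Update phase. Each boundary node is written once, by its owning thread.
    if (k == 0 || k == Nz) E[k] = 0.0f;
    else if (k < Nz)       E[k] -= s_H[t + 1] - s_H[t];
}

static float pulse(int k, int Nz)
{
    if (k == 0 || k == Nz) return 0.0f;
    float x = (k - Nz / 2) / 20.0f;
    return expf(-x * x);                   // Gaussian pulse in the middle
}

int main()
{
    const int Nz = 1000, steps = 3000;
    const int blocks = (Nz + BLOCK) / BLOCK;     // enough threads for Nz+1 nodes
    float *E, *H;
    cudaMallocManaged(&E, (Nz + 1) * sizeof(float));
    cudaMallocManaged(&H, Nz * sizeof(float));
    std::vector<float> e(Nz + 1), h(Nz, 0.0f);   // CPU reference fields
    for (int k = 0; k <= Nz; ++k) E[k] = e[k] = pulse(k, Nz);
    for (int k = 0; k < Nz; ++k) H[k] = 0.0f;

    for (int n = 0; n < steps; ++n) {
        stepH<<<blocks, BLOCK>>>(H, E, Nz);
        stepE_shared<<<blocks, BLOCK>>>(E, H, Nz);
        // Same two half steps on the CPU, touching only the host vectors.
        for (int k = 0; k < Nz; ++k) h[k] -= e[k + 1] - e[k];
        for (int k = 1; k < Nz; ++k) e[k] -= h[k] - h[k - 1];
    }
    cudaDeviceSynchronize();

    float maxDiff = 0.0f;
    for (int k = 0; k <= Nz; ++k) maxDiff = fmaxf(maxDiff, fabsf(E[k] - e[k]));
    printf("E[0]=%g E[Nz]=%g max |GPU-CPU|=%g\n", E[0], E[Nz], maxDiff);
    cudaFree(E);
    cudaFree(H);
    return maxDiff < 1e-4f ? 0 : 1;
}
```

Two things in the posted kernel may be worth checking against this sketch. First, if s_E and s_H are per-block tiles, indexing them with the global idx instead of threadIdx.x reads outside the tile in every block after the first. Second, the unconditional boundary writes are executed by every thread of every block, and block execution order is not defined, so a thread in another block can write the zero before the owning thread stores its computed value, which then overwrites the boundary.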