__syncthreads() in a loop is fine as long as every single thread in the block performs all iterations of the loop, so that all threads reach the barrier the same number of times. Keep in mind that, to avoid race conditions in a loop, you may need __syncthreads() both before and after the access to shared memory.
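Something along these lines is what I have in mind. It is only a rough sketch, not your actual kernel: it assumes a single block of 256 threads, and the names (rotate_kernel, run_rotate, BLOCK) and the rotate-by-one update are made up for illustration. Each iteration every thread reads its left neighbour's slot in shared memory and then overwrites its own slot, which is why a barrier is needed on both sides of the write.

```cuda
#include <cuda_runtime.h>

#define BLOCK 256  // hypothetical block size; the sketch uses one block only

// Rotates the array right by one position per iteration, entirely in shared memory.
__global__ void rotate_kernel(int *data, int steps)
{
    __shared__ int buf[BLOCK];
    int t = threadIdx.x;

    buf[t] = data[t];
    __syncthreads();  // buf must be fully loaded before any thread reads a neighbour

    // 'steps' is the same for every thread, so all threads execute every
    // iteration and reach each barrier. Putting __syncthreads() inside a
    // branch that only some threads take would be undefined behaviour.
    for (int i = 0; i < steps; ++i) {
        int left = buf[(t + BLOCK - 1) % BLOCK];  // read the neighbour's slot
        __syncthreads();  // all reads finished before anyone overwrites a slot
        buf[t] = left;    // write own slot
        __syncthreads();  // all writes finished before the next iteration reads
    }

    data[t] = buf[t];
}

// Host helper (hypothetical): starts from data[t] = t, runs the kernel once,
// and copies the result into out. Returns 0 on success, -1 on a CUDA error.
int run_rotate(int steps, int *out)
{
    int host[BLOCK];
    for (int t = 0; t < BLOCK; ++t) host[t] = t;

    int *dev = nullptr;
    if (cudaMalloc(&dev, sizeof(host)) != cudaSuccess) return -1;
    if (cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dev);
        return -1;
    }

    rotate_kernel<<<1, BLOCK>>>(dev, steps);  // one launch, loop runs on the device

    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(out, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess ? 0 : -1;
}
```

Calling run_rotate(3, out) with an int out[256] should leave out[t] equal to (t + 256 - 3) % 256. If you drop either barrier inside the loop, some threads can read a slot that has already been overwritten, or not yet been written, and the result becomes timing-dependent.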
I made it work!
Performance may suck, but at least I won't fail the course…
Still better than calling the kernel for each i.