Will this loop cause warp divergence?

In my CUDA kernel I have the following for loop, where each thread in the warp begins at a different index. The index then wraps back around to the first index once it goes past 32 (the warp size). Will looping like this cause warp divergence?

Note that I only have one warp per block, so the thread id is the same as the lane id within the warp.

tid = threadIdx().x                       # 1-based thread id; equals the lane id since there is one warp per block
for j in tid:(tid + WARP_SIZE - 1)
    wrapped_j_idx = ((j - 1) & (WARP_SIZE - 1)) + 1   # cheap modulo by WARP_SIZE, keeps the index in 1:WARP_SIZE
    val = foo()
    # No race conditions as threads in the warp execute in lockstep
    forces[tid + offset, :] += val
    forces[wrapped_j_idx, :] -= val
end
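
To make the access pattern concrete, here is a host-side Julia sketch (illustration only, not kernel code; it assumes WARP_SIZE = 32) that checks which wrapped index each lane visits. Every lane performs the same number of iterations, just starting at a different point in the cycle:

const WARP_SIZE = 32

for lane in 1:WARP_SIZE                     # lane id, i.e. threadIdx().x for a single-warp block
    visited = [((j - 1) & (WARP_SIZE - 1)) + 1 for j in lane:(lane + WARP_SIZE - 1)]
    # e.g. lane 1 visits 1, 2, ..., 32 and lane 2 visits 2, 3, ..., 32, 1;
    # each lane touches every index exactly once.
    @assert sort(visited) == collect(1:WARP_SIZE)
end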