Warp divergence occurs when code contains a conditional, and some threads in a warp follow one path through it while other threads in the same warp follow a different path.
The boundary check referred to in your prompt is an example:
int idx = threadIdx.x + blockDim.x * blockIdx.x;
if (idx < N)
    z[idx] = x[idx] + y[idx];
Threads whose idx value (the globally unique thread index) is less than N will perform the vector addition. Threads whose idx value is equal to or greater than N will not. Typically there is one warp in the grid where some threads satisfy that boolean condition and some do not. Every other warp in the grid consists either entirely of threads that satisfy the condition or entirely of threads that do not. The warp that has some of both is the candidate for warp divergence.
The concept of warp divergence has a number of possible interpretations and nuances that I am not covering here. This is a basic definition that is suitable for initial understanding and solution of the question asked.