Need a solution involving CUDA threads, warps and blocks

I have an undergraduate project: aligning pairs of DNA sequences using CUDA. When I wrote the code, I ran into a problem with my matrix: row 33 cannot get the values from row 32. I think it is related to the warp size, or maybe something else. My question is: how do I use 32x32 warps with large data, or how else can I solve this? Sorry, my English isn't good. Thank you…

It's best if you provide more information and/or your code or code snippets.
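Without the code this is only a guess, but a frequent cause of exactly this symptom is a dependency between rows computed by different warps. Suppose each thread computes one row and row i reads values written by row i-1. The first 32 threads form one warp and may appear to cooperate, but the thread that computes row 33 is the first thread of the second warp. Nothing synchronizes it with the first warp, so it can read row 32 before that row has been written. (On Volta and newer GPUs, even threads of one warp are not guaranteed to run in lockstep, so relying on that is fragile.) Assuming a Needleman-Wunsch-style recurrence, where each cell depends on its upper, left and upper-left neighbours, a common fix is to compute the matrix one anti-diagonal at a time, because cells on the same anti-diagonal do not depend on each other, and to put a barrier between diagonals. The sketch below is plain C so it can be checked on the CPU. The scoring values (`MATCH`, `MISMATCH`, `GAP`) and the function names are illustrative assumptions, not the project's code; comments mark where threads and `__syncthreads()` would go in a kernel.

```c
#include <string.h>

/* Hypothetical scoring scheme; the post does not say which one the project uses. */
#define MATCH      1
#define MISMATCH (-1)
#define GAP      (-1)

static int max3(int x, int y, int z) {
    int m = x > y ? x : y;
    return m > z ? m : z;
}

/* Value of cell (i, j) from its three neighbours; H is (n+1) x (m+1), row-major. */
static int cell(const char *a, const char *b, const int *H, int cols, int i, int j) {
    int diag = H[(i - 1) * cols + (j - 1)] + (a[i - 1] == b[j - 1] ? MATCH : MISMATCH);
    int up   = H[(i - 1) * cols + j] + GAP;
    int left = H[i * cols + (j - 1)] + GAP;
    return max3(diag, up, left);
}

/* First column and first row: aligning a prefix against nothing costs one GAP per character. */
static void init_borders(int *H, int n, int m) {
    int cols = m + 1;
    for (int i = 0; i <= n; i++) H[i * cols] = i * GAP;
    for (int j = 0; j <= m; j++) H[j] = j * GAP;
}

/* Reference: plain row-by-row fill. Row i needs row i-1 to be finished first. */
void fill_rowwise(const char *a, const char *b, int *H) {
    int n = (int)strlen(a), m = (int)strlen(b), cols = m + 1;
    init_borders(H, n, m);
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++)
            H[i * cols + j] = cell(a, b, H, cols, i, j);
}

/* Wavefront: visit anti-diagonals d = i + j in order. Cells on one diagonal depend
   only on diagonals d-1 and d-2, so they are independent of each other. */
void fill_wavefront(const char *a, const char *b, int *H) {
    int n = (int)strlen(a), m = (int)strlen(b), cols = m + 1;
    init_borders(H, n, m);
    for (int d = 2; d <= n + m; d++) {
        int i_lo = d - m > 1 ? d - m : 1;   /* keeps j = d - i <= m */
        int i_hi = d - 1 < n ? d - 1 : n;   /* keeps j >= 1 and i <= n */
        /* In a CUDA kernel, this loop would be spread over threads (one cell each). */
        for (int i = i_lo; i <= i_hi; i++)
            H[i * cols + (d - i)] = cell(a, b, H, cols, i, d - i);
        /* ...and a barrier such as __syncthreads() would go here, so every thread
           sees the whole diagonal before any thread starts the next one. */
    }
}
```

Calling `fill_rowwise` and `fill_wavefront` on the same pair of sequences should give identical matrices, including for sequences longer than 32 characters. Keep in mind that `__syncthreads()` only synchronizes the threads of one block (at most 1024 threads on current GPUs). For longer sequences, a common approach is to launch one kernel per anti-diagonal, since kernels launched on the same stream run in order, or to split the matrix into tiles and process the tiles diagonal by diagonal.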