    /* TODO: loop unrolling */
    for (uint32_t i = 0; i < size; i++) {
        ch = tex1Dfetch(texRef, base + i);
        matched ^= ch;
    }
    pcreRes[threadIdx.x + blockIdx.x * blockDim.x] = matched;
}
This is the kernel in which I use texture memory.
Its performance drops sharply when the grid dimension goes from 16 to 17.
With a grid dimension of 16, the throughput is 7.4 Gbps, but with a grid dimension of 17 it falls to 4.4 Gbps.