Texture memory kernel: performance drops when the grid dimension goes from 16 to 17

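__global__ void TextureReadKernel(uint32_t** pcreTbl, uint32_t* pcreTblSize, uchar* packet, PacketInfo* pkInfo, char* pcreRes)
{
    unsigned char ch = 0;
    uint32_t matched = 0;
    /* Per-thread packet offset and length; note these are indexed by threadIdx.x only. */
    uint32_t base = pkInfo[threadIdx.x].base;
    uint32_t size = pkInfo[threadIdx.x].size;

    /* TODO : LOOP unrolling */
    for (uint32_t i = 0; i < size; i++) {
        /* texRef is a texture reference declared elsewhere (not shown here). */
        ch = tex1Dfetch(texRef, base + i);
        matched ^= ch;  /* XOR all bytes of the packet together */
    }
    /* One result byte per thread across the whole grid. */
    pcreRes[threadIdx.x + blockIdx.x * blockDim.x] = matched;
}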

This is my kernel, which reads its input through texture memory.
Its performance drops sharply once the grid dimension reaches 17.
With a grid dimension of 16 the throughput was 7.4 Gbps, but with a grid dimension of 17 it falls to 4.4 Gbps.
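
One way to narrow this down might be to check whether the drop lines up with the number of multiprocessors on the card. The sketch below is only an illustration under assumptions, not my actual benchmark: `dummyKernel` is a placeholder standing in for `TextureReadKernel`, `wavesNeeded` is a hypothetical helper, and the block size of 256 and the sweep up to 34 blocks are arbitrary choices. It prints the multiprocessor count, then times the kernel for each grid dimension and shows how many "waves" of blocks that grid would need if the blocks cannot all be resident at once.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of waves needed to run `blocks` blocks when
// `sms` multiprocessors can each hold `blocksPerSm` resident blocks.
static int wavesNeeded(int blocks, int sms, int blocksPerSm)
{
    int perWave = sms * blocksPerSm;
    assert(perWave > 0);
    return (blocks + perWave - 1) / perWave;  // ceiling division
}

// Placeholder kernel (NOT the real TextureReadKernel): does dummy XOR work.
__global__ void dummyKernel(unsigned int* out, unsigned int iters)
{
    unsigned int acc = 0;
    for (unsigned int i = 0; i < iters; i++)
        acc ^= i * 2654435761u + threadIdx.x;
    out[threadIdx.x + blockIdx.x * blockDim.x] = acc;
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no usable CUDA device\n");
        return 1;
    }
    printf("multiprocessors: %d\n", prop.multiProcessorCount);

    const int threads = 256;  // assumed block size
    const int maxGrid = 34;   // assumed upper end of the sweep

    // Ask the runtime how many blocks of this kernel fit on one multiprocessor.
    int blocksPerSm = 1;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm, dummyKernel, threads, 0);
    if (blocksPerSm < 1) blocksPerSm = 1;
    printf("resident blocks per multiprocessor: %d\n", blocksPerSm);

    unsigned int* out = NULL;
    if (cudaMalloc((void**)&out, maxGrid * threads * sizeof(unsigned int)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    for (int grid = 1; grid <= maxGrid; grid++) {
        cudaEventRecord(start);
        dummyKernel<<<grid, threads>>>(out, 1u << 20);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("grid=%2d  waves=%d  waves(1 block/SM)=%d  time=%.3f ms\n",
               grid,
               wavesNeeded(grid, prop.multiProcessorCount, blocksPerSm),
               wavesNeeded(grid, prop.multiProcessorCount, 1),
               ms);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(out);
    return 0;
}
```

If the card reports 16 multiprocessors and the time jumps exactly where the wave count goes from 1 to 2, that would point at the 17th block running in a nearly empty second wave. If the jump does not follow the wave count, the cause is probably somewhere else (for example in how `pkInfo` is indexed or in the texture access pattern).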

Could you tell me why this happens?