Hi, I'm having a problem with the following code. When I time it, a run sometimes takes 10 ms, sometimes 0.1 ms, and sometimes it fails with an "invalid configuration argument" error. I launch this kernel with 256 threads per block and 1600*1200*sizeof(float)/256 blocks (30,000 blocks).

The purpose of the kernel is to translate each byte of Src into a sequence of bits, so the cod array has 256*80 entries (one code of up to 80 bits for each byte value) and length has 256 entries. To encode bit by bit I have to use bit operations, so I set a mask and use OR and AND operations to set the specific bit. The next step will be to copy the result into the Dst array, but first I need to understand why the kernel behaves so differently from run to run.
Thank you all, A.
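__global__ void Encode(float *Src, unsigned char *Dst, bool *cod, unsigned int *length){
    const int tid = threadIdx.x;
    const int bid = blockIdx.x * blockDim.x;   // index of the first thread in this block
    unsigned char mycod[10] = {0};             // 80 bits, zeroed so |= and &= start from a known state
    unsigned char mask;
    unsigned char *c = (unsigned char*)Src;    // read Src one byte at a time
    unsigned char curr = c[tid + bid];
    unsigned int len = length[curr];           // number of code bits for this byte value
    //#pragma unroll
    for(unsigned int i = 0; i < len; i++){
        mask = 1;
        if(cod[i + 80*curr]){
            mask <<= i % 8;
            mycod[i / 8] |= mask;              // bit i lives in byte i/8, position i%8
        }
        else{
            mask <<= i % 8;
            mask = ~mask;
            mycod[i / 8] &= mask;
        }
    }
    // Dst is not written yet; copying mycod into it is the planned next step.
}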
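
A minimal sketch of how the launch could be checked and timed, offered as one possible way to narrow the problem down rather than a confirmed diagnosis. Kernel launches are asynchronous, and a launch that is rejected returns almost immediately, so an unchecked timer may report a very short time for a kernel that never ran. A kernel that never writes to global memory may also have its loop removed by the compiler. The names `packBits`, `EncodeChecked`, `timedEncode`, `kMaxCodeBits`, the `nBytes` parameter and the 10-bytes-per-input layout of `dst` are all hypothetical and not part of the original code.

```cuda
#include <cstdio>

#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE   // plain C++ build: no CUDA qualifiers, helper only
#endif

// Assumption taken from the post: every code has at most 80 bits (10 bytes).
constexpr unsigned int kMaxCodeBits  = 80;
constexpr unsigned int kMaxCodeBytes = kMaxCodeBits / 8;

// Packs the first len flags of bits into out: flag i goes to byte i/8, position i%8.
HOST_DEVICE inline void packBits(const bool *bits, unsigned int len,
                                 unsigned char *out) {
    for (unsigned int b = 0; b < kMaxCodeBytes; ++b) out[b] = 0;
    for (unsigned int i = 0; i < len && i < kMaxCodeBits; ++i)
        if (bits[i]) out[i / 8] |= static_cast<unsigned char>(1u << (i % 8));
}

#ifdef __CUDACC__
// Hypothetical variant of Encode that stores its result, so the work has a
// visible effect and cannot be discarded as dead code.
__global__ void EncodeChecked(const unsigned char *src, unsigned char *dst,
                              const bool *cod, const unsigned int *length,
                              size_t nBytes) {
    size_t idx = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (idx >= nBytes) return;                 // guard the last, partial block
    unsigned char curr = src[idx];
    unsigned char mycod[kMaxCodeBytes];
    packBits(cod + kMaxCodeBits * curr, length[curr], mycod);
    for (unsigned int b = 0; b < kMaxCodeBytes; ++b)
        dst[idx * kMaxCodeBytes + b] = mycod[b];   // assumed output layout
}

// Launches the kernel, checks for errors, and returns the elapsed time in
// milliseconds, or -1 if the launch or the execution failed.
float timedEncode(const unsigned char *dSrc, unsigned char *dDst,
                  const bool *dCod, const unsigned int *dLength, size_t nBytes) {
    const unsigned int threads = 256;
    const unsigned int blocks  = (unsigned int)((nBytes + threads - 1) / threads);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    EncodeChecked<<<blocks, threads>>>(dSrc, dDst, dCod, dLength, nBytes);
    cudaError_t launchErr = cudaGetLastError();     // bad launch configuration shows up here
    cudaEventRecord(stop);
    cudaError_t runErr = cudaEventSynchronize(stop); // wait until the kernel has finished

    float ms = -1.0f;
    if (launchErr != cudaSuccess)
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(launchErr));
    else if (runErr != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(runErr));
    else
        cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
#endif
```

With a check like this, a run that reports a suspiciously short time can be told apart from a run that failed to launch. The packing helper is marked for both host and device, so it can also be tested on the CPU with a few hand-made bit patterns before timing anything on the GPU.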