I am testing a problem.
The data size is "size".
int threadsPerBlock = 256;
int blocksPerGrid = (size + threadsPerBlock - 1) / threadsPerBlock;
dim3 cudaBlockSize(threadsPerBlock, 1, 1);
dim3 cudaGridSize(blocksPerGrid, 1, 1);
caul_dis<<<cudaBlockSize, cudaGridSize>>>(d_datax, d_datay, d_dataz, d_out, d_indx, d_begin, d_end, size, minx, maxx, miny, maxy, minz, maxz, d_gridnm, disd, com_dis);
For example, if the data size is 1200, then in the grid_cau function, when it contains a for loop, all the data in d_out comes out wrong.
If the data size is 12, with threadsPerBlock = 12, blocksPerGrid = 1 and so on, the result is not wrong, even with the for loop.
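
For comparison, here is a minimal sketch of the usual launch pattern. In the CUDA launch syntax the first argument inside <<< >>> is the grid size and the second is the block size, which is the opposite order from the call above. The kernel fill_index and the helper blocks_for are hypothetical stand-ins, since the body of the real kernel is not shown, so whether this explains the wrong d_out depends on what the for loop does.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: writes one value per element.
__global__ void fill_index(float *out, int size)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= size) return;  // threads in the last block may fall past the end of the data
    out[i] = static_cast<float>(i);
}

// Ceiling division: number of blocks needed to cover `size` elements.
inline int blocks_for(int size, int threadsPerBlock)
{
    return (size + threadsPerBlock - 1) / threadsPerBlock;
}

int main()
{
    const int size = 1200;
    const int threadsPerBlock = 256;
    const int blocksPerGrid = blocks_for(size, threadsPerBlock);

    float *d_out = nullptr;
    if (cudaMalloc(&d_out, size * sizeof(float)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }

    dim3 block(threadsPerBlock, 1, 1);
    dim3 grid(blocksPerGrid, 1, 1);
    fill_index<<<grid, block>>>(d_out, size);  // grid first, block second

    // Report launch errors first, then errors raised while the kernel ran.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    // Copy back the last element as a quick sanity check.
    float last = 0.0f;
    cudaMemcpy(&last, d_out + size - 1, sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[%d] = %.0f\n", size - 1, last);

    cudaFree(d_out);
    return 0;
}
```

With the arguments in the order used in the post, the kernel runs with blockDim.x equal to blocksPerGrid and gridDim.x equal to threadsPerBlock, so any code in the kernel that assumes 256 threads per block (shared memory sizes, __syncthreads-based loops, strides) would see different values than intended. Checking cudaGetLastError and cudaDeviceSynchronize after the launch also helps tell an out-of-bounds access apart from a logic error.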