Sorry for the delay. I’m also not quite sure about the details of each metric. Could you tell me which CUDA version you are using? Then I can file a bug so that the developers can answer you.
Hi harryz_,
Sorry for the late reply. The code is pretty simple.
It just copies the elements of one array in global memory to another array in global memory.
If anything below is wrong, please correct me.
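#include <cstdlib>
#include <iostream>
#include <cuda_runtime.h>

// The cacheLineTest kernel definition is not shown in this post.
int main(){
    std::cout<<"*********************Cache line Test*********************"<<std::endl;
    int blockSize = 32;   // one warp per block
    int gridSize = 1;     // a single block
    int stride = 9;       // element stride passed to the kernel
    unsigned int size = blockSize*gridSize*stride;   // number of floats per array
    // Host buffers (allocated but not used in this snippet)
    float * A_cpu = (float*)malloc(size*sizeof(float));
    float * B_cpu = (float*)malloc(size*sizeof(float));
    // Device buffers in global memory
    float * A_gpu,*B_gpu;
    cudaMalloc(&A_gpu,size*sizeof(float));
    cudaMalloc(&B_gpu,size*sizeof(float));
    cacheLineTest<<<gridSize,blockSize,0,0>>> (A_gpu,B_gpu,stride);
    cudaDeviceSynchronize();   // wait for the kernel to finish before cleanup
    free(A_cpu);
    free(B_cpu);
    cudaFree(A_gpu);
    cudaFree(B_gpu);
    return 0;
}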
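
For reference, here is a rough host-side sketch of how many aligned memory segments one warp would touch with these launch parameters. It is only an estimate under assumptions that are not stated in the post: that thread i accesses element i*stride (the kernel body is not shown), that the array base address is aligned to the segment size, and that the segment sizes of interest are 32 and 128 bytes. The helper name countTouchedSegments is hypothetical and is not part of CUDA.

```cpp
#include <cstddef>
#include <set>

// Hypothetical helper (not from the original post, not a CUDA API).
// Counts the distinct aligned segments of `segmentBytes` bytes that are
// touched when each of `threads` threads accesses one element of
// `elemBytes` bytes at element index (thread index * stride).
// Assumption: the array base address is aligned to `segmentBytes`.
std::size_t countTouchedSegments(int threads, int stride,
                                 std::size_t elemBytes,
                                 std::size_t segmentBytes) {
    std::set<std::size_t> segments;
    for (int t = 0; t < threads; ++t) {
        // First and last byte offset of the element this thread accesses.
        std::size_t first = static_cast<std::size_t>(t) * stride * elemBytes;
        std::size_t last = first + elemBytes - 1;
        // Record every segment the element overlaps.
        for (std::size_t s = first / segmentBytes; s <= last / segmentBytes; ++s) {
            segments.insert(s);
        }
    }
    return segments.size();
}
```

With blockSize = 32, stride = 9 and 4-byte floats, countTouchedSegments(32, 9, 4, 128) gives 9 and countTouchedSegments(32, 9, 4, 32) gives 32. With stride = 1 the same calls give 1 and 4. Whether the profiler metrics match these counts depends on the GPU architecture and on what the kernel really does, so please treat the numbers as a sanity check only.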