What does gld_transactions mean in nvprof metrics?

Hi, everyone.

I wanted to test the texture cache line size on a Pascal GPU, so I wrote some simple code that reads a 1D array into registers and writes it to another array.

__global__ void cacheLineTest(const float* src_, float* des_, unsigned int stride){
  int tid = blockIdx.x*blockDim.x+threadIdx.x*stride;
  des_[tid] = src_[tid];
}

And I got the following results.
[External media: results image not preserved]

I have three questions:

  1. From these results, can I conclude that the 1D texture cache line on Pascal is 32 bytes?
  2. What does gld_transactions really mean? Why does gld_transactions differ from l2_tex_read_transactions when the stride is 1 and 3?
  3. I use neither __restrict__ nor __ldg(), so why do my load requests still go through the texture + L2 path?
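
For reference, here is a minimal host-side sketch of the model I compare the counters against. It counts how many distinct aligned segments one warp touches when thread t reads element t*stride. The function name segmentsTouched, the segment-aligned base address, the warp size of 32 and the 4-byte float are assumptions of this sketch. It is not how nvprof defines gld_transactions, only a simple expectation to check the measured numbers against.

```cpp
#include <set>

// Hypothetical helper: counts the distinct aligned segments of
// `segmentBytes` bytes touched by one warp, where thread t reads
// element t * stride.
// Assumptions: the array base is segment-aligned, elements are
// 4-byte floats, and a warp has 32 threads.
int segmentsTouched(int stride, int segmentBytes,
                    int warpSize = 32, int elemBytes = 4) {
    std::set<long> segments;
    for (int t = 0; t < warpSize; ++t) {
        // Byte offset of the element read by thread t.
        long byteOffset = static_cast<long>(t) * stride * elemBytes;
        // Index of the aligned segment that contains that offset.
        segments.insert(byteOffset / segmentBytes);
    }
    return static_cast<int>(segments.size());
}
```

Under these assumptions, segmentsTouched(1, 32) is 4, segmentsTouched(3, 32) is 12 and segmentsTouched(9, 32) is 32, while with 128-byte segments the same strides give 1, 3 and 9.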

Thanks

Could anyone from NVIDIA tell us how gld_transactions is calculated?

Thanks

Hello jiazhe,

Sorry for the delay. I'm also not quite sure about the details of each metric. Could you tell me which CUDA version you use, so I can raise a bug for the dev team to answer you?

Best Regards
Harryz

Hi Harryz!
Thanks for your reply! I am using the newest CUDA 8.0.

Best,
Zhe

Hello jiazhe,

Could you attach your full source code? As you said, src_ should be a sampler1D and des_ should be global memory, right?

Best Regards

Hi harryz_,
Sorry for the late reply. The code is pretty simple.
It just moves the elements of one array in global memory to another array in global memory.
If something is wrong below, please correct me.

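#include <cstdlib>
#include <iostream>
#include <cuda_runtime.h>

// cacheLineTest is the kernel posted at the top of this thread.
int main(){
  std::cout<<"*********************Cache line Test*********************"<<std::endl;

  int blockSize = 32;  // one warp
  int gridSize = 1;
  int stride = 9;      // distance, in elements, between consecutive threads

  unsigned int size = blockSize*gridSize*stride;
  float * A_cpu = (float*)malloc(size*sizeof(float));
  float * B_cpu = (float*)malloc(size*sizeof(float));
  float * A_gpu,*B_gpu;

  cudaMalloc(&A_gpu,size*sizeof(float));
  cudaMalloc(&B_gpu,size*sizeof(float));

  cacheLineTest<<<gridSize,blockSize,0,0>>> (A_gpu,B_gpu,stride);
  cudaDeviceSynchronize();  // wait for the kernel to finish before freeing

  free(A_cpu);
  free(B_cpu);
  cudaFree(A_gpu);
  cudaFree(B_gpu);
}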

To run the code, I use the command below.

nvprof -m gld_transactions,dram_read_transactions,l2_read_transactions,l2_tex_read_transactions,local_load_transactions ./cacheLineTest

Thank you.

Hello jiazhe,

This looks like the same issue described in https://devtalk.nvidia.com/default/topic/941880/visual-profiler/global-load-transaction-count-when-in-coalesced-memory-access/ . I've already raised a bug for the dev team to check, and your request will be added to the existing bug.

Best Regards

Hi harryz_,
Could you please let us know if there is any update from NVIDIA?
Thanks.

Sure thing.