I am new to CUDA and have a question about how much data is transferred by the gld_coherent loads that the CUDA profiler reports.
The CUDA profiler stats for one of my kernels are as follows:
method=[ _Z18gpu_forward_kernelPfS_S_PiS_S_iiiiii ] gputime=[ 320.864 ] cputime=[ 344.000 ] occupancy=[ 0.333 ] gld_coherent=[ 3000 ] gld_incoherent=[ 0 ] gst_coherent=[ 16 ] gst_incoherent=[ 0 ]
For these 3000 gld_coherent loads, how can I find out how much data is actually being accessed from device DRAM? Any help is appreciated. Thanks.