Visual Profiler shows random reads and writes from Device Memory

Hi, I wrote a CUDA kernel that accesses device memory in a loop, incrementing each value it touches.

__global__ void dram_load(float *a)
{
    for (int i = 0; i < 1200; ++i)
        a[i * 32] = a[i * 32] + 1;
}
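
For anyone who wants to reproduce this, a minimal host-side sketch might look like the following. The buffer size, the single-thread launch, and the zero-initialization are assumptions made for illustration; none of them are stated above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kIters  = 1200;  // loop count from the kernel above
constexpr int kStride = 32;    // stride in floats (32 * 4 bytes = 128 bytes)

// Same kernel as above: one read and one write per iteration,
// each 128 bytes apart.
__global__ void dram_load(float *a)
{
    for (int i = 0; i < kIters; ++i)
        a[i * kStride] = a[i * kStride] + 1;
}

int main()
{
    // Assumed buffer size: large enough for the highest index touched,
    // (kIters - 1) * kStride.
    const size_t n = static_cast<size_t>(kIters) * kStride;
    float *d_a = nullptr;

    if (cudaMalloc(&d_a, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    // Assumed initialization so the increments operate on defined values.
    cudaMemset(d_a, 0, n * sizeof(float));

    // Assumed launch configuration: a single thread.
    dram_load<<<1, 1>>>(d_a);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Copy one element back as a sanity check that the kernel ran.
    float first = 0.0f;
    cudaMemcpy(&first, d_a, sizeof(float), cudaMemcpyDeviceToHost);
    printf("a[0] = %f\n", first);

    cudaFree(d_a);
    cudaDeviceReset();  // flushes profiling data before exit
    return 0;
}
```

With a different grid or block size the expected counter values would change accordingly.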


When I profile it, I get the expected number of L2 reads and writes, but the device memory reads are completely off. The profiler shows somewhere between 0 and 50 reads every time I rerun the analysis, and the device memory write count is double what I expect.

The results of one such analysis are:

L2 Cache
Reads 1292
Writes 1209
Total 2501

Device Memory
Reads 30
Writes 2400
Total 2430

Could anyone explain why this is happening? Why are there almost no reads from device memory? I understand that reads are served from the cache once the data is there, but initially the data has to be fetched from device memory, right? And why is the write count doubled?

Any insight would be appreciated.


I see the same problem in my profiler. The device memory counts change from run to run. I call cudaDeviceReset() at the end of my program. Can anyone explain this behavior? GPU: Quadro K5000. Profiler version: 8.0