Hi NVIDIA team,
I have a question for you.
I am using a Jetson TX2.
Why do I get two different results when I run a p-chase-like (pointer-chasing) benchmark on local memory versus global memory?
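For context, a p-chase (pointer-chasing) benchmark fills an array so that each element stores the index of the next element to load, then follows that chain so that every load depends on the result of the previous one. The sketch below is a minimal host-side C illustration, assuming a constant stride in elements and 4-byte entries. It shows only the chain setup and traversal, and the helper names `init_chain` and `chase` are hypothetical. On the TX2 the traversal would normally run in a single-thread CUDA kernel, with `clock()` read around each load, over a buffer in either global memory or local memory (for example a dynamically indexed per-thread array, which the compiler places in local memory). That device-side part is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill arr so that arr[i] holds the index of the
   element `stride` positions ahead, wrapping around at n. */
static void init_chain(uint32_t *arr, size_t n, size_t stride)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        arr[i] = (uint32_t)((i + stride) % n);
}

/* Hypothetical helper: follow the chain for `steps` loads. Each load's
   address depends on the value returned by the previous load, so the
   loads cannot overlap; on a GPU this is what exposes raw latency.
   If `visited` is non-NULL, the index reached by each load is stored. */
static uint32_t chase(const uint32_t *arr, size_t steps, uint32_t *visited)
{
    uint32_t j = 0;
    for (size_t k = 0; k < steps; k++) {
        j = arr[j];
        if (visited)
            visited[k] = j;
    }
    return j; /* returning j keeps the compiler from removing the loop */
}
```

With `init_chain(arr, 64, 8)` followed by `chase(arr, 8, visited)`, the loads visit indices 8, 16, ..., 56 and then wrap back to 0, i.e. one load every 32 bytes.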
For local memory, the cache block (line) size appears to be 4 bytes, whereas for global memory it appears to be 32 bytes.
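The apparent block size is usually read off the latency trace. With a constant stride, if a slow load (a miss) appears every m loads, the line size is roughly m times the stride. The sketch below is a minimal C version of that inference, assuming one latency sample per load, a hit/miss threshold chosen by hand, and evenly spaced misses. `estimate_line_size` is a hypothetical helper, not part of any NVIDIA tool. Under these assumptions, a 4-byte result means every 4-byte load was slow, while a 32-byte result means one slow load per eight 4-byte loads.

```c
#include <stddef.h>

/* Hypothetical helper: estimate the cache line size from a p-chase
   latency trace. lat[k] is the measured latency (e.g. in clock cycles)
   of the k-th load, taken with a constant stride of stride_bytes.
   A load slower than `threshold` is treated as a miss. If misses occur
   every m loads, the apparent line size is m * stride_bytes.
   Returns 0 if fewer than two misses were observed. */
static size_t estimate_line_size(const unsigned *lat, size_t count,
                                 unsigned threshold, size_t stride_bytes)
{
    size_t first = count, last = count, misses = 0;

    for (size_t k = 0; k < count; k++) {
        if (lat[k] > threshold) {
            if (first == count)
                first = k; /* position of the first miss */
            last = k;      /* position of the latest miss */
            misses++;
        }
    }
    if (misses < 2)
        return 0;

    /* Average spacing between consecutive misses, in loads
       (integer division, so it assumes evenly spaced misses). */
    size_t spacing = (last - first) / (misses - 1);
    return spacing * stride_bytes;
}
```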
Is this due to the coalescing mechanism? Or, since the Pascal L1 cache is split into two different regions, is one region used for local data and the other for global data (in addition to the separate constant cache and the separate shared memory)?