Is the 2^27 width limit on linear memory in bytes or in elements?
I have a 16384*16384 matrix and I use cudaMalloc((void **)&d_C, 16384*16384*sizeof(float)) to get linear memory, and then use cudaBindTexture(&offsetC, texRefC, d_C) and cudaBindTexture(&offsetC, texRefC, d_C+16384*8192) to bind. Does the caching for the former also cover the latter? My question is how CUDA decides at which memory address it stops caching. Does my case exceed the width limit?
Can I modify data in a memory area that has already been bound to a texture?
It depends on what you give for the size parameter to the bind. Texture fetches will return zero for reads past the end of the indicated length. If that is chosen such that the two areas overlap, then yes - reading from either reference will be cached.
Of course, with one caveat: the kernel that performs the writes must complete before you can expect a texture fetch to read the modified data.
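To put numbers on the 16384 x 16384 float case from the question, here is a rough host-only sketch of the arithmetic. It makes no CUDA calls. The names (`BoundRange`, `ranges_overlap`, the `k...` constants) are made up for illustration, and it deliberately does not assume whether the 2^27 figure counts elements or bytes; it only compares the matrix against that number both ways. It also assumes a 4-byte float.

```cpp
#include <cstdint>

// Sizes taken from the question: a 16384 x 16384 matrix of float.
constexpr std::uint64_t kDim        = 16384;                      // 2^14
constexpr std::uint64_t kElements   = kDim * kDim;                // 2^28 floats
constexpr std::uint64_t kBytes      = kElements * sizeof(float);  // 2^30 bytes (4-byte float assumed)
constexpr std::uint64_t kHalfOffset = kDim * 8192;                // element offset of the second bind, 2^27
constexpr std::uint64_t kWidthLimit = std::uint64_t{1} << 27;     // the 2^27 figure being asked about

// Hypothetical helper type: half-open element range [begin, begin + count)
// that one binding would cover, given the size passed to the bind.
struct BoundRange {
    std::uint64_t begin;  // first element, relative to the start of the allocation
    std::uint64_t count;  // number of elements covered by the bind
};

// Two bindings can only see the same data if their ranges intersect.
constexpr bool ranges_overlap(BoundRange a, BoundRange b) {
    return a.begin < b.begin + b.count && b.begin < a.begin + a.count;
}

// The whole matrix is larger than 2^27 whether that figure is read as
// elements (2^28 > 2^27) or as bytes (2^30 > 2^27).
static_assert(kElements == (std::uint64_t{1} << 28), "element count is 2^28");
static_assert(kBytes == (std::uint64_t{1} << 30), "byte count is 2^30");
static_assert(kElements > kWidthLimit && kBytes > kWidthLimit, "exceeds 2^27 either way");

// Each half of the matrix is exactly 2^27 elements.
static_assert(kHalfOffset == kWidthLimit, "half the matrix is 2^27 elements");
```

For example, `ranges_overlap(BoundRange{0, kElements}, BoundRange{kHalfOffset, kElements - kHalfOffset})` is true (a bind sized to the whole allocation overlaps a bind starting at the midpoint), while `ranges_overlap(BoundRange{0, kHalfOffset}, BoundRange{kHalfOffset, kHalfOffset})` is false (two binds sized to one half each do not overlap).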
where indeed there is a parameter indicating the length of the caching. I wonder what the difference is between these two. And if I use the former one, how would the compiler know how much data should be cached?