The main difference between global memory, texture memory, and constant memory is the path used to access them. All three of these types of memory live in the same off-chip, but on-board, GDDR DRAM chips on your graphics card. There is no physical partitioning of your 512 MB of memory between global, texture, and constant memory. (Constant memory is limited to 64 KB, but as far as I know that is an architectural addressing limit tied to the way constant memory is cached, not a separate physical bank.)
What is different between the three is the hardware used to access it:
Global memory reads go directly to the memory controller, with no intervening cache on these early CUDA GPUs. (Later architectures, from Fermi onward, do route global loads through L1/L2 caches.) This is what happens when you do normal C-style reads from arrays allocated with cudaMalloc().
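To make that concrete, here is a minimal sketch (names like `scale` are mine, not from any particular codebase): memory from cudaMalloc() is ordinary linear device memory, and the kernel indexes it exactly like a C array.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(float *out, const float *in, float k, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = k * in[i];   // a plain global-memory load and store
}

int main(void)
{
    const int n = 1024;
    float h[1024], *d_in, *d_out;
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h, n * sizeof(float), cudaMemcpyHostToDevice);

    scale<<<(n + 255) / 256, 256>>>(d_out, d_in, 2.0f, n);

    cudaMemcpy(h, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("h[10] = %f\n", h[10]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```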
Constant memory reads go through the 8 kB per multiprocessor constant cache. If the word is found in the cache, it is immediately returned from there. Otherwise, the constant cache fetches the required data from global memory.
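A sketch of how you put data on that path (the kernel and symbol names are illustrative): data declared `__constant__` lives in the 64 KB constant segment, is filled from the host with cudaMemcpyToSymbol(), and reads of it inside a kernel go through the constant cache.

```cuda
#include <cuda_runtime.h>

__constant__ float c_coeffs[4];   // lives in the 64 KB constant segment

__global__ void poly(float *out, const float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i];
        // Every thread reads the same words -- the best case for the
        // broadcast-oriented constant cache.
        out[i] = c_coeffs[0]
               + v * (c_coeffs[1] + v * (c_coeffs[2] + v * c_coeffs[3]));
    }
}

int main(void)
{
    float coeffs[4] = {1.0f, 0.5f, 0.25f, 0.125f};
    cudaMemcpyToSymbol(c_coeffs, coeffs, sizeof(coeffs));
    // ... allocate x/out with cudaMalloc() and launch poly<<<...>>>() as usual
    return 0;
}
```

The constant cache performs best when all threads in a warp read the same address, since the value can be broadcast to the whole warp in one cycle.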
Texture memory reads go through the 6-8 kB per multiprocessor texture cache (the size depends on the specific GPU). If the word is found in the cache, it is returned immediately; otherwise the texture cache fetches the required words from global memory.
The texture cache also has some other features. It does low-precision interpolation between array elements “for free”, and it also understands how to read a specially packed multi-dimensional array format. This is why 2D textures must be loaded onto the card using “CUDA Arrays”, which are not organized in memory using the normal linear layout. Instead the elements are arranged into something like a Z-order curve, which keeps spatially neighboring elements close together in memory.
With the exception of 1D textures (which can be bound to ordinary linear memory), the texture cache expects the data to be in this special CUDA Array order. The array copy you mention is just rearranging the elements into that order. It isn’t actually moving the data to a special segment of “texture memory.”
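Here is a sketch of the whole texture path using the texture-object API (available since CUDA 5; the older `texture<>` reference API from early CUDA versions works similarly, and all names here are illustrative). cudaMallocArray() allocates the specially packed CUDA Array layout, and cudaMemcpy2DToArray() is the copy that rearranges the linear host data into it.

```cuda
#include <cuda_runtime.h>

__global__ void sample(float *out, cudaTextureObject_t tex, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < w && y < h)
        // Sampling at texel centers; with linear filtering enabled the
        // hardware interpolates between neighboring texels "for free".
        out[y * w + x] = tex2D<float>(tex, x + 0.5f, y + 0.5f);
}

int main(void)
{
    const int w = 64, h = 64;
    static float host[64][64];
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            host[y][x] = (float)(x + y);

    // Allocate the packed CUDA Array and rearrange the linear data into it.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    cudaMallocArray(&arr, &desc, w, h);
    cudaMemcpy2DToArray(arr, 0, 0, host, w * sizeof(float),
                        w * sizeof(float), h, cudaMemcpyHostToDevice);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;

    cudaTextureDesc texd = {};
    texd.filterMode = cudaFilterModeLinear;   // hardware interpolation
    texd.readMode   = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &texd, NULL);

    float *d_out;
    cudaMalloc(&d_out, w * h * sizeof(float));
    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    sample<<<grid, block>>>(d_out, tex, w, h);

    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    cudaFree(d_out);
    return 0;
}
```

Note that the data still lives in the same on-board DRAM throughout; only the element ordering and the access path (through the texture cache) change.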