I have been developing a protein folding simulation using CUDA, and I have some questions about using constant memory (and its cache) versus texture memory (and its cache).
Specifically, I have a large amount of reference data that is required during the calculation, which I have been able to get down to around 60 KB via very aggressive compression. However, because of the compression, the access pattern into this data is effectively random with respect to the input data, and consists primarily of reading single float values from nonadjacent memory locations. Is this access pattern more likely to benefit from the constant cache, from the 1D texture cache, or from splitting the data across both (assuming the 8 KB caches are separate for each memory region)?
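One way to compare the two paths empirically is to run the same scattered-read pattern through each and time it. The following is only a rough sketch under several assumptions: the names `constRef`, `readConstant`, `readTexture`, the 16 KB table size, and the hashed index pattern are all made up for illustration, and the texture-object API it uses requires a device of compute capability 3.0 or later. The CUDA programming guide notes that constant-memory reads are serialized when threads of a warp request different addresses, so scattered reads may well favor the texture path, but measuring on the target hardware is the safer way to tell.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kRefFloats = 4096;   // hypothetical table size: 16 KB of floats
constexpr int kReads     = 1 << 16; // number of scattered reads (one per thread)

__constant__ float constRef[kRefFloats];

// Each thread reads one float at its own scattered index from constant memory.
__global__ void readConstant(const int *idx, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = constRef[idx[i]];
}

// Same access pattern, but through a 1D texture bound to linear memory.
__global__ void readTexture(cudaTextureObject_t tex, const int *idx, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1Dfetch<float>(tex, idx[i]);
}

int main()
{
    static float hRef[kRefFloats];
    static int   hIdx[kReads];
    for (int i = 0; i < kRefFloats; ++i) hRef[i] = static_cast<float>(i);
    // Multiplicative hash to scatter the indices (illustrative stand-in for real data).
    for (int i = 0; i < kReads; ++i)
        hIdx[i] = static_cast<int>((i * 2654435761u) % kRefFloats);

    float *dRef = nullptr, *dOut = nullptr;
    int *dIdx = nullptr;
    cudaMalloc(&dRef, sizeof(hRef));
    cudaMalloc(&dOut, kReads * sizeof(float));
    cudaMalloc(&dIdx, sizeof(hIdx));
    cudaMemcpy(dRef, hRef, sizeof(hRef), cudaMemcpyHostToDevice);
    cudaMemcpy(dIdx, hIdx, sizeof(hIdx), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(constRef, hRef, sizeof(hRef));

    // Describe the linear buffer and create a texture object over it.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = dRef;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = sizeof(hRef);
    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    cudaEvent_t t0, t1, t2;
    cudaEventCreate(&t0); cudaEventCreate(&t1); cudaEventCreate(&t2);

    const int threads = 256, blocks = kReads / threads;
    cudaEventRecord(t0);
    readConstant<<<blocks, threads>>>(dIdx, dOut, kReads);
    cudaEventRecord(t1);
    readTexture<<<blocks, threads>>>(tex, dIdx, dOut, kReads);
    cudaEventRecord(t2);

    cudaError_t err = cudaEventSynchronize(t2);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float msConst = 0.0f, msTex = 0.0f;
    cudaEventElapsedTime(&msConst, t0, t1);
    cudaEventElapsedTime(&msTex, t1, t2);
    printf("constant: %.3f ms, texture: %.3f ms\n", msConst, msTex);

    cudaDestroyTextureObject(tex);
    cudaFree(dRef); cudaFree(dOut); cudaFree(dIdx);
    return 0;
}
```

A single launch of each kernel is noisy, so in practice the timings would need a warm-up run and averaging over many launches, ideally with the real index pattern rather than a hashed one.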
Also, how much of the 64 KB of constant memory is actually usable? I find that when I use more than about 50 KB, I get intermittent failures when launching the CUDA kernel, which I can reproduce with just a no-op kernel that includes a large __constant__ array.
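A minimal sketch of the kind of reproduction described here might look like the following. It is not the actual code: the names `refData` and `noopKernel` and the 52 KB array size are assumptions chosen only to sit just above the roughly 50 KB threshold mentioned. It checks both the launch error and the error returned after synchronizing, since a failed launch is otherwise easy to miss.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical size: 52 KB of floats, slightly above the ~50 KB threshold.
constexpr size_t kRefFloats = 52 * 1024 / sizeof(float);

__constant__ float refData[kRefFloats];

// Effectively a no-op; it touches one element so the array is referenced.
__global__ void noopKernel(float *out)
{
    if (threadIdx.x == 0 && blockIdx.x == 0)
        out[0] = refData[0];
}

int main()
{
    float *dOut = nullptr;
    cudaError_t err = cudaMalloc(&dOut, sizeof(float));
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    noopKernel<<<1, 32>>>(dOut);

    // Errors detected at launch time (e.g. invalid configuration).
    err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Errors that only surface once the kernel has actually run.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("kernel with %zu bytes of __constant__ data ran OK\n",
           kRefFloats * sizeof(float));
    cudaFree(dOut);
    return 0;
}
```

Running this repeatedly while varying `kRefFloats` would show whether the failures track the array size or depend on something else, such as the driver or the device in use.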
-Sean