Constants vs Texture Memory

I have been developing a protein folding simulation using CUDA, and have some questions about using constant memory (and its cache) vs. texture memory (and its cache).

Specifically, I have a large amount of reference data which is required during the calculation process, which I have been able to get down to around 60 KB via very aggressive compression techniques. However, the access pattern into this data is effectively random (a consequence of the compression), consisting primarily of reads of single float values from nonadjacent memory locations. Is this access pattern more likely to benefit from the constant cache, the 1D texture cache, or a balance across both (assuming the 8 KB caches are separate for each memory region)?

Also, how much of the 64 KB of constant memory is actually usable? I find that when I use more than about 50 KB, kernel launches fail intermittently, which can be reproduced with a no-op kernel that merely includes a large __constant__ array.


It sounds like constant memory is probably your best bet since you have less than 64 KB of data. Texture is also a good possibility. A 2D texture will help if you have (or can create) good 2D locality in the addressing.
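For the 1D texture route, a minimal sketch of what binding the compressed table to a texture could look like (using the legacy texture-reference API of that CUDA era; all names here are illustrative, not from the post):

```cuda
#include <cuda_runtime.h>

// Texture reference bound to the compressed lookup table in global memory.
texture<float, 1, cudaReadModeElementType> g_refTex;

// Each thread fetches one float from a (possibly random) index; the fetch
// goes through the texture cache rather than uncached global memory.
__global__ void lookupKernel(float *out, const int *indices, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(g_refTex, indices[i]);
}

// Host side (error checking omitted):
//   float *d_table;
//   cudaMalloc(&d_table, tableBytes);
//   cudaMemcpy(d_table, h_table, tableBytes, cudaMemcpyHostToDevice);
//   cudaBindTexture(NULL, g_refTex, d_table, tableBytes);
```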

As for the issue with the constant arrays: can you provide more information? Are you using cudaMemcpyToSymbol to copy data into the constant array?


At present, it is very difficult to create 2D locality, since the data tables are highly compressed representations of a much larger matrix, and the compaction necessarily destroys locality. However, given the latencies of global memory access, the cost of indexing into the original (8 MB+) data seemed worth the tradeoff of unpacking a dataset that could fit within a faster memory region…

I was loading the static constant data by defining a constant table via an include. For example:

__constant__ const int d_nLRLen = 4;

__constant__ const int d_nLRIndex[21][21] = {
    { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
    // (rest removed)

as defined in a .cu file that is #included in the main kernel .cu file. There are no samples in the SDK which use the __constant__ keyword, so if I should be using cudaMemcpyToSymbol instead, I'll take a look at that. Changing only the size of the matrix included in this fashion determines whether the application runs correctly; the size of the constant matrix is the sole variable. I provided a full sample file via NVIDIA bug ID 288638, since I don't seem to have permission to attach a .cu file to a post.
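For reference, the cudaMemcpyToSymbol approach would look roughly like this, reusing the array names above (the host-side copy `h_nLRIndex` is an assumed name, and the __constant__ array loses its compile-time initializer since it is now filled at runtime):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Declared without an initializer; populated from the host at runtime.
__constant__ int d_nLRIndex[21][21];

// Assumed host-side table with the same data as the static initializer.
static int h_nLRIndex[21][21] = { /* ... table data ... */ };

void uploadConstantTable(void)
{
    // Copy the whole host table into the __constant__ symbol.
    cudaError_t err = cudaMemcpyToSymbol(d_nLRIndex, h_nLRIndex,
                                         sizeof(h_nLRIndex));
    if (err != cudaSuccess)
        fprintf(stderr, "cudaMemcpyToSymbol failed: %s\n",
                cudaGetErrorString(err));
}
```

One practical benefit of this style is that the error return from cudaMemcpyToSymbol gives a concrete failure point, which may help narrow down the inconsistent launch failures seen with the large statically initialized array.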


One other follow-up clarification: are the per-multiprocessor 8 KB caches for constants and textures separate memory pools (in which case an application could gain a benefit from using both memory regions)?


Yes, they are separate.