Lookup table in global memory: can I coalesce accesses?

Hey guys

I have an interesting problem… In global memory I store a precalculated table of integers. The table is built so that each entry is itself the index of a later access, so lookups chain iteratively:

int *table;  // precalculated table, stored in global memory

int curr_idx = threadIdx.x;

int next_idx = table[curr_idx];

// which I can then use as:

int new_next_idx = table[next_idx];


The table is rather large, so it needs to sit in global memory. Table values are essentially random, and it takes 50 clock cycles to generate each entry of the table.

Now, I’d like to use this table in a CUDA kernel, but given the 600-cycle access latency of global memory, it would be more efficient to recalculate these values in each thread unless I can coalesce the accesses. However, because of the semi-random nature of the table accesses, I don’t think I can do this.

Any thoughts?

I don’t think you will be able to coalesce the accesses. Especially since the entries are so cheap to compute, I think you are right that recalculating will be much better, unless there is some way you can squeeze the table into shared memory.
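If the table (or the part a block needs) did fit, the shared-memory route could look roughly like the sketch below. It assumes the whole table fits in one block's dynamic shared memory (48 KB by default, so about 12K ints), which the original post suggests is not the case as-is. The names `chase_shared` and `run_chase_shared` are made up for illustration, and error checking is left out.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each block first copies the whole table into shared memory with coalesced
// reads, then follows the index chain out of shared memory.
__global__ void chase_shared(const int *table, int n, int steps, int *out)
{
    extern __shared__ int s_table[];

    // Consecutive threads read consecutive entries, so these loads coalesce.
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        s_table[i] = table[i];
    __syncthreads();

    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    int idx = tid;
    for (int s = 0; s < steps; ++s)
        idx = s_table[idx];   // still a random access, but on-chip
    out[tid] = idx;
}

// Hypothetical host wrapper; assumes table.size() * sizeof(int) <= 48 KB.
std::vector<int> run_chase_shared(const std::vector<int> &table, int steps)
{
    int n = static_cast<int>(table.size());
    size_t bytes = n * sizeof(int);
    int *d_table = nullptr, *d_out = nullptr;
    cudaMalloc(&d_table, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_table, table.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    // Third launch parameter: dynamic shared memory size in bytes.
    chase_shared<<<blocks, threads, bytes>>>(d_table, n, steps, d_out);

    std::vector<int> out(n);
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_table);
    cudaFree(d_out);
    return out;
}
```

Every block pays for one full, coalesced copy of the table, so this only pays off if each thread then does enough chained lookups to amortize that copy.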

If you’re lucky, you could benefit from texture caching.
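On GPUs of compute capability 3.5 and later, the read-only data cache that texture fetches go through can also be reached without binding a texture, by loading through `__ldg`. A minimal sketch, assuming such a GPU and a table that is never written while the kernel runs; `chase_ldg` and `run_chase_ldg` are illustrative names, and error checking is left out. Whether it helps depends on how much locality the "semi-random" indices actually have.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Follows the index chain, loading table entries through the read-only
// data cache. Requires compute capability 3.5 or later for __ldg.
__global__ void chase_ldg(const int *__restrict__ table, int n, int steps,
                          int *out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    int idx = tid;
    for (int s = 0; s < steps; ++s)
        idx = __ldg(&table[idx]);   // cached read-only load
    out[tid] = idx;
}

// Hypothetical host wrapper around the kernel above.
std::vector<int> run_chase_ldg(const std::vector<int> &table, int steps)
{
    int n = static_cast<int>(table.size());
    size_t bytes = n * sizeof(int);
    int *d_table = nullptr, *d_out = nullptr;
    cudaMalloc(&d_table, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_table, table.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    chase_ldg<<<blocks, threads>>>(d_table, n, steps, d_out);

    std::vector<int> out(n);
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_table);
    cudaFree(d_out);
    return out;
}
```

If the accesses really are spread uniformly over a table much larger than the cache, most loads will still miss and the latency problem stays.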

I have the same problem. However, in my case it turned out to be better to recompute the indices, even though that meant first having to work out (deconvolute) the index of the current element, which made my register usage explode. But now I can saturate my bandwidth with the data to be computed on instead of with indices.
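For comparison, here is a rough sketch of the two variants discussed in this thread: chasing indices through the table versus recomputing them in registers. The generator `gen_entry` is purely a hypothetical stand-in (a small integer hash), since the real table generator is not shown in the thread; the kernel names are made up as well.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the real generator: an integer hash reduced to
// a valid index in [0, n). The actual ~50-cycle computation is not shown
// in the thread.
__host__ __device__ inline int gen_entry(int idx, int n)
{
    unsigned int x = static_cast<unsigned int>(idx);
    x ^= x >> 16;  x *= 0x7feb352dU;
    x ^= x >> 15;  x *= 0x846ca68bU;
    x ^= x >> 16;
    return static_cast<int>(x % static_cast<unsigned int>(n));
}

// Variant 1: follow the chain through the table in global memory.
// Each load depends on the previous one and lands at a scattered address,
// so neighbouring threads generally cannot be served by one transaction.
__global__ void chase_lookup(const int *table, int n, int steps, int *out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    int idx = tid;
    for (int s = 0; s < steps; ++s)
        idx = table[idx];
    out[tid] = idx;
}

// Variant 2: recompute each entry instead of loading it.
// No global loads in the loop; the cost is arithmetic and registers.
__global__ void chase_recompute(int n, int steps, int *out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    int idx = tid;
    for (int s = 0; s < steps; ++s)
        idx = gen_entry(idx, n);
    out[tid] = idx;
}
```

If the table is filled so that `table[i] == gen_entry(i, n)`, both kernels write the same `out`, which makes it easy to time one against the other on the hardware in question. In both variants the final store to `out[tid]` is coalesced; only the table reads are scattered.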