I have an array of 48 elements, each 4 bytes in size. The array is read-only for both the host and the device.
To optimize access, I thought I should declare the array as a global constant, but according to the CUDA documentation:
“the constant cache is best when threads in the same warp accesses only a few distinct locations. If all threads of a warp access the same location, then constant memory can be as fast as a register access”
But in my case every thread in a warp will access a random element of the array, so cache misses will most probably occur.
The same array is used by both the host function and the device kernel.
- How can I declare this read-only array so that a single definition is used by both the host and the device?
Currently I have two definitions of the same lookup table: one for the device in, say, kernel.cu, and the other for the host in host.c.
- Is there any other approach besides constant memory that could improve performance on both the host and the device?
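
To make the first question concrete, here is a minimal sketch of the kind of single shared definition being asked about. It is an illustration under assumptions, not a verified solution: the header name lut.h, the names LUT_SIZE, LUT_VALUES, d_lut and h_lut, the uint32_t element type, and the placeholder values 0..47 are all hypothetical. The idea is that the values are written once in a macro, and the `__CUDACC__` macro (defined when nvcc compiles the file) selects whether a device copy is also emitted.

```c
/* lut.h (hypothetical name): meant to be included from both kernel.cu and host.c */
#ifndef LUT_H
#define LUT_H

#include <stdint.h>

#define LUT_SIZE 48

/* The table contents are written exactly once.
   These are placeholder values; the real 48 entries would go here. */
#define LUT_VALUES \
     0,  1,  2,  3,  4,  5,  6,  7, \
     8,  9, 10, 11, 12, 13, 14, 15, \
    16, 17, 18, 19, 20, 21, 22, 23, \
    24, 25, 26, 27, 28, 29, 30, 31, \
    32, 33, 34, 35, 36, 37, 38, 39, \
    40, 41, 42, 43, 44, 45, 46, 47

#ifdef __CUDACC__
/* Only seen by nvcc: device-side copy placed in constant memory.
   Assumption: uint32_t matches the real 4-byte element type. */
static __constant__ uint32_t d_lut[LUT_SIZE] = { LUT_VALUES };
#endif

/* Host-side copy, visible to both the plain C compiler and nvcc host code. */
static const uint32_t h_lut[LUT_SIZE] = { LUT_VALUES };

#endif /* LUT_H */
```

With something like this, host.c would read h_lut and the kernel in kernel.cu would read d_lut, while the values live in one place. It only addresses the duplicate-definition problem; it says nothing about whether constant memory is the right choice for the random-access pattern described above.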