I'm currently writing an n-body simulator to learn CUDA. I modified my implementation so that the CPU writes data that is read-only on the GPU to constant memory instead of global memory, which sped up my implementation by about 25x. However, I now want a kernel to access this constant memory through a device pointer instead of the array symbol. Doing this brought performance back to its previous level, as if the kernel were now treating the memory as global. Is there a way to tell the kernel that a pointer points to constant memory to prevent this?
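For concreteness, here is a minimal sketch of the two access patterns described above. The names `bodies`, `N`, `sumMassSymbol`, and `sumMassPointer` are hypothetical, and the kernels only sum masses instead of computing forces. The pointer is assumed to come from `cudaGetSymbolAddress`; the actual code may obtain it differently.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 1024;            // hypothetical body count (16 KB of float4)

// Hypothetical read-only buffer: x, y, z, mass per body.
__constant__ float4 bodies[N];

// Variant 1: the kernel names the __constant__ symbol directly, so the
// compiler knows at compile time which memory space is being read.
__global__ void sumMassSymbol(float* out) {
    float total = 0.0f;
    for (int j = 0; j < N; ++j) {
        total += bodies[j].w;      // every thread reads the same element
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = total;
}

// Variant 2: the same data reached through a plain device pointer. The
// pointer type carries no memory-space information.
__global__ void sumMassPointer(const float4* src, float* out) {
    float total = 0.0f;
    for (int j = 0; j < N; ++j) {
        total += src[j].w;
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = total;
}

int main() {
    static float4 host[N];
    for (int i = 0; i < N; ++i) {
        host[i] = make_float4(0.0f, 0.0f, 0.0f, 1.0f);
    }
    cudaMemcpyToSymbol(bodies, host, sizeof(host));

    // Assumption: the device pointer is obtained from the symbol like this.
    const float4* devPtr = nullptr;
    cudaGetSymbolAddress((void**)&devPtr, bodies);

    float* out = nullptr;
    cudaMalloc(&out, 256 * sizeof(float));

    sumMassSymbol<<<1, 256>>>(out);
    sumMassPointer<<<1, 256>>>(devPtr, out);
    cudaDeviceSynchronize();

    float first = 0.0f;
    cudaMemcpy(&first, out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("total mass seen by thread 0: %f\n", first);

    cudaFree(out);
    return 0;
}
```

A plausible explanation for the slowdown is that in the second variant the compiler cannot tell which memory space `src` refers to, so it may emit ordinary (generic or global) loads that bypass the constant cache. Comparing the generated PTX or SASS of the two kernels should confirm or rule this out.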
Currently I’m passing a bool to choose between one of two constant buffers in the kernel, but I’d prefer to take the check out of the loop if possible.
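One way the check might be moved out of the loop is to make the choice a template parameter, so that each kernel instantiation names one constant symbol directly and the selection happens once on the host at launch time. This is only a sketch under assumptions: `bodiesA`, `bodiesB`, `loadBody`, `sumMass`, and `launchSumMass` are hypothetical names, and the kernel body is a stand-in for the real force computation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 1024;            // hypothetical body count

// Two hypothetical constant buffers (2 x 16 KB, within the 64 KB limit).
__constant__ float4 bodiesA[N];
__constant__ float4 bodiesB[N];

// UseB is a compile-time constant, so the ternary can be folded away and
// each instantiation refers to exactly one __constant__ symbol.
template <bool UseB>
__device__ __forceinline__ float4 loadBody(int j) {
    return UseB ? bodiesB[j] : bodiesA[j];
}

template <bool UseB>
__global__ void sumMass(float* out) {
    float total = 0.0f;
    for (int j = 0; j < N; ++j) {
        total += loadBody<UseB>(j).w;   // no runtime check inside the loop
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = total;
}

// The runtime bool is examined once, on the host, to pick an instantiation.
void launchSumMass(bool useB, float* out) {
    if (useB) {
        sumMass<true><<<1, 256>>>(out);
    } else {
        sumMass<false><<<1, 256>>>(out);
    }
}

int main() {
    static float4 host[N];
    for (int i = 0; i < N; ++i) {
        host[i] = make_float4(0.0f, 0.0f, 0.0f, 1.0f);
    }
    cudaMemcpyToSymbol(bodiesA, host, sizeof(host));
    cudaMemcpyToSymbol(bodiesB, host, sizeof(host));

    float* out = nullptr;
    cudaMalloc(&out, 256 * sizeof(float));

    launchSumMass(false, out);     // reads bodiesA
    launchSumMass(true, out);      // reads bodiesB
    cudaDeviceSynchronize();

    float first = 0.0f;
    cudaMemcpy(&first, out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("total mass seen by thread 0: %f\n", first);

    cudaFree(out);
    return 0;
}
```

The trade-off is that the kernel is compiled twice, which is usually acceptable for a two-way choice but scales poorly if the number of buffers grows.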