How to use large data (several MB) efficiently in CUDA

Hello everyone,
I started using CUDA a few months ago to accelerate applications, and I have the following problem:
I have some large arrays (1D or 2D, ranging in size from a few KB to several MB) that need to be broadcast to all threads of my kernel. That is, every thread needs to read the same data elements from those arrays to perform its calculations.
I was thinking about transferring the arrays to the GPU once, so I don't have to read them from the host side all the time. I wanted to declare them as constant, but since constant memory on the GPU is limited (64 KB), that will not work.
The application itself is somewhat complicated because it uses recursive function calls and other concepts, so I wanted to avoid using shared memory.
Can somebody suggest an efficient approach for using large data on the GPU, other than plain global memory?
It would be very helpful for me.

Thanks in advance


A couple of ideas come to mind:

  1. Linear memory bound to a texture reference or texture object

  2. Pitch-linear memory bound to a texture reference or texture object, if 2D indexed access is desired.
    This also allows hardware bilinear interpolation between data elements, if desired.

  3. Read access via __ldg() or const __restrict__ pointers.
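A minimal sketch of option 3), assuming a hypothetical kernel `broadcastSum` in which every thread reads all `n` elements of one table (the broadcast pattern from the question) and scales its own input by their sum. The table is copied to global memory once with `cudaMemcpy` and then read through a `const float* __restrict__` pointer with `__ldg()`, which requires compute capability 3.5 or newer. Because all threads of a warp load the same `table[k]` in the same iteration, each such load is serviced once and broadcast to the warp. The helper `runBroadcast` and the 256-thread block size are illustrative assumptions, and error checking is omitted.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Every thread reads all n table elements (the "broadcast" pattern) and
// scales its own input by their sum. const __restrict__ marks the table as
// read-only for this kernel; __ldg() loads it through the read-only data cache.
__global__ void broadcastSum(const float* __restrict__ table, int n,
                             const float* __restrict__ in,
                             float* __restrict__ out, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count) return;          // no __syncthreads() below, so early exit is safe
    float acc = 0.0f;
    for (int k = 0; k < n; ++k)
        acc += __ldg(&table[k]);     // same address for the whole warp
    out[i] = in[i] * acc;
}

// Hypothetical host helper: copies the table to global memory once, launches
// the kernel and copies the result back. Error checking omitted for brevity.
std::vector<float> runBroadcast(const std::vector<float>& table,
                                const std::vector<float>& in)
{
    int n = static_cast<int>(table.size());
    int count = static_cast<int>(in.size());
    float *dTable = nullptr, *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dTable, n * sizeof(float));
    cudaMalloc(&dIn, count * sizeof(float));
    cudaMalloc(&dOut, count * sizeof(float));
    cudaMemcpy(dTable, table.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dIn, in.data(), count * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (count + threads - 1) / threads;
    broadcastSum<<<blocks, threads>>>(dTable, n, dIn, dOut, count);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<float> out(count);
    cudaMemcpy(out.data(), dOut, count * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dTable);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

In a real application the device copy of the table would be kept alive and reused across kernel launches rather than uploaded again for every call.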

1), 2) and 3) could be combined with loading portions of the data into shared memory on a per-block basis, and then having all of the block's threads read them after a __syncthreads() barrier. Not sure how much speed would be gained from that.
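The per-block shared-memory staging could look roughly like this. It is a sketch with hypothetical names (`broadcastSumShared`, `runBroadcastShared`, `TILE`) and no error checking. Each block copies the table into shared memory one `TILE`-sized chunk at a time using coalesced loads, then all of its threads read that chunk. Since every thread reads the same `tile[j]` in a given iteration, shared memory serves it as a broadcast and this pattern does not produce bank conflicts. Threads past the end of the data must not return early, because every thread of the block has to reach both `__syncthreads()` calls.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define TILE 256  // hypothetical chunk size; the kernel assumes blockDim.x == TILE

__global__ void broadcastSumShared(const float* __restrict__ table, int n,
                                   const float* __restrict__ in,
                                   float* __restrict__ out, int count)
{
    __shared__ float tile[TILE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float acc = 0.0f;
    for (int base = 0; base < n; base += TILE) {
        // Each thread loads one element of the current chunk (coalesced).
        int k = base + threadIdx.x;
        tile[threadIdx.x] = (k < n) ? table[k] : 0.0f;
        __syncthreads();                       // chunk is complete before anyone reads it
        int len = (n - base < TILE) ? (n - base) : TILE;
        for (int j = 0; j < len; ++j)
            acc += tile[j];                    // same word for all threads: broadcast
        __syncthreads();                       // all reads done before the chunk is overwritten
    }
    if (i < count) out[i] = in[i] * acc;       // bounds check only after the barriers
}

// Hypothetical host helper. Error checking omitted for brevity.
std::vector<float> runBroadcastShared(const std::vector<float>& table,
                                      const std::vector<float>& in)
{
    int n = static_cast<int>(table.size());
    int count = static_cast<int>(in.size());
    float *dTable = nullptr, *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dTable, n * sizeof(float));
    cudaMalloc(&dIn, count * sizeof(float));
    cudaMalloc(&dOut, count * sizeof(float));
    cudaMemcpy(dTable, table.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dIn, in.data(), count * sizeof(float), cudaMemcpyHostToDevice);

    int blocks = (count + TILE - 1) / TILE;
    broadcastSumShared<<<blocks, TILE>>>(dTable, n, dIn, dOut, count);

    std::vector<float> out(count);
    cudaMemcpy(out.data(), dOut, count * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dTable);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

Whether this beats the read-only cache path depends on the access pattern and the GPU, so it is worth timing both versions on the actual data.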

Hi cbuchner1,
thank you for your advice. I was also thinking about using shared memory, but to be honest I don't feel really confident with CUDA yet, and I was afraid of causing bank conflicts or other errors that would slow down my program. I will give it a try, and if I still have difficulties I will use textures.
I would like to ask a few more questions:
1. Can I combine constant and shared memory to improve performance, i.e. use constant memory to hold variables needed by all threads?
2. Is it a problem to have recursive function calls in CUDA?
3. Do bank conflicts (if they happen) drastically reduce the performance of my application?
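For questions 1 and 2, a minimal sketch, assuming a hypothetical `Params` struct: small values that every thread reads are placed in `__constant__` memory with `cudaMemcpyToSymbol`, and the kernel calls a recursive `__device__` function. Constant memory is cached and broadcast when all threads of a warp read the same address, which suits a few shared parameters. Recursion in device code is supported on compute capability 2.0 and later; each recursion level uses per-thread stack, which can be raised with `cudaDeviceSetLimit(cudaLimitStackSize, ...)`. The 4096-byte value below is an arbitrary assumption, not a recommendation, and error checking is omitted.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical per-launch parameters read by every thread: a good fit for
// __constant__ memory (64 KB total), which broadcasts uniform reads to a warp.
struct Params {
    float scale;
    int   depth;
};
__constant__ Params c_params;

// Recursive __device__ function (compute capability 2.0+). Each call level
// consumes per-thread stack space.
__device__ float powRec(float x, int e)
{
    if (e == 0) return 1.0f;
    return x * powRec(x, e - 1);
}

__global__ void applyParams(const float* __restrict__ in,
                            float* __restrict__ out, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count) return;
    out[i] = c_params.scale * powRec(in[i], c_params.depth);
}

// Hypothetical host helper. Error checking omitted for brevity.
std::vector<float> runParams(const std::vector<float>& in, float scale, int depth)
{
    Params p{scale, depth};
    cudaMemcpyToSymbol(c_params, &p, sizeof(p));
    cudaDeviceSetLimit(cudaLimitStackSize, 4096);  // assumed value for deeper recursion

    int count = static_cast<int>(in.size());
    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, count * sizeof(float));
    cudaMalloc(&dOut, count * sizeof(float));
    cudaMemcpy(dIn, in.data(), count * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (count + threads - 1) / threads;
    applyParams<<<blocks, threads>>>(dIn, dOut, count);

    std::vector<float> out(count);
    cudaMemcpy(out.data(), dOut, count * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

ptxas may warn that the stack size for the kernel cannot be determined statically; that is expected with recursion.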

Thank you for your answer.
Best regards