How to use large data (some MB) in CUDA efficiently

fabricembianda8978 · December 4, 2017, 9:21pm

Hello everyone,
I started using CUDA some months ago to accelerate applications. I have the following problem:
I have some large arrays (1D or 2D, ranging in size from KB to MB) which should be broadcast to all threads of my kernel; that is, each thread needs to read the same data elements from those arrays to perform some calculations.
I was thinking about transferring the arrays to the GPU so I don’t have to read from the host side all the time. I wanted to define them as constant, but since constant memory on the GPU is limited, that will not work.
The application itself is a bit complicated because it uses recursive function calls and other concepts, so I wanted to avoid the use of shared memory.
Can somebody help me with a more efficient concept for using large data on the GPU than plain global memory?
It would be very helpful for me.

Thanks in advance

Fabrice

cbuchner1 · December 4, 2017, 11:09pm

A couple of ideas come to mind:

1) Linear memory bound to a texture reference or texture object.
2) Pitch-linear memory bound to a texture reference or texture object, if 2D indexed access is desired. This also allows bilinear interpolation between data elements in hardware, if so desired.
3) Read access via __ldg() or const __restrict__ pointers.

1), 2) and 3) could be combined with reading portions of the data into shared memory on a per-block basis, and then accessing it from all of the block’s threads after a __syncthreads() barrier. Not sure how much speed would be gained from that.
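
For illustration, here is a minimal sketch of idea 3) and of its combination with per-block staging in shared memory. It is not code from this thread: the names `sum_table_ldg`, `sum_table_shared`, `run_broadcast_sum`, and `TILE`, as well as the toy computation (every thread sums the whole table and adds its own index), are assumptions made up for the example. `__ldg()` requires compute capability 3.5 or higher, and whether the staged variant is actually faster depends on the access pattern and the GPU, so it should be measured.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 256;  // hypothetical tile size (elements staged per step)

// Abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Idea 3): every thread reads the same read-only table from global memory
// through the read-only data cache (const __restrict__ plus __ldg()).
__global__ void sum_table_ldg(const float* __restrict__ table, int n,
                              float* __restrict__ out, int n_out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n_out) return;
    float acc = 0.0f;
    for (int i = 0; i < n; ++i)
        acc += __ldg(&table[i]);  // all threads of a warp read the same address
    out[tid] = acc + tid;
}

// Idea 3) combined with shared memory: each block copies one tile of the
// table into shared memory, synchronizes, and then all threads read from it.
__global__ void sum_table_shared(const float* __restrict__ table, int n,
                                 float* __restrict__ out, int n_out)
{
    __shared__ float tile[TILE];
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float acc = 0.0f;
    for (int base = 0; base < n; base += TILE) {
        // Cooperative load: the block's threads share the copy work.
        for (int i = threadIdx.x; i < TILE; i += blockDim.x) {
            int idx = base + i;
            tile[i] = (idx < n) ? table[idx] : 0.0f;
        }
        __syncthreads();  // tile is complete before anyone reads it
        int count = min(TILE, n - base);
        for (int i = 0; i < count; ++i)
            acc += tile[i];  // same address per warp: broadcast, no bank conflict
        __syncthreads();  // everyone is done before the tile is overwritten
    }
    if (tid < n_out) out[tid] = acc + tid;  // no early return before barriers
}

// Host helper: fills the table with 1.0f, runs one of the two kernels,
// and returns the per-thread results.
std::vector<float> run_broadcast_sum(bool use_shared, int n, int n_out)
{
    assert(n > 0 && n_out > 0);
    std::vector<float> h_table(n, 1.0f), h_out(n_out);
    float *d_table = nullptr, *d_out = nullptr;
    check(cudaMalloc(&d_table, n * sizeof(float)), "cudaMalloc table");
    check(cudaMalloc(&d_out, n_out * sizeof(float)), "cudaMalloc out");
    check(cudaMemcpy(d_table, h_table.data(), n * sizeof(float),
                     cudaMemcpyHostToDevice), "copy table to device");
    int threads = 128;
    int blocks = (n_out + threads - 1) / threads;
    if (use_shared)
        sum_table_shared<<<blocks, threads>>>(d_table, n, d_out, n_out);
    else
        sum_table_ldg<<<blocks, threads>>>(d_table, n, d_out, n_out);
    check(cudaGetLastError(), "kernel launch");
    check(cudaMemcpy(h_out.data(), d_out, n_out * sizeof(float),
                     cudaMemcpyDeviceToHost), "copy results to host");
    check(cudaFree(d_table), "cudaFree table");
    check(cudaFree(d_out), "cudaFree out");
    return h_out;
}
```

Calling `run_broadcast_sum(false, 1000, 300)` and `run_broadcast_sum(true, 1000, 300)` should return the same 300 values, so the two variants can be timed against each other on real data. Because all threads of a warp read the same shared-memory address in the inner loop, the hardware serves the read as a broadcast rather than as a bank conflict.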

fabricembianda8978 · December 5, 2017, 7:41am

Hi cbuchner1,
thank you for your advice. I was also thinking about using shared memory, but to be honest I am not feeling really confident with CUDA yet, and I was afraid of running into bank conflicts or other errors that would slow my program down. I will give it a try, and if I still have difficulties I will use textures.
I would like to ask these other questions:
1. Can I combine the use of constant and shared memory to improve performance, i.e. use constant memory for holding variables needed in all threads?
2. Is it a problem to have recursive function calls in CUDA?
3. Do bank conflicts (if they happen) drastically reduce the performance of my application?

Thank you for your answer.
Best regards

Fabrice
