Cache data in shared memory for subsequent calls

I have an app for which I believe it's best to cache data in shared memory across many, many subsequent calls. The scenario is:

  • the cached data never changes

  • the app then runs a long-lived process (looping forever) that repeatedly invokes the compute function on user input

Could someone verify whether the code below will work? I'd like to know whether s_data will hold the right data across the first kernel call and the subsequent kernel calls…

Thanks in advance. Ben

The code follows:

extern __shared__ float s_data[];

__global__ void kernelLoadData(float* datapool)
{
	// copy data into s_data

	....

	threadIdx.x ...

}

__global__ void kernelCompute(float* signal)
{
	// access s_data and compute it with signal...

	....

}

int main(){

	// load data from disk:

	float* datapool;

	...

	// compute blockDim and thread count:

	dim3 n_block, block_size;

	// load once to each block:

	kernelLoadData<<<n_block, block_size>>>(datapool);

	float *signal;

	while(true){

		// wait for input

		....

		signal = ....	// some input func

		// compute from the input:

		kernelCompute<<<n_block, block_size>>>(signal);

		// read back from computed data...

	}

}

The contents of shared memory at the beginning of any kernel call are undefined, so no, that will not work.

Perhaps you would want to copy the data into constant memory instead… If all threads of a warp access the same element, this could be a good option for you.

In your main function you just do a cudaMemcpyToSymbol(…) to a __constant__ variable that has been declared at global scope.

__constant__ float read_only_array[length];
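A minimal sketch of that approach (LENGTH and the symbol name are placeholders; error checking omitted):

	#define LENGTH 1024               // placeholder size, must fit in 64 KB of constant memory

	__constant__ float read_only_array[LENGTH];

	// host side, after loading datapool from disk:
	cudaMemcpyToSymbol(read_only_array, datapool, LENGTH * sizeof(float));

	// device code can then read read_only_array[i] directly in any kernel;
	// the contents persist across kernel launches.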

Thanks, tmurray and Jimmy.

I have finished the work ;). Anyway, what I did was: load the data into global memory and let it stay cached there. The data is then read into shared memory for faster access. Both shared and constant memory are too small, since I have at least 20 MB of float values. I'd need 16 GTX 470s to fit all of them in shared or constant memory ;).

The bad news is I cannot usefully cache the 20 MB of data in shared memory, since each float is only used once.

Anyhow, it works, and it's still way faster than the CPU ;), roughly 20 times (40 s → 2 s).
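For anyone landing on this thread later, a rough sketch of that pattern (names, sizes, and the actual computation are illustrative, not my real code):

	__global__ void kernelCompute(const float* datapool, const float* signal, float* out)
	{
		__shared__ float tile[256];                 // one tile per block; size illustrative

		int i = blockIdx.x * blockDim.x + threadIdx.x;
		tile[threadIdx.x] = datapool[i];            // stage from global into shared memory
		__syncthreads();

		out[i] = tile[threadIdx.x] * signal[i];     // placeholder computation
	}

	// Host side: datapool is allocated once with cudaMalloc, copied once with
	// cudaMemcpy, and stays resident in device global memory; the same pointer
	// is passed to every kernelCompute launch inside the input loop.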
