Hi,
I am using a GPU with the following configuration:
CL_DEVICE_NAME: Tesla M2050
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 285.05.33
CL_DEVICE_VERSION: OpenCL 1.1 CUDA
CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.1
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
I would like to know whether the local (shared) memory of a work-group survives into the next invocation of the same kernel.
For example,
If kernel1 uses a local memory buffer, say local1, and I write some data into it during the first iteration, will that data still be there in the next iterations? Or is local memory reinitialized on each kernel invocation?
for (ii = 0; ii < 100; ii++) {
    err = clEnqueueNDRangeKernel(commands, kernel1, 1, NULL, globalws_2d_1, localws_2d_1, 0, NULL, NULL);
}
I hope my question was clear; if not, please let me know.
Regards,
Hi
Shared memory lives in dedicated on-chip hardware. Each SM (on Fermi) or SMX (on Kepler) has its own shared memory, which is used by the thread blocks running on it. When you re-run a kernel, there is no guarantee that a given block will land on the same SM as before, so I can't see how it could reuse the shared memory from the previous run.
So my guess is that the shared memory does not survive.
Regards
Local (shared) memory only survives for the lifetime of a single work-group execution. Once your work-group has finished, there is no guarantee that its contents survive. However, you can copy data from global memory to local memory before your kernel does its work, and push it back afterwards:
event_t async_work_group_copy(__local gentype *dst,
                              const __global gentype *src,
                              size_t num_gentypes,
                              event_t event);

event_t async_work_group_copy(__global gentype *dst,
                              const __local gentype *src,
                              size_t num_gentypes,
                              event_t event);
More info: OpenCL 1.1 Reference Pages.
So my advice would be: allocate a global memory buffer for your data, copy it into local memory at the beginning of your kernel, and write the local memory back to global memory at the end.
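As a rough sketch of that pattern (the kernel name, the `state` buffer, and the tile size `TILE` are all hypothetical; a real kernel would also offset `state` by `get_group_id(0) * TILE` when there is more than one work-group):

```c
#define TILE 256  // hypothetical per-work-group tile size

__kernel void kernel1(__global float *state)  // persistent data lives in global memory
{
    __local float scratch[TILE];  // per-launch scratch; contents do NOT persist across launches

    // Stage the persistent data from global into local memory.
    event_t e = async_work_group_copy(scratch, state, TILE, 0);
    wait_group_events(1, &e);

    /* ... operate on scratch here ... */
    barrier(CLK_LOCAL_MEM_FENCE);

    // Write the updated data back so the next kernel launch can see it.
    e = async_work_group_copy(state, scratch, TILE, 0);
    wait_group_events(1, &e);
}
```

With this structure, each iteration of your host-side loop picks up whatever the previous launch wrote to the global buffer, regardless of which SM the work-group runs on.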