I have a simple mean shift clustering code running, but it takes a long time to complete, so I had to split the workload into several enqueue calls to keep the driver from being restarted.
I have read that the __local address space qualifier can be used on a kernel parameter to load the data being processed into faster memory.
How do I do that? Simply changing __global to __local stops the program from running.
A link would already be enough, thanks!