How to use __local correctly?


I have a simple mean-shift clustering code running, but it takes a long time to complete, so I had to split the workload into several enqueue calls to keep the driver from being restarted.
I have heard and read that the __local address space qualifier can be used to load the data being processed into faster memory.

How do I do that? Simply changing __global to __local causes the program not to run.

A link would already be enough, thanks!

Sounds like some fundamentals would be useful. has some fantastic video tutorials.

I thought including shared memory optimizations would be advanced stuff :-D

Thank you, the tutorials look fine!

Yeah, but you can’t just change it from __global to __local and expect everything to be fine.

That’s a bit like saying I’ll put petrol in my diesel engine and my car will just go faster.

To get more juice out of your kernel, you’ll need to look at how you are using local memory, for example to avoid bank conflicts and the like.

The tutorials should be a big help in getting up to speed with all that.
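
To make the point about __local concrete, here is a rough sketch of the usual tiling pattern, not the original poster's code. It assumes 1-D points, a Gaussian weight, and made-up names (meanshift_step, tile, bandwidth); a real mean-shift kernel will differ. The idea is that the data stays in __global memory, each work-group copies one chunk at a time into a __local buffer, and barriers keep the work-items in step.

```c
// OpenCL C kernel sketch: one mean-shift step for 1-D points.
// All names here are hypothetical; the Gaussian weight is an assumption.
__kernel void meanshift_step(__global const float *points,  // all input points
                             __global float *shifted,       // one output per point
                             __local float *tile,           // one float per work-item, sized by the host
                             const int n,                   // number of points
                             const float bandwidth)
{
    const int gid = get_global_id(0);
    const int lid = get_local_id(0);
    const int lsz = get_local_size(0);

    // Work-items past the end still help with loads and must reach every barrier.
    const float x = (gid < n) ? points[gid] : 0.0f;
    float num = 0.0f;
    float den = 0.0f;

    for (int base = 0; base < n; base += lsz) {
        // Each work-item copies one element of the current tile into local memory.
        const int src = base + lid;
        tile[lid] = (src < n) ? points[src] : 0.0f;
        barrier(CLK_LOCAL_MEM_FENCE);   // wait until the whole tile is loaded

        // Every work-item now reads the tile from local instead of global memory.
        const int count = min(lsz, n - base);
        for (int j = 0; j < count; ++j) {
            const float d = (x - tile[j]) / bandwidth;
            const float w = exp(-0.5f * d * d);
            num += w * tile[j];
            den += w;
        }
        barrier(CLK_LOCAL_MEM_FENCE);   // finish reading before the tile is overwritten
    }

    if (gid < n)
        shifted[gid] = num / den;
}
```

On the host, a __local pointer argument gets a size but no data: pass the byte count and a NULL pointer, e.g. clSetKernelArg(kernel, 2, local_size * sizeof(float), NULL). The sketch also assumes the global size is rounded up to a multiple of the local size, which is why the gid < n checks are there. Since all work-items read the same tile[j] at the same time, this access pattern should usually be served as a broadcast rather than causing bank conflicts, but that depends on the device.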