Hi, I recently wrote an OpenCL kernel that made a large number of global memory accesses. After I switched to local memory to reduce the global memory accesses, it became much faster.
Is there any other technique that can increase the speed further? Using private memory actually slowed things down a lot in my case.
I am hoping to speed it up much more. Where can I find loop unrolling examples for NVIDIA hardware?
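To show what I mean by unrolling, here is a minimal sketch of the kind of loop I have. The kernel name `sum_taps`, its arguments and the constant `TAPS` are made up for illustration and are not my real code. As far as I know, NVIDIA's OpenCL compiler accepts `#pragma unroll` through the `cl_nv_pragma_unroll` extension, but other compilers may ignore it.

```c
// Hypothetical kernel: names and sizes are placeholders.
#define TAPS 8   // trip count known at compile time, so the loop can be fully unrolled

__kernel void sum_taps(__global const float *in,
                       __global float *out)
{
    const size_t gid = get_global_id(0);
    float acc = 0.0f;

    // Hint asking the compiler to unroll the loop. Assumed to be honoured on
    // NVIDIA (cl_nv_pragma_unroll); it may be ignored elsewhere.
    #pragma unroll
    for (int i = 0; i < TAPS; ++i) {
        // Assumes `in` holds TAPS floats per work-item.
        acc += in[gid * TAPS + i];
    }

    out[gid] = acc;
}
```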
Also, how should float4 values be handled in conditionals, for example inside an if statement?
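This is a sketch of the float4 case I am asking about. As far as I understand, comparing two float4 values gives an int4 with one result per component, so it cannot go straight into an `if` condition. The two patterns I have come across are `select` for a per-component choice and `any`/`all` to reduce the comparison to a single scalar. The kernel name `clamp_negative` and its arguments are hypothetical.

```c
// Hypothetical kernel: replaces negative components with zero.
__kernel void clamp_negative(__global const float4 *in,
                             __global float4 *out)
{
    const size_t gid = get_global_id(0);
    const float4 v = in[gid];
    const float4 zero = (float4)(0.0f);

    // Per-component comparison: each component of mask is -1 (true) or 0 (false).
    const int4 mask = v < zero;

    // Pattern 1: per-component choice without branching.
    // select(a, b, c) takes b where the most significant bit of c is set, else a.
    float4 r = select(v, zero, mask);

    // Pattern 2: a real branch, taken if at least one component was negative.
    if (any(mask)) {
        r *= 2.0f;   // placeholder work, only to show the branch
    }

    out[gid] = r;
}
```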
Please help me with this.
Thanks in advance.