The Visual Profiler tells me that my kernel is memory bound. What does that mean, and how do I optimize it?
I think the kernel makes too many accesses to GPU global memory. You might consider recomputing some values instead of storing them, or keeping them in shared memory and registers rather than writing them to global memory and reading them back.
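To make that concrete, here is a minimal sketch, not taken from your code. It assumes a hypothetical 1D stencil in which each thread sums its 2*RADIUS+1 neighbours. The kernel names, `RADIUS`, `BLOCK`, and the host helper `max_abs_difference` are all made up for illustration. The first kernel reads every neighbour from global memory. The second loads each element into shared memory once per block and accumulates in a register.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

constexpr int RADIUS = 3;    // hypothetical stencil radius
constexpr int BLOCK  = 256;  // threads per block; kernels assume blockDim.x == BLOCK

// Naive version: each thread issues 2*RADIUS+1 loads from global memory.
__global__ void stencil_global(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float sum = 0.0f;
    for (int k = -RADIUS; k <= RADIUS; ++k) {
        int j = min(max(i + k, 0), n - 1);  // clamp indices at the array borders
        sum += in[j];
    }
    out[i] = sum;
}

// Tiled version: the block stages its tile plus halo in shared memory,
// so each input element is read from global memory about once per block.
__global__ void stencil_shared(const float* in, float* out, int n) {
    __shared__ float tile[BLOCK + 2 * RADIUS];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int t = threadIdx.x + RADIUS;  // position of this thread's element in the tile

    tile[t] = in[min(i, n - 1)];   // centre element (clamped for the tail block)
    if (threadIdx.x < RADIUS) {    // the first RADIUS threads also load both halos
        tile[t - RADIUS] = in[max(i - RADIUS, 0)];
        tile[t + BLOCK]  = in[min(i + BLOCK, n - 1)];
    }
    __syncthreads();               // the whole tile must be loaded before anyone reads it

    if (i >= n) return;
    float sum = 0.0f;              // accumulator lives in a register
    for (int k = -RADIUS; k <= RADIUS; ++k) sum += tile[t + k];
    out[i] = sum;
}

// Host helper: runs both kernels on the same input and returns the largest
// absolute difference between their outputs (error checking omitted for brevity).
float max_abs_difference(int n) {
    std::vector<float> h_in(n), h_a(n), h_b(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i % 17);

    float *d_in = nullptr, *d_a = nullptr, *d_b = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    int grid = (n + BLOCK - 1) / BLOCK;
    stencil_global<<<grid, BLOCK>>>(d_in, d_a, n);
    stencil_shared<<<grid, BLOCK>>>(d_in, d_b, n);

    cudaMemcpy(h_a.data(), d_a, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_b.data(), d_b, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_a);
    cudaFree(d_b);

    float worst = 0.0f;
    for (int i = 0; i < n; ++i) worst = std::fmax(worst, std::fabs(h_a[i] - h_b[i]));
    return worst;
}
```

Calling `max_abs_difference` from a `main` and compiling with `nvcc` lets you confirm that both kernels produce the same output before you compare them in the profiler. Whether the shared-memory version is actually faster depends on your GPU and on how well its caches already serve the naive version, so measure rather than assume.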
It means that the performance of your code is limited by its memory accesses, not by computation. Look at these sections in the Best Practices Guide: