Is optimization possible for this kernel?

In one of my kernels, I load data from global memory into shared memory and then use it to perform some operations.
The following image shows the Memory Workload Analysis section:
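A minimal sketch of this global-to-shared-memory pattern is below. It is hypothetical and not the kernel in question: the 3-point averaging operation, the names `average3`, `runAverage3` and `CUDA_CHECK`, and the block size `kBlock = 256` are assumptions chosen only for illustration. Each block copies a tile plus a one-element halo from global memory into shared memory using coalesced loads, synchronizes, and then computes from shared memory. As a result, each input value is fetched from global memory once per block but read by up to three threads.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical parameters: one thread per output element, 3-point stencil.
constexpr int kBlock = 256;
constexpr int kRadius = 1;

// Abort with a message if a CUDA runtime call fails (hypothetical helper).
#define CUDA_CHECK(call)                                          \
  do {                                                            \
    cudaError_t err_ = (call);                                    \
    if (err_ != cudaSuccess) {                                    \
      std::fprintf(stderr, "CUDA error %s at %s:%d\n",            \
                   cudaGetErrorString(err_), __FILE__, __LINE__); \
      std::exit(EXIT_FAILURE);                                    \
    }                                                             \
  } while (0)

// Hypothetical kernel: out[i] = (in[i-1] + in[i] + in[i+1]) / 3, zero-padded.
__global__ void average3(const float* __restrict__ in, float* __restrict__ out, int n) {
  // Tile of kBlock elements plus a halo of kRadius cells on each side.
  __shared__ float tile[kBlock + 2 * kRadius];

  int gid = blockIdx.x * blockDim.x + threadIdx.x;
  int lid = threadIdx.x + kRadius;

  // Coalesced load: consecutive threads read consecutive global addresses.
  tile[lid] = (gid < n) ? in[gid] : 0.0f;
  // The first kRadius threads also load the left and right halo cells.
  if (threadIdx.x < kRadius) {
    int left = gid - kRadius;
    int right = gid + blockDim.x;
    tile[lid - kRadius] = (left >= 0 && left < n) ? in[left] : 0.0f;
    tile[lid + blockDim.x] = (right < n) ? in[right] : 0.0f;
  }
  __syncthreads();  // the whole tile is now visible to every thread in the block

  if (gid < n) {
    // Neighbor reads come from shared memory instead of global memory.
    out[gid] = (tile[lid - 1] + tile[lid] + tile[lid + 1]) / 3.0f;
  }
}

// Copies the input to the device, runs average3, and returns the result.
std::vector<float> runAverage3(const std::vector<float>& h_in) {
  int n = static_cast<int>(h_in.size());
  std::vector<float> h_out(n);
  if (n == 0) return h_out;

  float* d_in = nullptr;
  float* d_out = nullptr;
  CUDA_CHECK(cudaMalloc(&d_in, n * sizeof(float)));
  CUDA_CHECK(cudaMalloc(&d_out, n * sizeof(float)));
  CUDA_CHECK(cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice));

  int grid = (n + kBlock - 1) / kBlock;
  average3<<<grid, kBlock>>>(d_in, d_out, n);  // blockDim.x must equal kBlock
  CUDA_CHECK(cudaGetLastError());

  CUDA_CHECK(cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaFree(d_in));
  CUDA_CHECK(cudaFree(d_out));
  return h_out;
}
```

The kernel must be launched with `blockDim.x == kBlock`, because the shared array is sized at compile time. Neighbors outside `[0, n)` are treated as zero.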

This is the GPU Speed Of Light (SOL) section for the same kernel:

Is further optimization still possible? The Speed Of Light throughput already hits the top, yet Nsight Compute reports that an estimated speedup of 85% can be expected. Here is the screenshot:


How should I approach this problem?