Could you suggest some ideas to improve my kernel's performance?

I have a kernel that runs slower than expected. Nsight Compute gave this profiling information. Could anybody help me improve it? In particular, how should I address the Low Utilization issue reported in the Compute Workload Analysis section? Thanks.

It seems you forgot to post the kernel code and its launch configuration as well as details about the GPU you are running it on. What have you tried to improve kernel performance?

Just generally, the LSU (load/store unit) seems to be the bottleneck. You can improve the shared memory accesses and the loads and stores in general, not only in terms of memory traffic but also in terms of the number of instructions issued.
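As one illustration of cutting the number of load/store instructions: since the kernel was not posted, here is only a generic sketch, not a fix for the actual code. It compares a scalar copy-and-scale kernel with a variant that uses float4, so each thread issues one 128-bit load and one 128-bit store instead of four 32-bit ones. The names scale_scalar, scale_vec4 and compare_scale_kernels are made up for this example, and it assumes the element count is a multiple of 4 and the pointers are 16-byte aligned.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cmath>
#include <vector>

// Baseline: one 32-bit load and one 32-bit store per element.
__global__ void scale_scalar(const float* in, float* out, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = s * in[i];
}

// Vectorized: one 128-bit load and one 128-bit store per four elements.
// Assumption: n is a multiple of 4 and both pointers are 16-byte aligned
// (pointers returned directly by cudaMalloc satisfy this).
__global__ void scale_vec4(const float4* in, float4* out, float s, int n4) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n4) {
        float4 v = in[i];
        v.x *= s; v.y *= s; v.z *= s; v.w *= s;
        out[i] = v;
    }
}

// Hypothetical host helper: runs both kernels on the same input and
// returns the maximum absolute difference between their outputs.
float compare_scale_kernels(int n) {
    std::vector<float> h(n), ha(n), hb(n);
    for (int i = 0; i < n; ++i) h[i] = 0.5f * i;

    size_t bytes = sizeof(float) * n;
    float *in, *a, *b;
    cudaMalloc(&in, bytes); cudaMalloc(&a, bytes); cudaMalloc(&b, bytes);
    cudaMemcpy(in, h.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int n4 = n / 4;
    scale_scalar<<<(n + threads - 1) / threads, threads>>>(in, a, 3.0f, n);
    scale_vec4<<<(n4 + threads - 1) / threads, threads>>>(
        reinterpret_cast<const float4*>(in), reinterpret_cast<float4*>(b), 3.0f, n4);

    // cudaMemcpy on the default stream waits for the kernels to finish.
    cudaMemcpy(ha.data(), a, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(hb.data(), b, bytes, cudaMemcpyDeviceToHost);

    float max_diff = 0.0f;
    for (int i = 0; i < n; ++i) max_diff = std::max(max_diff, std::fabs(ha[i] - hb[i]));

    cudaFree(in); cudaFree(a); cudaFree(b);
    return max_diff;
}
```

The same idea can be applied to shared memory accesses when the data layout allows aligned vector accesses. Whether it helps in a particular kernel is something to check in the profiler.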

Thank you for the comments.

It turned out that my kernel's computational intensity was too low (too few multiplications relative to the number of smem loads). The performance is much better now that I have increased the amount of computation per thread.
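For anyone who finds this later, here is a rough sketch of what "more computation per thread" can look like. This is not my actual kernel, just a generic shared-memory tiled matrix multiply written for illustration. The names matmul_tm, TILE, TM and matmul_max_error are made up for this post, and it assumes square row-major matrices with n a multiple of TILE. Each thread computes TM outputs, so the value read from Bs is loaded once and reused for TM multiply-adds. That works out to TM + 1 smem loads per TM multiply-adds, compared with 2 loads per multiply-add when each thread computes a single output.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cmath>
#include <vector>

constexpr int TILE = 32;  // each block computes a TILE x TILE tile of C
constexpr int TM = 4;     // outputs per thread (illustrative value)

// C = A * B for square n x n row-major matrices. Assumes n % TILE == 0.
// Launch with block = (TILE, TILE / TM) and grid = (n / TILE, n / TILE).
__global__ void matmul_tm(const float* A, const float* B, float* C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int tx = threadIdx.x, ty = threadIdx.y;
    int col = blockIdx.x * TILE + tx;
    int row0 = blockIdx.y * TILE + ty * TM;  // first of the TM rows this thread owns
    float acc[TM] = {0.0f};

    for (int k0 = 0; k0 < n; k0 += TILE) {
        // Each thread loads TM elements of the A tile and TM of the B tile.
        for (int i = 0; i < TM; ++i) {
            int r = ty * TM + i;
            As[r][tx] = A[(blockIdx.y * TILE + r) * n + k0 + tx];
            Bs[r][tx] = B[(k0 + r) * n + col];
        }
        __syncthreads();

        for (int k = 0; k < TILE; ++k) {
            float b = Bs[k][tx];  // one smem load, reused for TM multiply-adds
            for (int i = 0; i < TM; ++i)
                acc[i] += As[ty * TM + i][k] * b;
        }
        __syncthreads();
    }

    for (int i = 0; i < TM; ++i) C[(row0 + i) * n + col] = acc[i];
}

// Hypothetical host helper: runs the kernel and returns the maximum
// absolute error against a plain CPU reference.
float matmul_max_error(int n) {
    std::vector<float> hA(n * n), hB(n * n), hC(n * n);
    for (int i = 0; i < n * n; ++i) {
        hA[i] = (i % 7) * 0.25f;
        hB[i] = (i % 5) * 0.5f;
    }

    size_t bytes = sizeof(float) * n * n;
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    matmul_tm<<<dim3(n / TILE, n / TILE), dim3(TILE, TILE / TM)>>>(dA, dB, dC, n);
    cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel

    float err = 0.0f;
    for (int r = 0; r < n; ++r)
        for (int c = 0; c < n; ++c) {
            float ref = 0.0f;
            for (int k = 0; k < n; ++k) ref += hA[r * n + k] * hB[k * n + c];
            err = std::max(err, std::fabs(ref - hC[r * n + c]));
        }

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return err;
}
```

Having each thread compute a 2D block of outputs (TM x TN) would raise the ratio further, to TM * TN multiply-adds per TM + TN loads, at the cost of more registers per thread. The best values depend on the GPU and are worth checking with the profiler.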