Under-researched CUDA topics in GPGPU programming

I'm currently looking for new concepts or under-researched topics for my thesis on CUDA for GPGPU programming, perhaps compiler optimizations or task-specific optimization strategies.

What I've considered doing is an analysis and comparison of different GPU memory-oriented approaches to, say, the parallel reduction problem.
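To illustrate the kind of comparison I mean, here is a minimal sketch of two reduction kernels that differ only in where the partial sums live. The block configuration and the use of `atomicAdd` for the final combine are my own simplifying choices, not a claim about the best-known implementation:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Variant 1: classic tree reduction through shared memory.
// Each block sums its elements in a shared-memory buffer,
// then thread 0 adds the block's partial sum to the global result.
__global__ void reduceShared(const float *in, float *out, int n) {
    extern __shared__ float sdata[];
    unsigned tid = threadIdx.x;
    unsigned i   = blockIdx.x * blockDim.x + tid;
    sdata[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();
    // Halve the active range each step: stride s pairs element tid with tid+s.
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) atomicAdd(out, sdata[0]);
}

// Variant 2: warp-shuffle reduction. Values are exchanged directly
// between registers with __shfl_down_sync, so the per-warp phase
// touches no shared memory at all.
__global__ void reduceShuffle(const float *in, float *out, int n) {
    unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (i < n) ? in[i] : 0.0f;
    // Butterfly-style sum within each warp of 32 threads.
    for (int offset = warpSize / 2; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffff, v, offset);
    // Lane 0 of each warp holds the warp's sum.
    if ((threadIdx.x & (warpSize - 1)) == 0) atomicAdd(out, v);
}
```

The study I had in mind would benchmark variants like these across input sizes and architectures, but as noted below, that comparison has been done many times already.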

However, there are plenty of those studies already, and I would like to contribute to the GPGPU programming field by researching some modern, still under-researched concept.

I believe there should be a handful of such research ideas that experienced CUDA developers have come across.

You might want to specify what type of thesis this is. A master’s thesis? A doctoral thesis? Roughly how many person-hours are you planning to invest in this?

Yes, I forgot to mention that, thanks.
It's a bachelor's thesis. I have roughly a month to go, so about 200 hours.