I am looking for an application that gets a big speedup from reducing the register count and thereby increasing hardware occupancy. I have tried a few from the SDK, but I only get around a 5% speedup compared to the unoptimized, higher-register-count version. I think the reason is that the applications I have tried are not bandwidth bound (i.e. they don’t have a lot of memory transactions), so there is little memory latency for the extra resident threads to hide. I want to measure how much performance improvement one can get for bandwidth-bound applications by increasing occupancy. I am hoping people on the forum can point me to a few applications like this.
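To make clear what I mean by trading registers for occupancy, here is a rough sketch of the estimate I have in mind. The per-SM limits (65536 registers, 2048 threads, 32 blocks) are assumptions for illustration only and vary by GPU architecture. The function name is made up, and the sketch ignores register allocation granularity and shared memory limits, so treat the numbers as approximate.

```python
def estimate_occupancy(regs_per_thread, threads_per_block,
                       regs_per_sm=65536,        # assumed register file size per SM
                       max_threads_per_sm=2048,  # assumed resident-thread limit per SM
                       max_blocks_per_sm=32):    # assumed resident-block limit per SM
    """Rough register-limited occupancy estimate (hypothetical helper).

    Ignores register allocation granularity and shared memory usage,
    both of which can lower the real occupancy further.
    """
    regs_per_block = regs_per_thread * threads_per_block
    # How many blocks fit on one SM under each limit.
    blocks_by_regs = regs_per_sm // regs_per_block
    blocks_by_threads = max_threads_per_sm // threads_per_block
    blocks = min(blocks_by_regs, blocks_by_threads, max_blocks_per_sm)
    # Occupancy = resident threads / maximum resident threads.
    return blocks * threads_per_block / max_threads_per_sm


if __name__ == "__main__":
    # Compare a higher and a lower register count at 256 threads per block.
    for regs in (64, 32):
        print(regs, "regs/thread ->", estimate_occupancy(regs, 256))
```

Under these assumed limits, going from 64 to 32 registers per thread at 256 threads per block doubles the estimated occupancy from 0.5 to 1.0. What I am after is a kernel where that kind of change actually shows up in the run time.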