I need a CUDA program that runs worse on the GPU than on the CPU. That is, its CPU performance should be good, but if the GPU code is not managed properly, it should give worse results than the CPU. That is what I want to demonstrate.
Anything that can’t be parallelized ;)

Start with the diffusion equation, or some simple molecular dynamics with only pair interactions.
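
A minimal sketch of the diffusion idea, assuming a 1D explicit finite-difference scheme with fixed boundary values. The names `diffusion_step` and `diffuse` and the `threads_per_block` parameter are hypothetical, and error checking is left out for brevity. Launching with `threads_per_block = 1` is one way to manage the threads badly: a warp has 32 lanes, so 31 of them sit idle in every block. Whether that ends up slower than a CPU loop depends on the hardware and the problem size, so time both yourself.

```cuda
#include <cuda_runtime.h>
#include <utility>
#include <vector>

// One explicit finite-difference step of the 1D diffusion equation.
// The two boundary points are copied unchanged (fixed-value boundaries).
__global__ void diffusion_step(const float* u, float* u_new, int n, float alpha) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i == 0 || i == n - 1) {
        u_new[i] = u[i];
    } else {
        u_new[i] = u[i] + alpha * (u[i - 1] - 2.0f * u[i] + u[i + 1]);
    }
}

// Hypothetical host helper: advance u by `steps` time steps.
// threads_per_block = 1 is the deliberately bad launch configuration,
// something like 256 is a more typical choice.
std::vector<float> diffuse(std::vector<float> u, float alpha, int steps,
                           int threads_per_block) {
    int n = static_cast<int>(u.size());
    if (n == 0) return u;
    size_t bytes = n * sizeof(float);

    float* d_a = nullptr;
    float* d_b = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMemcpy(d_a, u.data(), bytes, cudaMemcpyHostToDevice);

    int blocks = (n + threads_per_block - 1) / threads_per_block;
    for (int s = 0; s < steps; ++s) {
        diffusion_step<<<blocks, threads_per_block>>>(d_a, d_b, n, alpha);
        std::swap(d_a, d_b);  // the newest state is always in d_a
    }

    cudaMemcpy(u.data(), d_a, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    return u;
}
```

Calling `diffuse(u, 0.25f, steps, 1)` and `diffuse(u, 0.25f, steps, 256)` on the same large array gives the same numbers, so any timing difference comes from the launch configuration alone. Keep `alpha` at or below 0.5 so the explicit scheme stays stable.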

Actually, I am looking for a program that can be parallelized, but that performs worse than on the CPU if the threads are not managed properly.

Finite differences with the L1 cache disabled. In 3D, each point gets loaded 9 times when one could get along with loading it only once. The N^2 algorithm (N particles, all interacting with each other) also has many loads, and if you do not optimize it, it will be very slow. In fact, the N^2 algorithm can be implemented to run very fast.
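
To make the N^2 point concrete, here is a rough sketch with two kernels for the same all-pairs sum. It assumes 1D positions and a softened inverse-square interaction, which are my choices for illustration and not anything fixed by the thread. The names `pair_term`, `accel_naive`, `accel_tiled`, `accelerations`, and the constants `TILE` and `EPS` are hypothetical. The naive kernel reads every `x[j]` from global memory in each thread. The tiled kernel stages positions in shared memory so that each global value is loaded once per block. How big the gap is depends on the GPU and its cache settings.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int TILE = 128;     // threads per block and tile width (assumed value)
constexpr float EPS = 1e-3f;  // softening, keeps the self term finite (it is 0)

// Contribution of particle j to particle i: d / (d^2 + EPS)^(3/2).
__device__ float pair_term(float xi, float xj) {
    float d = xj - xi;
    float r2 = d * d + EPS;
    return d * rsqrtf(r2) / r2;
}

// Unoptimized version: every thread reads all n positions from global memory.
__global__ void accel_naive(const float* x, float* a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float xi = x[i];
    float sum = 0.0f;
    for (int j = 0; j < n; ++j) sum += pair_term(xi, x[j]);
    a[i] = sum;
}

// Tiled version: the block loads TILE positions into shared memory at a time.
// Must be launched with blockDim.x == TILE.
__global__ void accel_tiled(const float* x, float* a, int n) {
    __shared__ float tile[TILE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float xi = (i < n) ? x[i] : 0.0f;
    float sum = 0.0f;
    for (int base = 0; base < n; base += TILE) {
        int j = base + threadIdx.x;
        tile[threadIdx.x] = (j < n) ? x[j] : 0.0f;
        __syncthreads();  // tile fully loaded before anyone reads it
        int count = min(TILE, n - base);
        for (int k = 0; k < count; ++k) sum += pair_term(xi, tile[k]);
        __syncthreads();  // everyone done before the tile is overwritten
    }
    if (i < n) a[i] = sum;
}

// Hypothetical host helper: run one of the two kernels and return the result.
std::vector<float> accelerations(const std::vector<float>& x, bool tiled) {
    int n = static_cast<int>(x.size());
    std::vector<float> a(n, 0.0f);
    if (n == 0) return a;
    size_t bytes = n * sizeof(float);

    float* d_x = nullptr;
    float* d_a = nullptr;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_a, bytes);
    cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice);

    int blocks = (n + TILE - 1) / TILE;
    if (tiled) accel_tiled<<<blocks, TILE>>>(d_x, d_a, n);
    else       accel_naive<<<blocks, TILE>>>(d_x, d_a, n);

    cudaMemcpy(a.data(), d_a, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_a);
    return a;
}
```

Both kernels add the terms in the same order, so they should agree closely. Time `accelerations(x, false)` against `accelerations(x, true)` and against a plain double loop on the CPU for a few values of N to see where each one wins.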