CUDA JIT compiler

What optimizations and transformations does the CUDA JIT compiler perform?

At a minimum, it performs register allocation and instruction scheduling. Beyond that, you would have to run your own experiments to find out. See this thread: http://forums.nvidia.com/index.php?showtopic=169246