A CUDA application’s code is split into two parts: host and device. Host code can be as OO as you want, since it’s compiled by a C++ compiler of your choosing (invoked through nvcc). Device code, the part that actually runs on the GPU, is not OO. It’s essentially C with extensions - technically, the C subset of C++ plus GPU-specific extensions. You get some C++ features (templates work, and the ‘class’ keyword is legal), but no virtual functions, no inheritance and such. It’s been hinted that Fermi will allow full C++ with all the OO features on the GPU, but right now it’s not possible.
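To make the split concrete, here’s a minimal sketch (the kernel name `scale` and everything else here is just illustrative): the templated kernel is device code, everything in `main` is host code compiled as ordinary C++.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Device code: a templated kernel. Templates are one of the C++ features
// that do work on the device side; virtual functions are not.
template <typename T>
__global__ void scale(T* data, T factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Host code: plain C++, compiled by the host compiler through nvcc.
int main()
{
    const int n = 256;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = float(i);

    float* dev = 0;
    cudaMalloc((void**)&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    scale<float><<<(n + 127) / 128, 128>>>(dev, 2.0f, n);

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[10] = %f\n", host[10]); // expect 20.0
    return 0;
}
```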
By the way, if you intend to use OO-heavy code for your engine, you might run into performance issues even on the CPU. Resolving a virtual call - finding the concrete implementation of an abstract method at runtime - adds overhead, and it gets quite noticeable when it happens in low-level hot paths (collision checking, raycasting).
My advice is to tailor your low-level algorithms to non-OO programming, or at least minimize the abstraction - use templates instead of runtime polymorphism, etc. (see the sketch below). This goes for both the GPU and the CPU.
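Here’s what I mean, as a hedged sketch - the `Sphere`/`Box` shapes and `hits`/`countHits` functions are made up for illustration. The hot loop is templated on the shape type, so the call is resolved (and can be inlined) at compile time instead of going through a vtable:

```cpp
#include <cstdio>

struct Sphere { float radius;     };
struct Box    { float hx, hy, hz; };

// Each shape provides the same non-virtual interface.
inline bool hits(const Sphere& s, float x, float y, float z)
{ return x*x + y*y + z*z <= s.radius * s.radius; }

inline bool hits(const Box& b, float x, float y, float z)
{ return x <= b.hx && y <= b.hy && z <= b.hz; /* simplified on purpose */ }

// The hot loop: no virtual dispatch, the right hits() is picked per
// instantiation at compile time.
template <typename Shape>
int countHits(const Shape& shape, const float* pts, int n)
{
    int count = 0;
    for (int i = 0; i < n; ++i)
        if (hits(shape, pts[3*i], pts[3*i+1], pts[3*i+2]))
            ++count;
    return count;
}

int main()
{
    float pts[] = { 0.5f, 0.5f, 0.5f,  3.0f, 0.0f, 0.0f };
    Sphere s = { 1.0f };
    printf("%d\n", countHits(s, pts, 2)); // prints 1
    return 0;
}
```

The same pattern carries over to device code, since templated `__global__` and `__device__` functions are legal while virtual calls are not.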
OpenCL is also C, and even more restricted than CUDA C (no templates, for example) - pure C, not the “C subset of C++”. Also, all current implementations are beta versions. But yeah, the benefit is that it should work across various hardware. Mind you, you might still need to design your algorithms differently for AMD and NVIDIA cards (e.g. AMD’s hardware favors vectorized code, while CUDA is scalar).
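To show roughly what that scalar-vs-vector difference looks like, here’s a sketch of the same element-wise add written both ways - in CUDA syntax for both, just to show the shape of the code (an actual AMD version would be an OpenCL C kernel, but the structure is the same):

```cpp
#include <cuda_runtime.h>

// Scalar style: one thread handles one float. Natural fit for NVIDIA's
// scalar cores.
__global__ void addScalar(const float* a, const float* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = a[i] + b[i];
}

// Vector style: one thread handles a float4. On AMD's vector hardware,
// packing work like this tends to keep the ALUs busier; on NVIDIA it
// mostly just changes the memory access pattern.
__global__ void addVector(const float4* a, const float4* b, float4* out, int n4)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n4) {
        float4 va = a[i], vb = b[i];
        out[i] = make_float4(va.x + vb.x, va.y + vb.y,
                             va.z + vb.z, va.w + vb.w);
    }
}
```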