I am starting to write parts of a physics engine as possible thesis work, and I am trying to pick a programming language.
The problem is that I want to implement parts of the engine in CUDA to speed up the calculations. I have not decided which parts, but candidates are narrow-phase collision detection, broad-phase collision detection, or the constraint solver.
It seems CUDA uses a C++-like language with NVIDIA extensions.
My question is:
Is it possible to write parts of the engine in C++ (because of its object-oriented support) and implement the parallel computation part(s) in CUDA’s programming language?
Or is the solution to write my whole engine in CUDA’s programming language (and use structs as classes)?
It seems that object-oriented programming is not supported in CUDA’s language. How do I work around this? Writing a physics engine is complex, and I think OOP features would help.
It also seems that NVIDIA’s CUDA cards support OpenCL. Is there any reason not to use it, since OpenCL also supports other graphics cards? Are there any performance differences?
A CUDA app’s code is split into two parts: host and device. Host code can be as OO as you want, since it basically gets compiled by a C++ compiler of your choosing (through nvcc). Device code, the part that actually runs on the GPU, is not OO. It’s basically C (not C++) with extensions. Technically, it’s the C subset of C++ with GPU-specific extensions: you get some C++ features (templates work, and the ‘class’ keyword is actually legal), but you don’t get virtual functions, inheritance and such. It’s been hinted that Fermi will allow C++ on the GPU with all OO features, but right now it’s impossible.
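To make the split concrete, here is a rough sketch in plain C++ so it compiles without the CUDA toolkit. The names `integrateKernel` and `ParticleSystem` are made up for illustration. In a real CUDA build the free function would be a `__global__` kernel in a .cu file, each GPU thread would handle one index instead of the loop, and the arrays would first be copied to device memory (e.g. with `cudaMalloc` and `cudaMemcpy`). None of that is shown here.

```cpp
#include <cstddef>
#include <vector>

// "Device-style" code: a C-like function over flat arrays, with no classes
// and no virtual calls. This is only a CPU stand-in for a kernel; on the GPU
// the loop body would run once per thread.
void integrateKernel(float* pos, const float* vel, std::size_t n, float dt) {
    for (std::size_t i = 0; i < n; ++i) {
        pos[i] += vel[i] * dt;  // explicit Euler step for one coordinate
    }
}

// "Host-style" code: as object-oriented as you like. The class owns the data
// and hides the fact that the heavy lifting is done by a flat C-style routine.
class ParticleSystem {
public:
    void add(float position, float velocity) {
        pos_.push_back(position);
        vel_.push_back(velocity);
    }

    // Advance every particle by dt by handing raw pointers to the kernel.
    void step(float dt) {
        integrateKernel(pos_.data(), vel_.data(), pos_.size(), dt);
    }

    float position(std::size_t i) const { return pos_[i]; }
    std::size_t size() const { return pos_.size(); }

private:
    std::vector<float> pos_;  // flat arrays are easy to copy to a GPU later
    std::vector<float> vel_;
};
```

The rest of the engine only talks to `ParticleSystem`, so swapping the CPU loop for an actual kernel launch later should not affect the OO parts of the code.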
By the way, if you intend to use OO-heavy code for your engine, you might run into performance issues even on the CPU. Resolving virtual functions (finding the concrete implementation of an abstract method at run time) creates some overhead, and it can get quite noticeable if it happens in low-level code such as collision checking or raycasting.
My advice is to tailor your low-level algorithms to non-OO programming (or at least minimize abstraction: use templates instead of polymorphism, etc.). This goes for both the GPU and the CPU.
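As a minimal sketch of what "templates instead of polymorphism" can look like, assuming two hypothetical shape types (`Sphere`, `Aabb`) and a naive all-pairs loop that is only there for illustration: the shapes are plain structs without a common base class, and the right `overlaps` overload is picked at compile time rather than through a virtual call.

```cpp
#include <cstddef>
#include <vector>

// Plain structs: no base class, no virtual functions.
struct Sphere { float x, y, z, r; };
struct Aabb   { float min[3], max[3]; };

// Sphere-sphere test: compare squared centre distance with squared radius sum.
inline bool overlaps(const Sphere& a, const Sphere& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    const float rs = a.r + b.r;
    return dx * dx + dy * dy + dz * dz <= rs * rs;
}

// Box-box test: the boxes overlap only if their intervals overlap on every axis.
inline bool overlaps(const Aabb& a, const Aabb& b) {
    for (int i = 0; i < 3; ++i) {
        if (a.max[i] < b.min[i] || b.max[i] < a.min[i]) return false;
    }
    return true;
}

// Generic O(n^2) pair count. The call to overlaps() is resolved per Shape type
// when the template is instantiated, so the compiler is free to inline it.
template <typename Shape>
std::size_t countOverlappingPairs(const std::vector<Shape>& shapes) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < shapes.size(); ++i) {
        for (std::size_t j = i + 1; j < shapes.size(); ++j) {
            if (overlaps(shapes[i], shapes[j])) ++count;
        }
    }
    return count;
}
```

The trade-off is that each container holds a single shape type, so mixed scenes need one array per type (and one test per type pair) instead of one list of base-class pointers. That layout also tends to map more directly onto flat GPU buffers.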
OpenCL is also C, and even more restricted than CUDA C (no templates, for example). It is pure C, not the “C subset of C++”. Also, all current implementations are beta versions. But yes, the benefit is that it should work on various hardware. Mind you, you might still need to design algorithms differently for AMD and NVIDIA cards (e.g. AMD uses vectors a lot, while CUDA is scalar).