I don’t think your analysis is accurate on this matter: the programming model of OpenGL (both 2.0 and ES) is quite different from that of DX9/10.
In GL you should think of your program as a main loop that your own app runs: for this reason CPU usage is higher than in a DX app (where the event loop is handled by another software layer, more closely tied to the underlying OS).
The advantage is that you can handle events with your own event procedure. The drawback is that you have to handle everything yourself! (Uh… that sounds like the same thing as before… strange!!!)
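To make the "app-owned main loop" model concrete, here is a minimal, library-free sketch in Python. Everything in it is hypothetical: `App`, `Event`, `poll_events`, `handle_event` and `draw_frame` are illustrative names, the event queue is simulated as one batch of events per frame, and `draw_frame` only counts frames where a real program would issue GL calls and swap buffers through a windowing toolkit (GLUT, GLFW, SDL, etc.).

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    kind: str             # hypothetical event kinds: "key", "resize", "quit"
    payload: object = None


class App:
    """Hypothetical app that owns its main loop, as in the GL model."""

    def __init__(self, batches):
        # Simulated event source: one list of events per frame.
        # A real app would get these from its windowing toolkit.
        self.batches = deque(batches)
        self.running = True
        self.frames_drawn = 0
        self.log = []

    def poll_events(self):
        # Non-blocking poll: return whatever "arrived" since the last
        # frame, or an empty list if nothing did.
        return self.batches.popleft() if self.batches else []

    def handle_event(self, ev):
        # The app's own event procedure: every event is dispatched here.
        if ev.kind == "quit":
            self.running = False
        else:
            self.log.append((ev.kind, ev.payload))

    def draw_frame(self):
        # Placeholder for GL rendering commands plus a buffer swap.
        self.frames_drawn += 1

    def run(self, max_frames=1000):
        # The loop never blocks waiting for input, so it keeps drawing
        # frames even when no events arrive (capped by max_frames here).
        while self.running and self.frames_drawn < max_frames:
            for ev in self.poll_events():
                self.handle_event(ev)
            if self.running:
                self.draw_frame()
        return self.frames_drawn


if __name__ == "__main__":
    app = App([[Event("key", "A")], [], [Event("resize", (800, 600))], [Event("quit")]])
    print(app.run())              # 3
    print(App([]).run(max_frames=5))  # 5
```

The second call shows the trade-off: with no events at all, the loop still spins through every frame it is allowed to draw. In a real GL app this busy loop is usually tamed with vsync, a short sleep, or a blocking wait for events when nothing needs redrawing.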
BTW, if you work with CUDA, stay with GL: you’ll keep the widest range of functionality and cross-OS compatibility.