I managed to get CUDA working on my GF8800M GT. CUDA now provides its power through an object file linked into a Python C extension, callable from Python. Oh, the joy.
However, while the program is running, CPU consumption locks at 99% (one core) with almost all time spent in kernel mode. Xorg (running with the nvidia 177.13 driver, Composite disabled) becomes very unresponsive and takes ages to redraw simple windows. I noticed this is not the case for all the SDK examples, so it seems the fault is somewhere on my side.
The calculation cycle simply takes the arguments from the Python interpreter, allocates memory on the device, copies over an input structure, executes the kernel, copies the results back to host memory and returns formatted result objects to Python. I already tried to limit the input buffer fed to the kernel at once, but performance suffers a lot as the per-call overhead becomes visible; also, the slow redrawing in X11 doesn’t change with that :-\
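To make the "limit the input buffer" attempt concrete, here is a minimal sketch of the chunking idea. It is not my actual extension code: `run_kernel` is a hypothetical stand-in for the C extension call (allocate, copy in, execute, copy back), and the chunk size is arbitrary. It only shows why smaller chunks mean more calls, and so more fixed overhead per unit of work.

```python
def run_kernel(chunk):
    """Hypothetical placeholder for the extension call.

    In the real program this would allocate device memory, copy the
    input over, launch the kernel and copy the results back. Here it
    just doubles each value so the sketch runs without a GPU.
    """
    return [x * 2 for x in chunk]


def process_in_chunks(data, chunk_size):
    """Feed `data` to run_kernel in pieces of at most `chunk_size` items.

    Returns the combined results and the number of calls made, since
    each call carries its own allocation and copy overhead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    results = []
    calls = 0
    for start in range(0, len(data), chunk_size):
        results.extend(run_kernel(data[start:start + chunk_size]))
        calls += 1
    return results, calls


if __name__ == "__main__":
    out, calls = process_in_chunks(list(range(10)), 4)
    print(out, calls)
```

With 10 items and a chunk size of 4 this makes three calls instead of one, which is where the extra overhead I mentioned comes from.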
Anyone with some suggestions on that?