System unresponsive when running a kernel


I managed to get CUDA working on my GF8800M GT. CUDA now provides its power through an object file linked into a Python C extension, callable from Python. Oh, the joy.

However, while the program is running, CPU usage locks at 99% (one core), with almost all of the time spent in kernel mode. Xorg (running with the nvidia 177.13 driver, Composite disabled) becomes very unresponsive and takes ages to redraw simple windows. I noticed this is not the case for all the SDK examples, so it seems the fault is somewhere on my side.

The calculation cycle is simply: take the arguments from the Python interpreter, allocate memory on the device, copy over an input structure, execute the kernel, copy the results back to host memory and return formatted result objects to Python. I already tried limiting how much of the input buffer is fed to the kernel at once, but performance suffers a lot as the overhead becomes visible; also, the X11 refresh rate doesn’t improve :-\

Anyone with some suggestions on that?

How long is each kernel call? The GPU is shared between the display and CUDA, and the display can only be updated between kernel calls. The SDK examples probably have very short kernels, which is why the display does not appear to lag.
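
One rough way to find out is to time each call from the Python side while feeding the input in slices. This is only a sketch: `run_in_chunks` and `fake_launch` are made-up names, and `fake_launch` is a stand-in for whatever function your extension exposes. It assumes that function blocks until the kernel has finished and the results are copied back; otherwise the timings only measure the launch.

```python
import time

def run_in_chunks(items, launch, chunk_size):
    """Call launch(chunk) on consecutive slices of items, timing each call.

    launch is assumed to block until the work for that chunk is done.
    """
    results = []
    timings = []
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        t0 = time.perf_counter()
        results.extend(launch(chunk))
        timings.append(time.perf_counter() - t0)
    return results, timings

def fake_launch(chunk):
    # Hypothetical stand-in for the extension's kernel call.
    return [x * 2 for x in chunk]

results, timings = run_in_chunks(list(range(10)), fake_launch, 4)
print("calls:", len(timings), "longest call (s):", max(timings))
```

If the longest call is a large fraction of a second, that would fit the redraw lag you are seeing. Trying a few values of `chunk_size` should show where the per-call overhead starts to dominate.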