Strange results with a simple kernel


I have written a simple kernel that just zeros out a block of memory (cudaMalloc2D) passed to it. On my desktop (Win7, x64, GTX285) it works fine, as it does on another system with four Tesla C1060 cards. On a third system with a Quadro FX 5800 and three Tesla C1060 cards, something strange happens: the Quadro FX does not execute the kernel at all and just blocks. The three Tesla cards execute the kernel, but the memory is not zeroed out; instead, random blocks of 4 or 8 sequential bytes are set to random values and vertical stripes appear.

I have installed the latest driver (released a couple of days ago); the OS is Win Server 2008 R2. The system is accessed via Dameware MiniRC. I’m using CUDA 2.2 on all systems.

Does anyone have a clue what could go wrong with such a simple kernel? And why is the Quadro FX blocking and not executing the kernel at all (in fact, I cannot even allocate memory on it)?
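For reference, here is a minimal sketch of the kind of test described above, not the actual code in question. It assumes that "cudaMalloc2D" refers to a pitched allocation made with cudaMallocPitch and that the buffer holds floats. The names zeroKernel and CHECK, the image size, and the device index taken from the command line are all hypothetical. It checks every runtime call, so a failed allocation or launch on one card gets reported instead of being skipped silently, and it treats the pitch as a byte count.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: print the CUDA error string and abort if a call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Zero a width x height float image stored in pitched memory.
// The pitch is in BYTES, so the row start is computed on a char pointer.
__global__ void zeroKernel(float *data, size_t pitch, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;  // guard the partial edge blocks
    float *row = (float *)((char *)data + y * pitch);
    row[x] = 0.0f;
}

int main(int argc, char **argv)
{
    const int width = 1000, height = 700;      // assumed image size
    int dev = (argc > 1) ? atoi(argv[1]) : 0;  // device index to test

    cudaDeviceProp prop;
    CHECK(cudaSetDevice(dev));
    CHECK(cudaGetDeviceProperties(&prop, dev));
    printf("testing device %d: %s\n", dev, prop.name);

    float *d = NULL;
    size_t pitch = 0;
    CHECK(cudaMallocPitch((void **)&d, &pitch, width * sizeof(float), height));
    // Fill with a non-zero byte pattern so the kernel's effect is visible.
    CHECK(cudaMemset2D(d, pitch, 0xFF, width * sizeof(float), height));

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    zeroKernel<<<grid, block>>>(d, pitch, width, height);
    CHECK(cudaGetLastError());       // reports launch failures
    // CUDA 2.2-era call; newer toolkits name it cudaDeviceSynchronize().
    CHECK(cudaThreadSynchronize());  // reports failures during execution

    // Copy back into a tightly packed host buffer and count non-zero values.
    float *h = (float *)malloc(width * height * sizeof(float));
    if (h == NULL) { fprintf(stderr, "host malloc failed\n"); return EXIT_FAILURE; }
    CHECK(cudaMemcpy2D(h, width * sizeof(float), d, pitch,
                       width * sizeof(float), height, cudaMemcpyDeviceToHost));
    int bad = 0;
    for (int i = 0; i < width * height; ++i)
        if (h[i] != 0.0f) ++bad;
    printf("non-zero elements after kernel: %d\n", bad);

    free(h);
    CHECK(cudaFree(d));
    return bad == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Running this once per device index would show whether the Quadro FX fails at cudaSetDevice, at the allocation, or at the launch, and whether the Tesla cards still return non-zero data when the pitch is handled in bytes and all errors are checked.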

Thanks && kind regards