Same kernel, different machine

I have a problem with a fluid dynamics application written in C++ and CUDA.

It works on my new computer with SDK 5.0 and a compute capability 2.1 card (NVIDIA GT 610).

Unfortunately, I also have to run it on a compute capability 1.1 card, an NVIDIA Quadro FX 570, and it does not seem to work on the old hardware. Does anybody have an idea why the exact same, completely unchanged CUDA kernels no longer work on the older FX 570? The SDK is also 5.0 on the old computer, and MS Visual Studio 2010 is the IDE on both machines.

“Not working” means that the kernels appear not to run at all, because my data is unchanged afterwards. On each computer I compile in debug mode for 32-bit. On the new computer, everything works just fine and as expected.

Are you recompiling the code appropriately for each architecture? Does the code use any features that require compute capability > 1.1? Does the code have resource requirements that cannot be satisfied by the older GPU?
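One common pitfall along these lines: if the build targets only the newer architecture, the resulting binary contains no machine code the 1.1 device can run, and kernels silently fail to launch. A hedged sketch of an nvcc command line that embeds code for both cards (the file name fluidsim.cu is a placeholder; adjust to your project):

```shell
# Build a fat binary with machine code for both GPUs:
# sm_11 covers the Quadro FX 570, sm_21 the GT 610.
nvcc -gencode arch=compute_11,code=sm_11 \
     -gencode arch=compute_21,code=sm_21 \
     -o fluidsim fluidsim.cu
```

In Visual Studio the corresponding setting is the Code Generation entry under the project's CUDA C/C++ → Device properties, where both architecture pairs can be listed (e.g. compute_11,sm_11;compute_21,sm_21).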

You state that the “kernel is not active”. Most likely this means the kernel couldn’t be launched. What is the error after the kernel launch? Does the code use error status checking on every CUDA API call and every kernel launch (at least optionally)? If not, you would want to add that; the error status will allow you to narrow down why the code doesn’t work with the older GPU.

Thanks for the ideas! I'll check them out tomorrow; it's already after midnight here in the EU.

Yes, I do. I recompile on each machine within the IDE.

That's hard to say, because I'm rather new to CUDA. I started learning CUDA in January 2013, and only with my compute capability 2.1 GT 610 card. I've never used hardware as old as the compute capability 1.1 FX 570 before. In any case, I can't swap the old card out of the computer, because I'm not allowed to. As features, I use branching, loops, and #pragma unroll on loops. Nothing uncommon. No shared memory. As far as I know, these features are supported on compute capability 1.1.

Not sure about that. That could be it. I've downloaded the Nsight Visual Studio 3.0 debugger and will take a detailed look at the hardware usage as soon as possible.
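Before digging into the debugger, it can help to confirm at runtime what the program actually sees on each machine. A minimal sketch using standard CUDA runtime API calls (cudaGetDeviceCount, cudaGetDeviceProperties); it prints the compute capability and the per-block limits most likely to differ between the two cards:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // Compute capability plus the resource limits most likely to
        // differ between the GT 610 (2.1) and the Quadro FX 570 (1.1).
        printf("Device %d: %s, compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
        printf("  registers per block: %d, shared mem per block: %zu bytes\n",
               prop.regsPerBlock, prop.sharedMemPerBlock);
        printf("  max threads per block: %d\n", prop.maxThreadsPerBlock);
    }
    return 0;
}
```

If the kernels were compiled only for compute capability 2.x, launching them on the 1.1 device typically fails with an "invalid device function" error, which per-launch error checking would surface immediately.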

No error comes up in my output window.

No. Not yet.

Okay, I'll add it ASAP. I've found an approach on stackoverflow.com

http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api

…to do it like this:

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}

[…]
kernel<<<1,1>>>(a);
gpuErrchk( cudaPeekAtLastError() );
gpuErrchk( cudaDeviceSynchronize() );
[…]

The memcopies are already checked by status calls.

Thanks for the ideas again!