CUDA - multiple devices: using multiple GPUs

From the documentation:

The use of multiple GPUs as CUDA devices by an application running on a multi-
GPU system is only guaranteed to work if these GPUs are of the same type.

I have a GTX295 and a 9800GTX, and my code runs on all of these cards. I just start 3 threads, and each thread launches the kernel on its respective GPU.
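The setup is roughly like the following sketch. It is illustrative only and assumes the CUDA runtime API and C++11 threads; the kernel `fillIndex` and the helpers `runOnDevice` and `runOnAllDevices` are hypothetical names, not the actual code.

```cuda
#include <cstddef>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel: each thread writes its global index.
__global__ void fillIndex(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Selects one device for the calling host thread, runs the kernel there,
// and returns true only if every CUDA call succeeded.
bool runOnDevice(int device, int n) {
    if (cudaSetDevice(device) != cudaSuccess) return false;
    int *d_out = nullptr;
    if (cudaMalloc((void **)&d_out, n * sizeof(int)) != cudaSuccess) return false;
    int blocks = (n + 255) / 256;
    fillIndex<<<blocks, 256>>>(d_out, n);
    cudaError_t err = cudaGetLastError();             // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    cudaFree(d_out);
    return err == cudaSuccess;
}

// Starts one host thread per visible device; element d of the result is 1
// if the kernel ran successfully on device d, and 0 otherwise.
std::vector<int> runOnAllDevices(int n) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) count = 0;
    std::vector<int> ok(static_cast<std::size_t>(count), 0);
    std::vector<std::thread> workers;
    for (int d = 0; d < count; ++d)
        workers.emplace_back([d, n, &ok] { ok[d] = runOnDevice(d, n) ? 1 : 0; });
    for (auto &w : workers) w.join();
    return ok;
}
```

Calling `runOnAllDevices(1 << 20)` from `main` would start one thread per device, which gives 3 threads on this system if all of its GPUs are visible.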

What exactly does that sentence from the documentation mean?

Thanks.

I believe it means that different hardware has different capabilities, and there is no guarantee that any piece of code with arbitrary execution parameters will run on every piece of CUDA-capable hardware without modification. For example, if you wrote code containing double precision arithmetic, compiled it for compute 1.3 devices, and ran it on your system, it would fail on your 9800GTX, because that card doesn’t support double precision. Similarly, you could write code that uses global memory atomic operations and try running it on a system with a C870 and a C1060, and it would fail on the C870, because compute 1.0 devices don’t support global memory atomics. I also don’t know how smart the CUDA driver is, or whether it is capable of JIT compiling PTX to multiple targets during one kernel load.
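If you want to check this at run time, something like the following sketch should work. It assumes the CUDA runtime API; the helper names `supportsDouble`, `supportsGlobalAtomics` and `reportDevices` are my own illustrations, not library functions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Double precision arithmetic requires compute capability 1.3 or higher.
bool supportsDouble(int major, int minor) {
    return major > 1 || (major == 1 && minor >= 3);
}

// Atomic operations on global memory require compute capability 1.1 or higher.
bool supportsGlobalAtomics(int major, int minor) {
    return major > 1 || (major == 1 && minor >= 1);
}

// Prints what each visible device supports and returns the device count
// (0 if the runtime cannot enumerate devices).
int reportDevices() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, d) != cudaSuccess) continue;
        printf("device %d: %s, compute %d.%d, double: %s, global atomics: %s\n",
               d, prop.name, prop.major, prop.minor,
               supportsDouble(prop.major, prop.minor) ? "yes" : "no",
               supportsGlobalAtomics(prop.major, prop.minor) ? "yes" : "no");
    }
    return count;
}
```

An application could call `reportDevices()` at startup and only dispatch a kernel to the devices whose compute capability covers the features that kernel uses.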

So in the particular case you have, it works, but it isn’t guaranteed to.

Thanks, but the sentence from the documentation is quite generic: it says that if the GPUs are not all the same, it is not guaranteed to work (CUDA capabilities are not mentioned).

Your conclusion is implicitly something like this: code tested on one particular GPU (CUDA x.y) may not work on another system with a different GPU (CUDA < x.y).