Can a GPU emulate another older (slower) one?

Back with another question. Is there any way to emulate an older CUDA-capable GPU in order to compare runtimes? I’m not getting my hopes up, and I realise that comparing architectures is a hazy area, but I thought I’d check.

In my case I’m hoping to emulate an 8800 GTX using a GTX 480.

If code is compiled for a lower compute capability than the device it runs on, does it still make use of the larger memory sizes etc.? For example, can a program compiled for sm_10 still allocate the full 48 KB of shared memory on an sm_20 device?

Thank you for your help.

No. The GTX 480 only accepts sm_20 code.

You can take a look at the Barra simulator for G80.

Really? I’ve been compiling and running sm_10 code with no problems so far, and it is faster to compile too!

Thank you for that link, I will try to get it working.

That’s because you’re putting the PTX into the executable. The driver compiles it into sm_20 code before executing the kernel.
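To make the point above concrete, here is a rough sketch in plain C++ of how the choice between native code and JIT-compiled PTX could be modelled. It is only an illustration of the compatibility rules, not the driver's actual implementation: `Image`, `Load` and `pick_load` are hypothetical names invented for this sketch and are not part of any CUDA API. It assumes the usual rules that native (sm_XY) code runs only on devices with the same major version and an equal or higher minor version, while PTX (compute_XY) can be JIT-compiled for any device of equal or higher compute capability.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of one code image embedded in an executable.
// Not a CUDA type; it exists only for this sketch.
struct Image {
    bool is_ptx;  // true: PTX (compute_XY), false: native binary (sm_XY)
    int major;    // X in sm_XY / compute_XY
    int minor;    // Y in sm_XY / compute_XY
};

enum class Load { Native, Jit, Fail };

// Decide how a device of capability dev_major.dev_minor could load a kernel,
// given the images that were embedded at compile time (assumed rules, see above).
Load pick_load(const std::vector<Image>& images, int dev_major, int dev_minor) {
    bool can_jit = false;
    for (const Image& img : images) {
        if (!img.is_ptx) {
            // Native code: same major version, device minor at least as new.
            if (img.major == dev_major && img.minor <= dev_minor) {
                return Load::Native;  // no compilation needed at startup
            }
        } else {
            // PTX: any device of equal or greater compute capability.
            if (img.major < dev_major ||
                (img.major == dev_major && img.minor <= dev_minor)) {
                can_jit = true;  // possible, but costs JIT time at startup
            }
        }
    }
    return can_jit ? Load::Jit : Load::Fail;
}
```

Under this model, an executable that carries sm_10 native code plus compute_10 PTX gives `Load::Jit` on a GTX 480 (compute capability 2.0), which matches the behaviour described above. One that carries sm_20 native code gives `Load::Native`, and one with sm_10 native code and no PTX gives `Load::Fail`.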

That might explain why there is a noticeable delay whenever the program is started. How do I prevent this from happening? Using arch_20, sm_20 does not seem to make a difference to this delay…

How do you compile your program? Are you on Linux? If that’s the case try this.