I have several Maths kernels written for a GTS250 that do not run much faster on my new GTX480.
I had a look at the nvcc compiler manual, and it is not clear whether one needs to set compiler flags for the new architecture or what those flags should be. There are several architecture and code options, but they do not make much sense to me. sm_10? sm_20?
I am using VC2008 on Vista as a development environment.
What does one need to do to target the GTX480 architecture in a new build?
Could someone please direct me to a meaningful description of building for the Fermi architecture, with clear compiler options?
First of all, I suggest using the custom build rule for *.cu files that Nvidia provides with the SDK; you can specify almost anything in it. With this custom build rule you can select up to three architectures for which code will be generated. At run time, the code that best matches the installed hardware is selected for execution. sm_10, sm_11, sm_12 - compute capability 1.0 to 1.2 hardware (the GTS250 is 1.1); sm_13 - GT200; sm_20 - Fermi. If you already have sm_20 selected, then your kernel will be launched on the GTX480 at its best.
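For anyone building from the command line instead of the build rule, here is a minimal sketch of the same idea. The file name query_cc.cu is hypothetical, and the flags assume a toolkit of that era (CUDA 3.x) that still supports the sm_1x and sm_20 targets. The program only prints each device's compute capability, so you can check which sm_XX target matches your card.

```cuda
/*
 * Build sketch (assumption: CUDA 3.x-era nvcc; query_cc.cu is a made-up name).
 * Each -gencode pair embeds code for one target; the last pair embeds
 * compute_20 PTX so the driver can still JIT-compile for later hardware.
 *
 *   nvcc -gencode arch=compute_11,code=sm_11
 *        -gencode arch=compute_20,code=sm_20
 *        -gencode arch=compute_20,code=compute_20
 *        -o query_cc query_cc.cu
 *
 * (Write the command on a single line; it is wrapped here for readability.)
 */
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                         cudaGetErrorString(err));
            return 1;
        }
        // major.minor is the compute capability, e.g. 2.0 corresponds to sm_20.
        std::printf("Device %d: %s, compute capability %d.%d\n",
                    dev, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

A GTX480 should report 2.0 here and a GTS250 1.1, which is why those two targets appear in the build line above.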
Can Fermi execute pre-sm_20 code at all? I suppose not. I think the driver takes the embedded PTX code and recompiles it for your architecture at run time.
That doesn’t mean that specific compiler flags can’t help, though…
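One way to check which code path a kernel actually took is to ask the runtime. The sketch below is only an illustration: the `scale` kernel is a hypothetical stand-in for one of the maths kernels, and it assumes a runtime recent enough that `cudaFuncAttributes` has the `binaryVersion` and `ptxVersion` fields. As far as I understand it, if only sm_1x code and PTX were embedded, a Fermi card should report a 2.x binary version alongside a 1.x PTX version, which would indicate run-time recompilation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for one of the maths kernels.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] *= a;
    }
}

int main() {
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, scale);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaFuncGetAttributes failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // Both values are encoded as major * 10 + minor, so 20 means 2.0.
    std::printf("binary version: %d, PTX version: %d\n",
                attr.binaryVersion, attr.ptxVersion);

    // Register and shared memory usage often differ between targets,
    // which is one place where architecture-specific flags can matter.
    std::printf("registers per thread: %d, static shared memory: %lu bytes\n",
                attr.numRegs, (unsigned long)attr.sharedSizeBytes);
    return 0;
}
```

Comparing the output of a build with and without an sm_20 target should show whether the kernel was compiled natively for Fermi or recompiled from PTX.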