Compiler Optimization

In MS2005 we can change the compiler (nvcc) options. Is it right that the compiler optimizes differently for different GPU architectures? For example, we can set the GPU architecture to sm_10, sm_11, sm_12… So my first question is whether the compiler automatically optimizes for the target GPU, given that different GPUs have different resources.

My second question: if my program is tuned for the 8800GT, will it also perform best on the GTX280? The resources are different. If not, is there a way to get the best performance on each GPU, either by setting something manually or from within the program?

Sadly, as far as I know it doesn't happen automatically. What the architecture option does is open up different instruction sets you can use. So you can write code paths for different compute capabilities and select between them with #ifdef, but the tuning is up to you. Look at the particles demo; I think they do that sort of thing there with atomic operations. There are some architectural differences between the G80 and the GT200, so you will probably get better results after tweaking and fine-tuning your app for the GT200. For example, the maximum number of concurrent threads per multiprocessor is 768 on the G80 and 1024 on the GT200. The GT200 also has twice the register space per multiprocessor. Coalescing is much less critical on the GT200, since the hardware can fix up uncoalesced reads to a certain extent.
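Here is a rough sketch of both ideas, in case it helps. It is not taken from the particles demo: the kernel name countAbove, the flags fallback buffer, and the block-size rule are all made up for illustration. The kernel uses the __CUDA_ARCH__ macro to take an atomic path on compute capability 1.1 and up and a plain-store path otherwise, and the host code queries the device at run time with cudaGetDeviceProperties to pick a block size.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: counts the elements of data[] greater than threshold.
// The path is chosen at compile time from the architecture nvcc targets.
__global__ void countAbove(const int *data, int n, int threshold,
                           int *count, int *flags)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int hit = (data[i] > threshold) ? 1 : 0;
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 110
    // Compute capability 1.1 and up: global-memory atomics are available.
    if (hit) atomicAdd(count, 1);
#else
    // No atomics (sm_10): store one flag per element, the host sums them.
    flags[i] = hit;
#endif
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no CUDA device found\n");
        return 1;
    }
    printf("%s: compute %d.%d, %d threads per multiprocessor, %d registers per block\n",
           prop.name, prop.major, prop.minor,
           prop.maxThreadsPerMultiProcessor, prop.regsPerBlock);

    // Assumed heuristic, not a rule: aim for about four resident blocks
    // per multiprocessor (768 / 192 on G80, 1024 / 256 on GT200).
    int threads = (prop.maxThreadsPerMultiProcessor >= 1024) ? 256 : 192;

    const int n = 1000;
    std::vector<int> host(n);
    for (int i = 0; i < n; ++i) host[i] = i;

    int *dData, *dCount, *dFlags;
    cudaMalloc(&dData, n * sizeof(int));
    cudaMalloc(&dCount, sizeof(int));
    cudaMalloc(&dFlags, n * sizeof(int));
    cudaMemcpy(dData, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dCount, 0, sizeof(int));
    cudaMemset(dFlags, 0, n * sizeof(int));

    int blocks = (n + threads - 1) / threads;
    countAbove<<<blocks, threads>>>(dData, n, 499, dCount, dFlags);

    // Only one of the two paths wrote anything, so adding both is safe:
    // the atomic path leaves flags at 0, the fallback leaves count at 0.
    int count = 0;
    std::vector<int> flags(n);
    cudaMemcpy(&count, dCount, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(flags.data(), dFlags, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) count += flags[i];
    printf("elements above threshold: %d\n", count);

    cudaFree(dData);
    cudaFree(dCount);
    cudaFree(dFlags);
    return 0;
}
```

Which branch of the kernel gets compiled depends on the architecture you pass to nvcc (for example -arch=sm_11), not on the card that happens to be in the machine, which is why the host sums both results instead of guessing. Recent toolkits no longer accept sm_1x targets at all, so there the fallback branch is only illustrative. The block-size choice is the part you would actually tune per GPU by measuring.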

Thanks very much…