Compile CUBLAS library optimized for a specific GPU architecture?

I recently registered as a developer to try out the CUDA 2.2 beta. It compiled and ran my application just fine, thanks, although it is no faster than before.

My application is mainly based on the CUBLAS library, which got me thinking about whether compiling CUBLAS for my specific GPU (GTX 260) might make it run faster (I’m merely obsessive-compulsive about tuning things up…). As a registered developer, I saw that the CUBLAS source is available for download together with the CUDA 2.2 beta.

Would I get a faster version of the CUBLAS library if I compiled it with optimizations for the GTX 260 (e.g., -arch=sm_13)? Or would this make no difference?

How does the single CUBLAS library shipped in, e.g., the CUDA 2.2 distribution package distinguish between the various GPU architectures, if at all? Does it need to?

CUBLAS supports double precision, which requires compute capability 1.3, so at least its double-precision routines must already have been compiled with -arch=sm_13.
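To make that reasoning concrete, here is a rough sketch in plain C of the check it implies: double precision needs compute capability 1.3 or higher, which is what sm_13 targets. The function name supports_double is made up for illustration and is not part of CUDA or CUBLAS; in a real program the major/minor values would presumably come from the major and minor fields of cudaDeviceProp, filled in by cudaGetDeviceProperties.

```c
#include <stdbool.h>

/* Hypothetical helper (not a CUDA or CUBLAS function): returns true when a
   device with compute capability major.minor can run double-precision
   code, i.e. code built for sm_13 or later. */
bool supports_double(int major, int minor)
{
    /* Any 2.x-or-later device qualifies; within 1.x, only 1.3 and up. */
    return major > 1 || (major == 1 && minor >= 3);
}
```

For a GTX 260 (compute capability 1.3), supports_double(1, 3) is true, while for a compute 1.1 card supports_double(1, 1) is false, so the double-precision routines could not run there.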