[1] I already have a GTX 485, and I noticed that CUBLAS functions run about 10x slower on the GTX 485 than on the GTX 285. Any idea why? [These are vector scaling and dot products on fairly large vectors: 10,000 and 100,000 elements.]
I have installed driver 197.75 and CUDA 3.0.
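When comparing CUBLAS timings between two cards, it helps to keep one-time setup (context creation, first-call library initialization) out of the measurement and to average over many calls. Below is a minimal sketch of such a benchmark. It assumes the `cublas_v2.h` API, which was added in CUDA 4.0; the CUDA 3.0 toolkit only ships the legacy `cublas.h` interface, so the calls would need adapting there. The function name `time_scal_dot`, the scale factor, and the repetition count are illustrative choices, not part of CUBLAS.

```cuda
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helper: average time (ms) of one cublasSscal + cublasSdot pair
// on n-element device vectors d_x and d_y. Returns a negative value on error.
float time_scal_dot(cublasHandle_t handle, float* d_x, float* d_y, int n, int reps)
{
    if (n <= 0 || reps <= 0) return -1.0f;
    const float alpha = 1.0001f;  // illustrative scale factor
    float result = 0.0f;

    // Warm-up calls: keep one-time setup costs out of the timed region.
    if (cublasSscal(handle, n, &alpha, d_x, 1) != CUBLAS_STATUS_SUCCESS) return -1.0f;
    if (cublasSdot(handle, n, d_x, 1, d_y, 1, &result) != CUBLAS_STATUS_SUCCESS) return -1.0f;
    if (cudaDeviceSynchronize() != cudaSuccess) return -1.0f;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    bool ok = true;
    cudaEventRecord(start, 0);
    for (int i = 0; i < reps && ok; ++i) {
        // In the default (host) pointer mode, cublasSdot waits until
        // `result` has been written back to host memory.
        ok = cublasSscal(handle, n, &alpha, d_x, 1) == CUBLAS_STATUS_SUCCESS &&
             cublasSdot(handle, n, d_x, 1, d_y, 1, &result) == CUBLAS_STATUS_SUCCESS;
    }
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ok ? ms / reps : -1.0f;
}
```

Call it from a small `main` after creating a handle with `cublasCreate` and copying the vectors to the device, build with something like `nvcc bench.cu -lcublas`, and run the same sizes (10,000 and 100,000) on both cards. With vectors this small, per-call launch overhead and the host synchronization inside `cublasSdot` can make up a large share of the measured time.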

[2] By the way, where can I find sample code for Fermi, for instance showing how to use and configure the L1/L2 memories or how to execute kernels asynchronously?

For example, the L1/L2 configurations described on page 157.
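As far as I know, on Fermi the per-kernel knob is the split of the 64 KB on-chip memory between L1 and shared memory, set with `cudaFuncSetCacheConfig`; L2 has no comparable per-kernel setting. Asynchronous execution is done with streams: copies and launches issued into different streams may overlap, and copies only overlap when the host buffer is pinned. Below is a minimal sketch of both ideas. The kernel `scale_kernel`, the helper `scale_async`, the chunking scheme, and the 8-stream limit are hypothetical. It also uses `cudaDeviceSynchronize`, which needs CUDA 4.0 or later (CUDA 3.0 has `cudaThreadSynchronize` instead).

```cuda
#include <cuda_runtime.h>

// Hypothetical example kernel: scales x in place by alpha.
__global__ void scale_kernel(float* x, float alpha, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= alpha;
}

// Hypothetical helper: scales h_x in nstreams chunks, issuing each chunk's
// copy-in, kernel, and copy-out into its own stream so that work from
// different chunks may overlap. h_x should be pinned (cudaMallocHost)
// for the async copies to actually overlap.
cudaError_t scale_async(float* h_x, int n, float alpha, int nstreams)
{
    const int kMaxStreams = 8;  // assumed upper bound for this sketch
    if (n <= 0 || nstreams < 1 || nstreams > kMaxStreams) return cudaErrorInvalidValue;

    // Prefer 48 KB L1 / 16 KB shared for this kernel;
    // cudaFuncCachePreferShared selects the opposite split.
    cudaError_t err = cudaFuncSetCacheConfig(scale_kernel, cudaFuncCachePreferL1);
    if (err != cudaSuccess) return err;

    float* d_x = nullptr;
    err = cudaMalloc(&d_x, n * sizeof(float));
    if (err != cudaSuccess) return err;

    cudaStream_t streams[kMaxStreams];
    for (int s = 0; s < nstreams; ++s) cudaStreamCreate(&streams[s]);

    const int chunk = (n + nstreams - 1) / nstreams;
    for (int s = 0; s < nstreams; ++s) {
        int offset = s * chunk;
        int count = (offset + chunk <= n) ? chunk : n - offset;
        if (count <= 0) break;
        size_t bytes = count * sizeof(float);
        cudaMemcpyAsync(d_x + offset, h_x + offset, bytes, cudaMemcpyHostToDevice, streams[s]);
        int threads = 256;
        int blocks = (count + threads - 1) / threads;
        scale_kernel<<<blocks, threads, 0, streams[s]>>>(d_x + offset, alpha, count);
        cudaMemcpyAsync(h_x + offset, d_x + offset, bytes, cudaMemcpyDeviceToHost, streams[s]);
    }

    err = cudaDeviceSynchronize();             // wait for all streams to finish
    if (err == cudaSuccess) err = cudaGetLastError();  // catch launch errors

    for (int s = 0; s < nstreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d_x);
    return err;
}
```

Independent streams give the hardware the chance to overlap one chunk's transfers with another chunk's kernel; whether they actually overlap depends on the device and driver.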