Performance problem with driver 260.xx

I have an application that uses a lot of atomic operations. Since driver 260.89, its runtime is about 25% slower than with driver 258.96. My card is a GTX 285 on Windows XP (64-bit). Has anyone seen a similar slowdown?

I also ran simpleAtomicIntrinsics from the CUDA SDK: it takes 84 ms with driver 258.96 and 93 ms with 266.58.
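
To put the SDK sample numbers in perspective, here is a quick sketch of the arithmetic. The function name `slowdown_percent` is just something I made up for illustration; the two timings are the ones quoted above.

```python
def slowdown_percent(fast_ms: float, slow_ms: float) -> float:
    """Relative slowdown of slow_ms versus fast_ms, in percent."""
    return (slow_ms - fast_ms) / fast_ms * 100.0


if __name__ == "__main__":
    # simpleAtomicIntrinsics timings reported above
    ms_driver_258_96 = 84.0
    ms_driver_266_58 = 93.0
    print(f"slowdown: {slowdown_percent(ms_driver_258_96, ms_driver_266_58):.1f}%")
    # prints "slowdown: 10.7%"
```

So the SDK sample is roughly 11% slower, which is smaller than the roughly 25% I see in my own application but points in the same direction.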