Faster cudaMemcpy with 8800 GTX than GTX 260?


I transfer two images of the same size to the GPU. With CUDA 1.1 and an 8800 GTX, I was observing 2 x 430 µs. With CUDA 2.1 and a GTX 260 (same PC configuration, PCIe 1.1), I see 430 + 1370 µs.
Do you see any reason for this?
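
For anyone who wants to reproduce this, here is a minimal sketch of how two equal-sized host-to-device copies could be timed with CUDA events. It is not necessarily the measurement used above: the 1 MiB buffer size, the use of pinned host memory, the warm-up copy, and the helper name `timeCopyUs` are all assumptions made for illustration.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: time one host-to-device copy, in microseconds.
static float timeCopyUs(void* dst, const void* src, size_t bytes) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);  // wait until the stop event has completed

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms * 1000.0f;
}

int main() {
    const size_t bytes = 1024 * 1024;  // assumed image size (1 MiB), not from the post

    void *hostA = 0, *hostB = 0, *devA = 0, *devB = 0;
    cudaMallocHost(&hostA, bytes);  // pinned (page-locked) host buffers
    cudaMallocHost(&hostB, bytes);
    cudaMalloc(&devA, bytes);
    cudaMalloc(&devB, bytes);
    memset(hostA, 1, bytes);
    memset(hostB, 2, bytes);

    // Warm-up copy so one-time setup costs are not attributed to either image.
    cudaMemcpy(devA, hostA, bytes, cudaMemcpyHostToDevice);

    printf("image A: %.1f us\n", timeCopyUs(devA, hostA, bytes));
    printf("image B: %.1f us\n", timeCopyUs(devB, hostB, bytes));

    cudaFree(devA);
    cudaFree(devB);
    cudaFreeHost(hostA);
    cudaFreeHost(hostB);
    return 0;
}
```

Replacing `cudaMallocHost`/`cudaFreeHost` with `malloc`/`free` would give the pageable-memory timings for comparison.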

With CUDA 2.1, the latest drivers, and profiler 1.1, the memcpy from GPU to CPU is not shown by the profiler if the destination memory is pinned (otherwise, it appears).
Is this known behaviour?

thanks, b.