I’m running driver 347.88 on Win7/x64 with CUDA 7.0 (final) and Nsight 4.6.
I’m seeing severe register spilling and a resulting 7x drop in performance on dozens of kernels in an unchanged codebase. So far I’ve only looked at sm_50.
Previously I was running CUDA 7.0.18 RC with Nsight 4.5.x, and the same code worked fine.
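
To put numbers on the spilling across many kernels, one option is to build with `nvcc -arch=sm_50 -Xptxas -v` under both toolkits, save the compiler output, and diff the per-kernel spill bytes. The following is only a rough sketch of that idea: the function names are made up for illustration, and the regular expressions assume the usual "ptxas info" line layout, which may differ between toolkit versions.

```python
import re

# Patterns for the verbose ptxas lines (assumed layout; adjust if your output differs).
ENTRY_RE = re.compile(r"Compiling entry function '([^']+)' for '([^']+)'")
PROPS_RE = re.compile(r"Function properties for (\S+)")
SPILL_RE = re.compile(
    r"(\d+) bytes stack frame, (\d+) bytes spill stores, (\d+) bytes spill loads"
)
REGS_RE = re.compile(r"Used (\d+) registers")


def parse_ptxas_log(text):
    """Return {(kernel, arch): stats} parsed from saved `-Xptxas -v` output."""
    stats = {}
    current = None      # (kernel, arch) of the entry function being reported
    props_name = None   # function named by the latest "Function properties" line
    for line in text.splitlines():
        m = ENTRY_RE.search(line)
        if m:
            current = (m.group(1), m.group(2))
            props_name = None
            stats[current] = {
                "stack": 0,
                "spill_stores": 0,
                "spill_loads": 0,
                "registers": None,
            }
            continue
        if current is None:
            continue
        m = PROPS_RE.search(line)
        if m:
            props_name = m.group(1)
            continue
        m = SPILL_RE.search(line)
        if m and props_name == current[0]:
            # Only record the spill line that belongs to the entry function itself.
            stats[current]["stack"] = int(m.group(1))
            stats[current]["spill_stores"] = int(m.group(2))
            stats[current]["spill_loads"] = int(m.group(3))
            continue
        m = REGS_RE.search(line)
        if m:
            stats[current]["registers"] = int(m.group(1))
    return stats


def spill_regressions(old, new):
    """List (kernel, arch, old_bytes, new_bytes) where spill bytes increased."""
    rows = []
    for key, n in new.items():
        o = old.get(key)
        if o is None:
            continue  # kernel not present in the older build
        old_bytes = o["spill_stores"] + o["spill_loads"]
        new_bytes = n["spill_stores"] + n["spill_loads"]
        if new_bytes > old_bytes:
            rows.append((key[0], key[1], old_bytes, new_bytes))
    # Largest increase first.
    rows.sort(key=lambda r: r[3] - r[2], reverse=True)
    return rows
```

Feeding it the saved output from the RC build and the final build, e.g. `spill_regressions(parse_ptxas_log(rc_text), parse_ptxas_log(final_text))` with `rc_text` and `final_text` as placeholder names for the two logs, should give a list of the kernels that regressed, worst first.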