CUDA performance degradation on GTX460

I ran the NVIDIA CUDA SDK sample “simpleTexture.exe” on a GTX460 and a GTX260 separately (one card per test, in the same computer). The CUDA kernel is much slower on the GTX460 than on the GTX260. Some other SDK samples do not show this behavior, so the problem seems to be related to texturing. The SDK sample “simpleTextureDrv.exe” behaves the same way.

Does anyone know the reason?

I copied the following results from “cuda_profile_0.log”:

GTX460 “cuda_profile_0.log”:

# CUDA_PROFILE_LOG_VERSION 2.0

# CUDA_DEVICE 0 GeForce GTX 460

# TIMESTAMPFACTOR 12019397ca38dc3f

method,gputime,cputime,occupancy

method=[ memcpyHtoA ] gputime=[ 177.088 ] cputime=[ 458.949 ] 

method=[ _Z15transformKernelPfiif ] gputime=[ 239.232 ] cputime=[ 15.826 ] occupancy=[ 0.333 ] 

method=[ _Z15transformKernelPfiif ] gputime=[ 239.520 ] cputime=[ 5.988 ] occupancy=[ 0.333 ] 

method=[ memcpyDtoH ] gputime=[ 177.152 ] cputime=[ 699.759 ]

GTX260 “cuda_profile_0.log”:

# CUDA_PROFILE_LOG_VERSION 2.0

# CUDA_DEVICE 0 GeForce GTX 260

# TIMESTAMPFACTOR 120199a1089c6b93

method,gputime,cputime,occupancy

method=[ memcpyHtoA ] gputime=[ 178.208 ] cputime=[ 466.225 ] 

method=[ _Z15transformKernelPfiif ] gputime=[ 117.024 ] cputime=[ 16.681 ] occupancy=[ 0.500 ] 

method=[ _Z15transformKernelPfiif ] gputime=[ 115.488 ] cputime=[ 5.560 ] occupancy=[ 0.500 ] 

method=[ memcpyDtoH ] gputime=[ 189.568 ] cputime=[ 1352.051 ]

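Please note the kernel gputime: transformKernel takes about 239 on the GTX460 versus about 116 on the GTX260, so roughly twice as long. The reported occupancy also differs (0.333 on the GTX460, 0.500 on the GTX260).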
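To put a number on the gap, here is a minimal Python sketch that averages the kernel gputime from the two logs above and prints the ratio. The helper name `mean_gputime` and the regular expression are my own and are not part of any NVIDIA tool. The log text is pasted inline from the excerpts above, so with a different profiler log format the pattern may need adjusting.

```python
import re
from collections import defaultdict

# Matches profiler lines such as:
#   method=[ memcpyHtoA ] gputime=[ 177.088 ] cputime=[ 458.949 ]
# and captures the method name and the gputime value.
LINE_RE = re.compile(r"method=\[\s*(\S+)\s*\]\s+gputime=\[\s*([\d.]+)\s*\]")


def mean_gputime(log_text):
    """Return {method name: mean gputime} for every method found in the log text."""
    samples = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            samples[match.group(1)].append(float(match.group(2)))
    return {name: sum(vals) / len(vals) for name, vals in samples.items()}


# Log lines copied from the two cuda_profile_0.log excerpts in this post.
GTX460_LOG = """
method=[ memcpyHtoA ] gputime=[ 177.088 ] cputime=[ 458.949 ]
method=[ _Z15transformKernelPfiif ] gputime=[ 239.232 ] cputime=[ 15.826 ] occupancy=[ 0.333 ]
method=[ _Z15transformKernelPfiif ] gputime=[ 239.520 ] cputime=[ 5.988 ] occupancy=[ 0.333 ]
method=[ memcpyDtoH ] gputime=[ 177.152 ] cputime=[ 699.759 ]
"""

GTX260_LOG = """
method=[ memcpyHtoA ] gputime=[ 178.208 ] cputime=[ 466.225 ]
method=[ _Z15transformKernelPfiif ] gputime=[ 117.024 ] cputime=[ 16.681 ] occupancy=[ 0.500 ]
method=[ _Z15transformKernelPfiif ] gputime=[ 115.488 ] cputime=[ 5.560 ] occupancy=[ 0.500 ]
method=[ memcpyDtoH ] gputime=[ 189.568 ] cputime=[ 1352.051 ]
"""

KERNEL = "_Z15transformKernelPfiif"
t460 = mean_gputime(GTX460_LOG)[KERNEL]
t260 = mean_gputime(GTX260_LOG)[KERNEL]
ratio = t460 / t260

print(f"GTX460: {t460:.3f}  GTX260: {t260:.3f}  ratio: {ratio:.2f}x")
```

With the numbers above, the kernel averages 239.376 on the GTX460 and 116.256 on the GTX260, a ratio of about 2.06x, while the memcpy times are nearly identical on both cards.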

Driver version: 266.58

CUDA version: 3.1 & 3.2

OS : Windows 7 64bit Professional

Texturing performance on the GTX 460 is known to be sub-par: about half of what the spec would lead you to expect.

I wish NVIDIA would improve the texturing performance on the GTX460.

Does anyone know the root cause? Is it compiler or hardware related?