GeForce80 Performance with CUDA

My system:
Intel Core 2, 2.4 GHz
GeForce 8400 GS
Windows XP SP
CUDA 2.1
Visual Studio 2008
When I run the prebuilt CUDA SDK binaries to test my system, I have two questions:
With bandwidthTest.exe, allocating 32 MB of memory on the GPU fails, but 8 MB works.
With clock.exe (64 blocks, 256 threads), the run takes about 135778 clocks, much slower than the specified value of about 9981 clocks.
Can anyone explain these results? Thanks.
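
In case it helps narrow this down, here is a minimal sketch (not taken from the SDK samples) of how the two allocation sizes could be checked directly, along with the memory size and multiprocessor count the runtime reports for the card. It assumes device 0 is the 8400 GS, and tryAlloc is a hypothetical helper name used only for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: try to allocate `bytes` on the device and print
// the error string the CUDA runtime returns for that attempt.
static void tryAlloc(size_t bytes)
{
    void *ptr = NULL;
    cudaError_t err = cudaMalloc(&ptr, bytes);
    printf("cudaMalloc(%lu MB): %s\n",
           (unsigned long)(bytes >> 20), cudaGetErrorString(err));
    if (err == cudaSuccess)
        cudaFree(ptr);  // release the buffer so the next attempt starts clean
}

int main()
{
    // Assumption: device 0 is the GPU under test.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("Device: %s\n", prop.name);
    printf("Global memory: %lu MB\n", (unsigned long)(prop.totalGlobalMem >> 20));
    printf("Multiprocessors: %d\n", prop.multiProcessorCount);

    // The two sizes from the bandwidthTest observation above.
    tryAlloc((size_t)8 << 20);
    tryAlloc((size_t)32 << 20);
    return 0;
}
```

The error string from the 32 MB attempt and the reported multiprocessor count would probably be the most useful things to post back here.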