Problems with CUDA on a device with sm_11

My CUDA kernel runs a single loop with a fixed number of iterations (or, in another variant, a loop over an array).

If I increase the number of CUDA kernel runs by a factor of 4, my program's performance stays the same.

But if I instead increase the number of passes through this loop (or array) in each kernel by a factor of 4, it becomes 1.5 times slower.

[ Number of CUDA kernel runs: 310000 (with x4, that is 1240000) ]
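
To make the setup concrete, here is a rough sketch of the kind of test I mean. It is not my real code: the kernel body, the names `loopKernel` and `timeLaunch`, the block size of 256 and the base iteration count of 100 are all placeholders, and it assumes that "kernel runs" means the total number of threads in one launch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread runs one loop with a fixed iteration count.
__global__ void loopKernel(float *out, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float acc = 0.0f;
    for (int k = 0; k < iters; ++k)
        acc = acc * 0.5f + 1.0f;   // placeholder work, not the real computation
    out[i] = acc;                  // store the result so the loop is not optimized away
}

// Launches the kernel once with n threads and iters iterations per thread,
// and returns the elapsed GPU time in milliseconds (measured with CUDA events).
static float timeLaunch(int n, int iters)
{
    float *d_out = NULL;
    cudaMalloc((void **)&d_out, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int threads = 256;                          // assumed block size
    int blocks  = (n + threads - 1) / threads;  // enough blocks to cover n threads

    cudaEventRecord(start, 0);
    loopKernel<<<blocks, threads>>>(d_out, n, iters);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);                 // wait until the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return ms;
}

int main()
{
    const int baseRuns  = 310000;  // number of kernel runs from the post
    const int baseIters = 100;     // assumed loop length, not from the real program

    printf("base:          %f ms\n", timeLaunch(baseRuns,     baseIters));
    printf("x4 runs:       %f ms\n", timeLaunch(baseRuns * 4, baseIters));
    printf("x4 iterations: %f ms\n", timeLaunch(baseRuns,     baseIters * 4));
    return 0;
}
```

The three timings correspond to the three cases above: the base configuration, 4 times more kernel runs, and 4 times more loop iterations per kernel. The sketch only shows how the cases are compared; it does not claim to reproduce the numbers I see.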

Why is that?

[ CUDA device with sm_11 ]

I also have a CUDA program where decreasing the number of kernel runs by a factor of 10 does not give ANY additional performance.