Performance dependence on kernel resolution

Hi, experts:
I recently did some research based on the idea that a ray tracer’s performance is proportional to its workload, so reducing the number of traced rays should improve performance.
However, I ran some tests in the isgReflection example with doISG set to false, to remove the influence of Image Space Gathering. The following table shows how the ray-tracing time depends on the kernel (launch) resolution:
Resolution     Ray-tracing time
1024×1024      7.32 ms
512×512        3.46 ms
256×256        2.31 ms
128×128        2.04 ms
64×64          1.93 ms
32×32          1.85 ms

Each time is averaged over 1000 frames. The scaling is far from linear, and even at small launch sizes the ray tracer still pays a considerable cost.
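To put numbers on how poorly this scales, here is a small Python sketch that computes the average time per ray for each row of the table. It assumes one ray per launch index (the actual isgReflection sample may trace additional reflection rays per pixel), so treat the ns/ray figures as rough.

```python
# Average time per frame (ms) from the table, keyed by launch size per side.
TIMINGS_MS = {1024: 7.32, 512: 3.46, 256: 2.31, 128: 2.04, 64: 1.93, 32: 1.85}


def per_ray_cost(timings):
    """Return (side, rays, ms, ns_per_ray) tuples, largest launch first.

    ASSUMPTION: one ray per launch index; extra reflection rays are ignored.
    """
    rows = []
    for side in sorted(timings, reverse=True):
        rays = side * side
        ms = timings[side]
        rows.append((side, rays, ms, ms * 1e6 / rays))  # 1 ms = 1e6 ns
    return rows


if __name__ == "__main__":
    for side, rays, ms, ns in per_ray_cost(TIMINGS_MS):
        print(f"{side:>4}x{side:<4} {rays:>9} rays {ms:5.2f} ms {ns:8.1f} ns/ray")
    ray_growth = (1024 * 1024) / (32 * 32)
    time_growth = TIMINGS_MS[1024] / TIMINGS_MS[32]
    print(f"rays grew {ray_growth:.0f}x, time grew only {time_growth:.2f}x")
```

Going from 32×32 to 1024×1024 multiplies the ray count by 1024, while the measured time grows by only about 4×, so the cost per ray drops steeply as the launch gets larger.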

I know OptiX may exploit ray coherence, but I’d still like to know: is there some way to “make” the scaling linear?

My system: Windows 7, Visual Studio 2012, GTX 980M, CUDA 7, and OptiX 3.8.
OptiX 3.9 gives similar results.

That’s easy to explain: with these tiny launch sizes you’re not generating enough load to keep all of the GPU’s streaming multiprocessors busy, and there is a constant software overhead per launch.
Go the other way and use bigger launch sizes; the results should then be more in line with your expectations.
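One way to check the fixed-overhead explanation against the measurements in the question is to fit a simple model, time = t0 + c·rays, by ordinary least squares. This Python sketch is only an illustration under that assumed linear model and the assumption of one ray per launch index. The intercept t0 lumps together the per-launch software overhead and the under-filled GPU at small sizes, so it is not a direct measurement of either. With the posted numbers it comes out at roughly 2 ms of fixed cost plus about 5 ns per ray, which is why shrinking the launch below 256×256 barely helps.

```python
# Average time per frame (ms) from the question's table, keyed by launch size per side.
TIMINGS_MS = {1024: 7.32, 512: 3.46, 256: 2.31, 128: 2.04, 64: 1.93, 32: 1.85}


def fit_fixed_plus_linear(timings):
    """Ordinary least-squares fit of ms = t0 + c * rays.

    Returns (t0_ms, c_ns_per_ray). ASSUMPTION: rays = side * side.
    """
    xs = [side * side for side in timings]
    ys = [timings[side] for side in timings]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope_ms = sxy / sxx                 # ms per ray
    t0_ms = mean_y - slope_ms * mean_x   # extrapolated time at zero rays
    return t0_ms, slope_ms * 1e6         # convert ms/ray to ns/ray


if __name__ == "__main__":
    t0, c = fit_fixed_plus_linear(TIMINGS_MS)
    print(f"fixed cost ~{t0:.2f} ms, marginal cost ~{c:.2f} ns/ray")
    for side in sorted(TIMINGS_MS, reverse=True):
        predicted = t0 + c * 1e-6 * side * side
        print(f"{side:>4}x{side:<4} measured {TIMINGS_MS[side]:.2f} ms, model {predicted:.2f} ms")
```

The fit is not perfect at the small sizes, which is expected: below full occupancy the GPU is not yet in the regime where each extra ray adds a roughly constant cost.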

Have a search for “NVIDIA CUDA grids blocks threads explained” to see how much work the GPU can handle in parallel.
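As a rough back-of-the-envelope check on how much work the GPU can hold in flight, the Python sketch below compares each launch size with the maximum number of resident threads. The GPU figures are assumptions for a GTX 980M (12 SMs, at most 2048 resident threads per SM); on your own machine, read them from cudaDeviceProp::multiProcessorCount and cudaDeviceProp::maxThreadsPerMultiProcessor. Real OptiX kernels use many registers, so actual occupancy will be lower than this upper bound, and mapping one launch index to one thread is also an assumption.

```python
import math

# ASSUMPTION: GTX 980M figures (12 SMs, compute capability 5.2 with up to
# 2048 resident threads per SM). Query your own GPU via cudaGetDeviceProperties.
NUM_SMS = 12
MAX_THREADS_PER_SM = 2048


def occupancy_estimate(side, num_sms=NUM_SMS, max_threads_per_sm=MAX_THREADS_PER_SM):
    """Return (threads, fraction of resident-thread slots filled, waves).

    ASSUMPTION: one thread per launch index. This is an upper bound: register
    and shared-memory use usually limit residency to fewer threads per SM.
    """
    threads = side * side
    capacity = num_sms * max_threads_per_sm
    filled = min(threads / capacity, 1.0)
    waves = math.ceil(threads / capacity)
    return threads, filled, waves


if __name__ == "__main__":
    for side in (32, 64, 128, 256, 512, 1024):
        threads, filled, waves = occupancy_estimate(side)
        print(f"{side:>4}x{side:<4} {threads:>9} threads, {filled:6.1%} of slots, {waves:>3} wave(s)")
```

Under these assumptions a 32×32 launch fills only about 4% of the resident-thread slots and 128×128 about two thirds, while 256×256 and larger need several "waves" (a simplification of how the hardware actually schedules blocks). That is roughly where the measured times start to grow with the ray count.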