Hello, everyone! I have a problem with performance.
Kernel launch configuration:
blocks = dim3(VIEWPORT_W / 16, VIEWPORT_H / 16);
threads = dim3(16, 16);
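To make the launch configuration concrete, here is a minimal host-side sketch of how many blocks it produces for a given viewport. The names `BLOCK_DIM`, `blocks_per_axis` and `total_blocks` are hypothetical, not from my code. The rounding-up division is an assumption; it gives the same result as the plain `/ 16` above whenever the viewport size is a multiple of 16.

```cpp
// Hypothetical constant: edge length of one thread block, as in dim3(16, 16).
constexpr int BLOCK_DIM = 16;

// Hypothetical helper: number of blocks needed to cover `extent` pixels on one axis.
// Rounds up, so it equals extent / 16 when extent is a multiple of 16.
constexpr int blocks_per_axis(int extent) {
    return (extent + BLOCK_DIM - 1) / BLOCK_DIM;
}

// Total number of blocks in the 2D grid for a width x height viewport.
constexpr int total_blocks(int width, int height) {
    return blocks_per_axis(width) * blocks_per_axis(height);
}
```

For the viewport sizes 768x768, 512x512, 256x256 and 128x128 this gives 2304, 1024, 256 and 64 blocks of 256 threads each.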
When VIEWPORT_W and VIEWPORT_H decrease, the average performance decreases significantly too.
Performance is measured in 10^6 rays per second (mrays/sec).
Here are my tests:
768x768 - 4.6 mrays/sec
512x512 - 3.0 mrays/sec
256x256 - 1.1 mrays/sec
128x128 - 0.6 mrays/sec
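As a rough cross-check, the numbers above can be converted into an implied time per frame. This is only a sketch: it assumes exactly one ray per pixel per frame, which may not match the actual kernel, and the helper name `frame_seconds` is hypothetical.

```cpp
// Assumption: exactly one ray is traced per pixel per frame.
// Hypothetical helper: seconds per frame implied by a throughput given in
// mrays/sec (10^6 rays per second).
constexpr double frame_seconds(int width, int height, double mrays_per_sec) {
    return (static_cast<double>(width) * height) / (mrays_per_sec * 1.0e6);
}
```

Under that assumption the four measurements correspond to roughly 128 ms, 87 ms, 60 ms and 27 ms per frame, so 36 times fewer rays (768x768 versus 128x128) take only about 4.7 times less time.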
Environment - CUDA 4.0, sm_10, GTS 450 (192 cores).
I admit that some of the decrease may be due to launch overhead and caching, but this is too much.
The same code runs on the CPU and always performs at 0.02 mrays/sec, independent of resolution.
(Thank you, and sorry for my English.)