I’m running some tests and collecting results with the NVIDIA Compute Visual Profiler.
I initialize my global NDRange with x = 159, y = 159, z = 1 (25281 work-items in total), but I don’t explicitly set local_work_size, so the choice of work-group size is left to the OpenCL runtime.
The Compute Visual Profiler, however, reports quite different results:
Does anyone have an explanation for this strange behavior?
How does the runtime choose the work-group size when local_work_size is left unset?
Thanks in advance for any help.