I did some more testing with the latest version, after manually adjusting each of my 3 GPUs to 33% utilization for optimum workload balance.
I began forcing different Work Group sizes using a .bat file, to see how they would affect my system’s performance.
Example of the .bat file used to force Work Group size 224:
smallptGPU.exe 1 1 224 1024 768 scenes\cornell.scn
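For anyone who wants to repeat the test without editing the .bat file by hand each time, here is a rough Python sketch that builds the same command line for each Work Group size. It only assumes that the third argument is the Work Group size, as in the example above; all other arguments are copied unchanged. The helper names `build_command` and `run_sweep` are made up for illustration.

```python
import subprocess

# Arguments copied verbatim from the example .bat line; only the
# work group size (third argument) changes between runs.
EXE = "smallptGPU.exe"
ARGS_BEFORE_SIZE = ["1", "1"]
ARGS_AFTER_SIZE = ["1024", "768", r"scenes\cornell.scn"]

# The sizes tried in this post.
SIZES = [8, 16, 32, 64, 96, 128, 160, 192, 224, 256, 320, 384, 448, 512, 576]


def build_command(work_group_size):
    """Return the argument list for one run at the given work group size."""
    return [EXE] + ARGS_BEFORE_SIZE + [str(work_group_size)] + ARGS_AFTER_SIZE


def run_sweep(sizes=SIZES):
    """Launch the program once per size; each run blocks until the program exits."""
    for size in sizes:
        cmd = build_command(size)
        print("Running:", " ".join(cmd))
        try:
            subprocess.run(cmd, check=True)
        except (OSError, subprocess.CalledProcessError) as err:
            # Covers a missing executable or a non-zero exit code.
            print(f"Size {size} did not run cleanly: {err}")
```

Calling `run_sweep()` from the program's folder would start the runs one after another. The Samples/sec figure still has to be read off the program's own output by hand.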
Size 8 -> 2,721k Samples/sec
Size 16 -> 2,967k Samples/sec
Size 32 -> 3,054k Samples/sec
Size 64 -> 4,915k Samples/sec
Size 96 -> 3,602k Samples/sec
Size 128 -> 4,451k Samples/sec
Size 160 -> 4,915k Samples/sec
Size 192 -> 5,041k Samples/sec
Size 224 -> 3,978k Samples/sec
Size 256 -> 4,321k Samples/sec
Size 320 -> 4,802k Samples/sec
Size 384 -> 5,173k Samples/sec
Size 448 -> Would not run
Size 512 -> Would not run
Size 576 -> Would not run
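To make the numbers above easier to compare, here is a small Python sketch that picks the fastest size and expresses every other result as a fraction of it. The figures are copied from the list above (in thousands of Samples/sec), with `None` standing in for the sizes that would not run. The function names are just illustrative.

```python
# Throughput in thousands of Samples/sec, copied from the list above.
# None marks the sizes where the program would not run.
RESULTS = {
    8: 2721, 16: 2967, 32: 3054, 64: 4915, 96: 3602, 128: 4451,
    160: 4915, 192: 5041, 224: 3978, 256: 4321, 320: 4802, 384: 5173,
    448: None, 512: None, 576: None,
}


def best_size(results):
    """Return the work group size with the highest measured throughput."""
    ran = {size: rate for size, rate in results.items() if rate is not None}
    return max(ran, key=ran.get)


def relative_to_best(results):
    """Map each size that ran to its throughput as a fraction of the best."""
    best_rate = results[best_size(results)]
    return {
        size: rate / best_rate
        for size, rate in results.items()
        if rate is not None
    }


for size, fraction in relative_to_best(RESULTS).items():
    print(f"Size {size:>3}: {fraction:.0%} of best")
```

On these figures `best_size(RESULTS)` picks 384, and the drop at 96 and 224 stands out clearly against the neighbouring sizes.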
Just running the default ‘smallptGPU’ file, I get 5,173k Samples/sec.
I did notice that when I run the default ‘smallptGPU’ file, it says ‘Suggested work group size: 384’ in the DOS window.
That is also the work group size that gives me my best performance…
I tried to increase the work group size further, but the program would not run.
Question: Is it a known fact that Nvidia can’t allocate a larger work group size than 384? Just wondering…
If so, what is the limiting factor? GPU memory?
Second question: If we could further increase the Work Group size past 384, do you think we might see some additional performance?