CUDA/OptiX GPU Utilisation

OK, we’re going to do some testing on our Amazon P2 instance later and will try to get some information with one of the GPUs disabled, if possible.

Are there any examples of RT_BUFFER_GPU_LOCAL usage? And yeah, we don’t use any float3 buffers.
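For context, this is my current understanding of how it would look in the OptiX 6.x C++ wrapper, pieced together from the headers — the buffer names, formats, and dimensions here are placeholders, not our actual code:

```cpp
// Sketch, assuming the optixpp wrapper: a per-sample accumulation buffer that
// the host never reads back can be flagged GPU_LOCAL, so each device keeps its
// own copy instead of synchronising through pinned host memory.
optix::Buffer accum = context->createBuffer(
    RT_BUFFER_INPUT_OUTPUT | RT_BUFFER_GPU_LOCAL,  // per-device, no host sync
    RT_FORMAT_FLOAT4,                              // float4 (we avoid float3)
    width, height);
context["accum_buffer"]->set(accum);

// The final buffer we encode to a file stays a normal output buffer:
optix::Buffer output = context->createBuffer(
    RT_BUFFER_OUTPUT, RT_FORMAT_UNSIGNED_BYTE4, width, height);
context["output_buffer"]->set(output);
```

Is that roughly the intended pattern — accumulate in the GPU-local buffer across launches and only write the tone-mapped result to the regular output buffer?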

The display of results isn’t factored in; we just perform the tracing and encode the resulting buffer to a file after all samples have been taken.

Is there a way to explicitly set the stack size?
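From the 6.x API headers it looks like there is a context-level call — sketching what I think we’d try, with the byte count purely a placeholder to tune from:

```cpp
// Sketch, assuming the optixpp wrapper: shrink the per-thread stack and see
// what the default was. My understanding is that in RTX execution mode the
// stack size is managed from the trace depth instead, so we'd set that too.
context->setStackSize(2048);                    // bytes per thread (placeholder)
size_t previous = context->getStackSize();      // query the value in effect
context->setMaxTraceDepth(2);                   // primary ray + one bounce
```

Please correct me if `setStackSize` is ignored under RTX mode and `setMaxTraceDepth` is the only knob that matters there.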

We’re going to do some testing on an Amazon P3 instance, which boasts 4x Tesla V100 GPUs. Unfortunately, I think that’s the only other GPU generation available through Amazon EC2.

So thanks for all that info. We do have plans to switch to iterative tracing at some point, but what about the fact that we can only execute two blocks in parallel due to per-thread register usage? It seems that even if we make those optimisations, we’ll still be missing out on a lot of parallelism, if I understand correctly (which I might not). Or will the stack size directly affect that?