Very poor multi-GPU scaling on DGX-1

sam331 · November 8, 2018, 1:11pm

Hi,

We tested OptiX on all 8 Tesla V100 GPUs in a DGX-1 machine and, to our surprise, saw very poor scaling.

On 1 GPU, we saw an occupancy of about 50% and a framerate of 80 fps.
On 8 GPUs, the occupancy was about 20% per GPU and the framerate was only 180 fps.
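
To put these numbers in perspective, here is a small sketch that turns the two reported framerates into a speedup and a parallel efficiency. The function name `scaling_report` is made up for illustration; only the 80 fps, 180 fps and 8 GPU figures come from the post.

```python
def scaling_report(fps_single, fps_multi, n_gpus):
    """Return (speedup, parallel efficiency) for a multi-GPU run."""
    speedup = fps_multi / fps_single   # how much faster than one GPU
    efficiency = speedup / n_gpus      # 1.0 would be perfect linear scaling
    return speedup, efficiency


# Figures reported above: 80 fps on 1 GPU, 180 fps on 8 GPUs.
speedup, efficiency = scaling_report(fps_single=80.0, fps_multi=180.0, n_gpus=8)
print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
```

With the reported figures this gives a speedup of 2.25x, or roughly 28% of ideal linear scaling.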

Are there any settings or flags that need to be enabled to get better scaling?

droettger · November 8, 2018, 1:54pm

When using multiple devices in OptiX, the output and input_output buffers reside in pinned host memory, and many GPUs writing over the PCI-E bus to the same target causes congestion.

If your renderer is accumulating images, that expensive read-modify-write operation can be done in GPU-local buffers, and only the final result needs to be written to an output buffer, which then resides in pinned memory. That should improve multi-GPU scaling drastically.
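
As a rough illustration of why this helps, the following toy model counts how many per-pixel transfers cross the PCI-E bus on one GPU for the two strategies. It is a sketch under stated assumptions, not a measurement: the cost model (one read plus one write over the bus per read-modify-write on a pinned buffer, nothing for GPU-local accesses), the function name `bus_transfers`, and the resolution and sample count are all hypothetical. In the OptiX API of that generation, a GPU-local accumulation buffer presumably corresponds to an input_output buffer created with the `RT_BUFFER_GPU_LOCAL` flag.

```python
def bus_transfers(pixels, samples, accumulate_locally):
    """Count per-pixel transfers that cross the PCI-E bus on one GPU.

    Toy model (an assumption, not a measurement): a read-modify-write on a
    pinned buffer costs one read plus one write over the bus; an access to
    a GPU-local buffer costs nothing on the bus.
    """
    if accumulate_locally:
        # Accumulate every sample in GPU-local memory, then write the
        # final colour to the pinned output buffer once per pixel.
        return pixels
    # Accumulate directly in the pinned buffer: read + write per sample.
    return pixels * samples * 2


pixels = 1920 * 1080   # hypothetical resolution
samples = 64           # hypothetical number of accumulated samples per pixel

direct = bus_transfers(pixels, samples, accumulate_locally=False)
local = bus_transfers(pixels, samples, accumulate_locally=True)
print(f"pinned accumulation: {direct} transfers")
print(f"local accumulation:  {local} transfers ({direct // local}x fewer)")
```

In this model the saving grows linearly with the number of accumulated samples, which is why the read-modify-write is the part worth keeping off the bus.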

You can find more information by following the links in this thread and the threads it references:
[url]https://devtalk.nvidia.com/default/topic/1036340/?comment=5264830[/url]

SuperGastrocnemius · January 11, 2019, 1:08pm

Hi, we are currently thinking of purchasing an NVIDIA VCA machine with 8 Quadro P6000 GPUs.

How well does OptiX scale on the VCA? And are there any benchmarks of OptiX running on such a machine?

droettger · January 11, 2019, 2:21pm

You should consider using Turing boards for GPU ray tracing nowadays.

The Pascal architecture is two generations older, and Turing contains dedicated ray tracing cores and tensor cores, so both pure ray tracing performance and AI denoising are a lot faster. There are also newer rasterization features available.
[url]https://devblogs.nvidia.com/nvidia-turing-architecture-in-depth/[/url]

This could also be an option: [url]https://www.nvidia.com/en-us/design-visualization/quadro-servers/[/url]

SuperGastrocnemius · January 25, 2019, 3:38pm

We’re looking into running OptiX on a VCA to drive a very large display wall (8 m x 3 m) illuminated by eight 4K projectors.

From our tests on a 4-GPU system, we think OptiX’s load balancing feature may hurt scaling performance across multiple GPUs. Since we want to render directly to the projectors, without copying the buffer to host memory first, is it possible to manually turn off load balancing in OptiX?

Also, can OptiX render on a distributed cluster of nodes (each outfitted with multiple GPUs)?
