Hi,
I’ve been testing OptiX 5.1.1 for 3D visualization for a few days. My code is based on the optixMeshViewer example, with the existing OBJ loader replaced by a proprietary loader. I am using the example’s PTX files, and I provide all the buffers that the PTX code expects. My geometry is a classic triangle soup, so I provide the vertices and the indices; since the code computes a geometric normal on the fly when the normal buffer is empty, I do not provide a normal buffer.
I have access to a Linux machine with four Tesla V100 GPUs, each with 32 GB of memory, connected through NVLink, and I used it to test memory usage. Since it is running Linux there is no WDDM, so the driver should automatically behave like the Windows TCC mode.
If I create the buffers with the RT_BUFFER_INPUT flag, with a geometry of 145,323,936 triangles and the “Bvh” acceleration structure, the memory occupation is:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.72       Driver Version: 410.72       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:61:00.0 Off |                    0 |
| N/A   38C    P0    65W / 300W |  17366MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:62:00.0 Off |                    0 |
| N/A   40C    P0    68W / 300W |  14038MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:89:00.0 Off |                    0 |
| N/A   39C    P0    63W / 300W |  14038MiB / 32480MiB |     11%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:8A:00.0 Off |                    0 |
| N/A   40C    P0    69W / 300W |  14038MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
If I instead use RT_BUFFER_INPUT_OUTPUT, I get:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.72       Driver Version: 410.72       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:61:00.0 Off |                    0 |
| N/A   37C    P0    73W / 300W |  14042MiB / 32480MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:62:00.0 Off |                    0 |
| N/A   38C    P0    76W / 300W |  10714MiB / 32480MiB |     72%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:89:00.0 Off |                    0 |
| N/A   38C    P0    70W / 300W |  10714MiB / 32480MiB |     66%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:8A:00.0 Off |                    0 |
| N/A   38C    P0    75W / 300W |  10714MiB / 32480MiB |     76%      Default |
+-------------------------------+----------------------+----------------------+
With RT_BUFFER_INPUT_OUTPUT and only one GPU enabled, I get:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.72       Driver Version: 410.72       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:61:00.0 Off |                    0 |
| N/A   31C    P0    41W / 300W |     11MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:62:00.0 Off |                    0 |
| N/A   32C    P0    43W / 300W |     11MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:89:00.0 Off |                    0 |
| N/A   35C    P0    54W / 300W |  17370MiB / 32480MiB |     14%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:8A:00.0 Off |                    0 |
| N/A   33C    P0    45W / 300W |     11MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
Is this the expected behavior? The extra memory headroom I gain by going from 1 GPU to 4 GPUs is only about 3 GB. Am I missing something?