Why does the first launch of OptiX take so long?

dong.wang · February 27, 2019, 2:21am

Hi OptiX people,
What does the first launch do? How can I lower its startup time?

nljones · February 27, 2019, 7:34am

The first launch compiles the PTX files that contain your OptiX device programs and saves the results in a cache file on disk. In subsequent launches, OptiX can use the cached programs without recompiling them, which is why launches after the first one start up much faster.

dong.wang · February 27, 2019, 8:09am

I have a follow-up question: after rendering one scene, I replace it with another scene and launch again, and that launch is still very slow, even though I use the same programs.

droettger · February 27, 2019, 10:20am

Please read the OptiX Programming Guide and API Reference: http://raytracing-docs.nvidia.com/optix_6.0/index.html

The first launch compiles the programs and builds the acceleration structures when necessary.
That’s obviously happening when exchanging the whole scene.
http://raytracing-docs.nvidia.com/optix_6.0/guide/index.html#host#acceleration-structure-builds

You can measure this compilation and acceleration structure build time by doing a dummy launch with zero dimensions.
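
To put a number on that, one option is to wrap the dummy launch in a small wall-clock timer. The helper below is only a sketch: `measureMilliseconds` is a hypothetical name, not part of OptiX, and it times whatever callable it is given.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not an OptiX function): runs `work` once and
// returns the elapsed wall-clock time in milliseconds.
double measureMilliseconds(const std::function<void()>& work)
{
  assert(work); // calling an empty std::function would throw
  const auto start = std::chrono::steady_clock::now();
  work();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

With an OptiX 6 context, the callable would presumably wrap the zero-sized launch, for example `measureMilliseconds([&] { rtContextLaunch2D(context, entryPoint, 0, 0); })`, where `context` and `entryPoint` stand in for your application's own objects.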

The OptiX usage report functionality can give you some more information about what goes on.
http://raytracing-docs.nvidia.com/optix_6.0/api/html/group___context.html#ga35c7da345b942238a7ec4db8e2516955
Example code here:
https://github.com/nvpro-samples/optix_advanced_samples/blob/master/src/optixIntroduction/optixIntro_10/src/Application.cpp#L489
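
For orientation, a usage report callback can look roughly like the sketch below. It uses the parameter list that the API reference gives for `RTusagereportcallback`; treating `cbdata` as a `std::ostream*` and padding the tag to 12 characters are assumptions of this sketch, not requirements of OptiX.

```cpp
#include <iomanip>
#include <iostream>
#include <ostream>

// Parameter list follows OptiX's RTusagereportcallback typedef:
// void (*)(int verbosity, const char* tag, const char* msg, void* cbdata).
// Assumption of this sketch: cbdata is either null or a std::ostream*.
void usageReportCallback(int verbosity, const char* tag, const char* msg, void* cbdata)
{
  std::ostream& out = cbdata ? *static_cast<std::ostream*>(cbdata) : std::cout;
  // Writes "[level][tag, left-aligned in 12 columns] message".
  out << '[' << verbosity << "][" << std::left << std::setw(12) << tag << "] " << msg;
}
```

It would be registered with something like `rtContextSetUsageReportCallback(context, usageReportCallback, 3, nullptr)`, where `context` is your own RTcontext and higher verbosity values produce more detailed output.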

dong.wang · February 28, 2019, 5:39am

Thank you for your help!

This is my report:

[1][SYS INFO    ] 
OptiX Version:[5.1.0] Branch:[rel5.1] Build Number:[24109458] CUDA Version:[9.0] 64-bit 2018-05-09 
Display driver: 410.48
Devices available:
CUDA device: 0
    0000:01:00.0
    GeForce RTX 2080
    SM count: 46
    SM arch: 75
    SM clock: 1800 KHz
    GPU memory: 7951 MB
    TCC driver: 0
Devices selected:
CUDA device: 0
    0000:01:00.0
    GeForce RTX 2080
    SM count: 46
    SM arch: 75
    SM clock: 1800 KHz
    GPU memory: 7951 MB
    TCC driver: 0

[2][INFO        ] Program cache HIT  : rayGen
[2][INFO        ] Program cache HIT  : exception
[2][INFO        ] Program cache HIT  : miss
[2][INFO        ] Program cache HIT  : mesh_bounds
[2][INFO        ] Program cache HIT  : mesh_intersect
[2][INFO        ] Program cache HIT  : closesthit
[2][INFO        ] Program cache HIT  : anyHitShadow
[2][INFO        ] Program cache HIT  : transform
[2][INFO        ] Launch index 0.
[2][SCENE STAT  ]     Node graph object summary:
[2][SCENE STAT  ]         RTprogram         : 15
[2][SCENE STAT  ]         RTbuffer          : 19299
[2][SCENE STAT  ]         RTtexturesampler  : 708
[2][SCENE STAT  ]         RTacceleration    : 674
[2][SCENE STAT  ]         RTgroup           : 1
[2][SCENE STAT  ]         RTgeometrygroup   : 673
[2][SCENE STAT  ]         RTtransform       : 674
[2][SCENE STAT  ]         RTselector        : 0
[2][SCENE STAT  ]         RTgeometryinstance: 1383
[2][SCENE STAT  ]         RTgeometry        : 2760
[2][SCENE STAT  ]             Total prim: 624295
[2][SCENE STAT  ]         RTmaterial        : 118
[1][TIMING      ]     Time to first launch: 15699.6 ms
[2][MEM USAGE   ]     Buffer GPU memory usage:
[2][MEM USAGE   ]     |         Category |  Count |  Total MByte |
[2][MEM USAGE   ]     |           buffer |   9655 |        259.1 |
[2][MEM USAGE   ]     |          texture |    708 |         15.3 |
[2][MEM USAGE   ]     |      gfx interop |      0 |          0.0 |
[2][MEM USAGE   ]     |     cuda interop |      0 |          0.0 |
[2][MEM USAGE   ]     |   optix internal |    688 |         81.0 |
[2][MEM USAGE   ]     Buffer host memory usage: 84.7 Mbytes
[1][INFO        ]     Compilation triggered 
[1][TIMING      ]         Compilation time: 252.6 ms
[2][TIMING      ]     Acceleration update time: 1100.5 ms
[2][MEM USAGE   ]     Buffer GPU memory usage:
[2][MEM USAGE   ]     |         Category |  Count |  Total MByte |
[2][MEM USAGE   ]     |           buffer |   9655 |        259.1 |
[2][MEM USAGE   ]     |          texture |    708 |         15.3 |
[2][MEM USAGE   ]     |      gfx interop |      0 |          0.0 |
[2][MEM USAGE   ]     |     cuda interop |      0 |          0.0 |
[2][MEM USAGE   ]     |   optix internal |    688 |         81.0 |
[2][MEM USAGE   ]     Buffer host memory usage: 84.7 Mbytes
[1][INFO        ]     Compilation triggered 
[1][TIMING      ]         Compilation time: 2383.5 ms
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[1][TIMING      ]     Total launch time: 22317.0 ms

Why are there two compilation times?
There is so much MEM USAGE output. Is my code wrong?
The rendering time is too long. Can I stop it at any time?

droettger · February 28, 2019, 9:13am

The first compilation is from the bounding box program which is used during acceleration structure building.
The second compilation is from the rest of your programs which are required for the entry point you’ve launched with. That will happen once for every entry point or if you change any program.

You’re using a GeForce RTX 2080 with OptiX 5.1.0 on display driver 410.48.
That will not use any of the Turing hardware ray tracing features!

What you should be using is OptiX 6.0.0 on display driver 418.81 or newer. (On Linux 418.30 or newer.)
That also compiles faster and caches the results on disk so that the startup time of consecutive runs of the application is much faster.

Read all OptiX 6.0.0 related posts on this forum to learn how to make the best use of it. Search for “GeometryTriangles” and “RTX execution strategy”.
The search field is in the top right corner of this page. Click on “Show” in the results page to limit the search to the OptiX sub-forum.

Why do you have 2760 Geometry nodes but only 1383 GeometryInstances?
There can only be one Geometry per GeometryInstance.

259 MB in buffers plus 81 MB internal is not really much.
You created 19299 buffers, and acceleration structures are not really small. Again, use OptiX 6.0.0.

In general, I would recommend checking whether you can reduce the overall number of scene graph nodes. There are too many buffers in that scene for my taste. If the application is based on the OptiX SDK examples that load OBJ files, that would explain it. Please have a look at the OptiX Introduction examples I linked to before. I optimized their renderer architecture to minimize the number of scene graph nodes.

“The rendering time is too long. Can I stop it at any time?”

Benchmark questions require absolute performance numbers.
Try with OptiX 6.0.0 and GeometryTriangles.

If things take too long to render, you could use progressive algorithms, such as partitioning the rendering into additive parts (for example per light), tile-based rendering, or accumulating the final frame with Monte Carlo algorithms in path tracers.
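
As an illustration of the tile-based variant, the host side could split the image into tiles and check a cancel flag between them, so the application can stop after any tile. This is a sketch under assumptions: `Tile`, `renderInTiles` and the `renderTile` callback are hypothetical names, and how a tile offset reaches your ray generation program is up to your application.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <functional>

// Hypothetical tile description: top-left corner plus size in pixels.
struct Tile { int x; int y; int width; int height; };

// Walks the image row by row in tileSize x tileSize blocks (edge tiles are
// clipped), calls renderTile for each block, and stops as soon as `cancel`
// is set. Returns the number of tiles that were rendered.
int renderInTiles(int imageWidth, int imageHeight, int tileSize,
                  const std::atomic<bool>& cancel,
                  const std::function<void(const Tile&)>& renderTile)
{
  assert(tileSize > 0);
  int rendered = 0;
  for (int y = 0; y < imageHeight; y += tileSize)
  {
    for (int x = 0; x < imageWidth; x += tileSize)
    {
      if (cancel.load())
        return rendered; // stop requested between two tiles
      const Tile tile{x, y,
                      std::min(tileSize, imageWidth - x),
                      std::min(tileSize, imageHeight - y)};
      renderTile(tile); // in a real renderer: one launch per tile
      ++rendered;
    }
  }
  return rendered;
}
```

Each `renderTile` call would then issue one small launch, which keeps individual launches short and gives the application a point at which it can abort.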

xhnsworks · March 7, 2019, 1:51am

Does this mean that using fewer Geometry nodes and more GeometryInstances will improve performance?

droettger · March 7, 2019, 10:56am

No. What I said is that the OptiX usage report lists more RTgeometry objects (2760) than RTgeometryinstance objects (1383). That is uncommon: one RTgeometryinstance can hold only one RTgeometry (or RTgeometrytriangles), so those two objects normally have a one-to-one relationship.
http://raytracing-docs.nvidia.com/optix_6.0/api/html/group___geometry_instance.html#ga5a346e03309cff580d2889786e465966

That means there are 2760 - 1383 = 1377 Geometry objects created inside your application which can’t be part of the scene graph reachable from the one root Group. They are therefore useless when you’re concerned about memory consumption, unless they are created because your application is exchanging Geometry nodes repeatedly.
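
The same check can be written as a tiny helper that you feed with the two counts from the SCENE STAT section of the report. `unattachedGeometryCount` is a hypothetical name used only for this sketch.

```cpp
#include <cassert>

// Each RTgeometryinstance holds exactly one RTgeometry, so at most
// geometryInstanceCount geometry objects can be attached to the scene graph.
// Returns the minimum number of RTgeometry objects that must be unattached.
int unattachedGeometryCount(int geometryCount, int geometryInstanceCount)
{
  assert(geometryCount >= 0 && geometryInstanceCount >= 0);
  return geometryCount > geometryInstanceCount
             ? geometryCount - geometryInstanceCount
             : 0;
}
```

For the report above, `unattachedGeometryCount(2760, 1383)` gives 1377. The result is a lower bound, because a single Geometry may also be shared by several GeometryInstances.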
