Why does the first launch of OptiX take so long?

Hi OptiX people,
What does the first launch do? How can I reduce its startup time?

The first launch compiles the PTX files that contain your OptiX device programs and saves the results in a cache file on your disk. In subsequent launches, including later runs of the application, OptiX can load the cached programs instead of recompiling them, which is why launches after the first one start up much faster.

I have a follow-up question: after rendering one scene, I replace it with another scene and launch again, and that launch is still very slow. I use the same programs.

Please read the OptiX Programming Guide and API Reference. http://raytracing-docs.nvidia.com/optix_6.0/index.html

The first launch compiles the programs and builds the acceleration structures when necessary.
Exchanging the whole scene requires new acceleration structures, so that part of the work happens again even when the programs stay the same.
http://raytracing-docs.nvidia.com/optix_6.0/guide/index.html#host#acceleration-structure-builds

You can measure the compilation and acceleration structure build time separately from the actual rendering by doing a dummy launch with zero dimensions before the real launch.
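A minimal sketch of that measurement, assuming your launch call is wrapped in a callable (the `LaunchFn`, `timeLaunchMs`, `LaunchTimings` and `measureFirstLaunch` names are hypothetical) and that the wrapped call only returns once the work has finished. In an OptiX 6 application the callable would call `rtContextLaunch2D(context, entryPoint, width, height)`.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical wrapper type: whatever your application uses to launch,
// e.g. a lambda calling rtContextLaunch2D(context, entryPoint, width, height).
using LaunchFn = std::function<void(unsigned int width, unsigned int height)>;

// Runs one launch and returns its wall-clock duration in milliseconds.
// Assumes the launch call blocks until the work is done.
double timeLaunchMs(const LaunchFn& launch, unsigned int width, unsigned int height)
{
  const auto start = std::chrono::steady_clock::now();
  launch(width, height);
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}

struct LaunchTimings
{
  double setupMs;  // zero-size launch: validation, compilation, acceleration builds
  double renderMs; // real launch: mostly the actual ray tracing work
};

// Splits the first-launch cost into setup and rendering by doing a
// zero-dimension dummy launch before the real one.
LaunchTimings measureFirstLaunch(const LaunchFn& launch, unsigned int width, unsigned int height)
{
  assert(width > 0 && height > 0);
  LaunchTimings t;
  t.setupMs  = timeLaunchMs(launch, 0, 0);
  t.renderMs = timeLaunchMs(launch, width, height);
  return t;
}
```

Doing this after every scene exchange shows how much of the slow relaunch is acceleration structure building rather than ray tracing.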

The OptiX usage report functionality can give you more information about what happens during a launch.
http://raytracing-docs.nvidia.com/optix_6.0/api/html/group___context.html#ga35c7da345b942238a7ec4db8e2516955
Example code here:
https://github.com/nvpro-samples/optix_advanced_samples/blob/master/src/optixIntroduction/optixIntro_10/src/Application.cpp#L489
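A sketch of such a callback, assuming the `RTusagereportcallback` parameter list `(int level, const char* tag, const char* msg, void* cbdata)`. It formats lines like the report below and, as a purely cosmetic filter, collapses a message that exactly repeats the previous one. The `UsageReportSink` struct is a hypothetical helper passed through `cbdata`; it does not change anything OptiX does.

```cpp
#include <cassert>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical state passed through the callback's void* cbdata pointer.
struct UsageReportSink
{
  std::ostream* out;     // where formatted report lines go
  std::string   lastTag; // previous tag, used to detect exact repeats
  std::string   lastMsg; // previous message
  int           repeats; // number of suppressed repeats
};

// Same parameter list as OptiX's RTusagereportcallback.
void usageReportCallback(int level, const char* tag, const char* msg, void* cbdata)
{
  UsageReportSink* sink = static_cast<UsageReportSink*>(cbdata);
  assert(sink != nullptr && sink->out != nullptr);
  if (sink->lastTag == tag && sink->lastMsg == msg)
  {
    ++sink->repeats; // identical to the previous line: count it, don't print it
    return;
  }
  sink->lastTag = tag;
  sink->lastMsg = msg;
  // Format as "[level][TAG padded to 12] message" in a local stream,
  // so the caller's stream flags are left untouched.
  std::ostringstream line;
  line << "[" << level << "][" << std::left << std::setw(12) << tag << "] " << msg;
  *sink->out << line.str();
}
```

Registration would look like `rtContextSetUsageReportCallback(context, usageReportCallback, 2, &sink);` before the first launch, where 2 is the verbosity level used in the report below.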

Thank you for your help!

This is my usage report:

[1][SYS INFO    ] 
OptiX Version:[5.1.0] Branch:[rel5.1] Build Number:[24109458] CUDA Version:[9.0] 64-bit 2018-05-09 
Display driver: 410.48
Devices available:
CUDA device: 0
    0000:01:00.0
    GeForce RTX 2080
    SM count: 46
    SM arch: 75
    SM clock: 1800 KHz
    GPU memory: 7951 MB
    TCC driver: 0
Devices selected:
CUDA device: 0
    0000:01:00.0
    GeForce RTX 2080
    SM count: 46
    SM arch: 75
    SM clock: 1800 KHz
    GPU memory: 7951 MB
    TCC driver: 0

[2][INFO        ] Program cache HIT  : rayGen
[2][INFO        ] Program cache HIT  : exception
[2][INFO        ] Program cache HIT  : miss
[2][INFO        ] Program cache HIT  : mesh_bounds
[2][INFO        ] Program cache HIT  : mesh_intersect
[2][INFO        ] Program cache HIT  : closesthit
[2][INFO        ] Program cache HIT  : anyHitShadow
[2][INFO        ] Program cache HIT  : transform
[2][INFO        ] Launch index 0.
[2][SCENE STAT  ]     Node graph object summary:
[2][SCENE STAT  ]         RTprogram         : 15
[2][SCENE STAT  ]         RTbuffer          : 19299
[2][SCENE STAT  ]         RTtexturesampler  : 708
[2][SCENE STAT  ]         RTacceleration    : 674
[2][SCENE STAT  ]         RTgroup           : 1
[2][SCENE STAT  ]         RTgeometrygroup   : 673
[2][SCENE STAT  ]         RTtransform       : 674
[2][SCENE STAT  ]         RTselector        : 0
[2][SCENE STAT  ]         RTgeometryinstance: 1383
[2][SCENE STAT  ]         RTgeometry        : 2760
[2][SCENE STAT  ]             Total prim: 624295
[2][SCENE STAT  ]         RTmaterial        : 118
[1][TIMING      ]     Time to first launch: 15699.6 ms
[2][MEM USAGE   ]     Buffer GPU memory usage:
[2][MEM USAGE   ]     |         Category |  Count |  Total MByte |
[2][MEM USAGE   ]     |           buffer |   9655 |        259.1 |
[2][MEM USAGE   ]     |          texture |    708 |         15.3 |
[2][MEM USAGE   ]     |      gfx interop |      0 |          0.0 |
[2][MEM USAGE   ]     |     cuda interop |      0 |          0.0 |
[2][MEM USAGE   ]     |   optix internal |    688 |         81.0 |
[2][MEM USAGE   ]     Buffer host memory usage: 84.7 Mbytes
[1][INFO        ]     Compilation triggered 
[1][TIMING      ]         Compilation time: 252.6 ms
[2][TIMING      ]     Acceleration update time: 1100.5 ms
[2][MEM USAGE   ]     Buffer GPU memory usage:
[2][MEM USAGE   ]     |         Category |  Count |  Total MByte |
[2][MEM USAGE   ]     |           buffer |   9655 |        259.1 |
[2][MEM USAGE   ]     |          texture |    708 |         15.3 |
[2][MEM USAGE   ]     |      gfx interop |      0 |          0.0 |
[2][MEM USAGE   ]     |     cuda interop |      0 |          0.0 |
[2][MEM USAGE   ]     |   optix internal |    688 |         81.0 |
[2][MEM USAGE   ]     Buffer host memory usage: 84.7 Mbytes
[1][INFO        ]     Compilation triggered 
[1][TIMING      ]         Compilation time: 2383.5 ms
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[2][MEM USAGE   ]     Local memory for all threads (CUDA device: 0): 241.2 MBytes
[1][TIMING      ]     Total launch time: 22317.0 ms

Why are there two compilation times?
The report prints so many MEM USAGE lines. Is my code wrong?
The rendering time is too long. Can I stop it at any time?

The first compilation is from the bounding box program which is used during acceleration structure building.
The second compilation is from the rest of your programs, which are required for the entry point you launched with. That happens once per entry point, and again whenever you change any program.
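To see how much of a launch goes into compilation, the "Compilation time" entries of a captured report can simply be added up. A minimal sketch, assuming the report text has the format shown above (`totalCompilationMs` is a hypothetical helper); for this report the two entries sum to 252.6 + 2383.5 = 2636.1 ms.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Sums every "Compilation time: <x> ms" entry in a captured usage report.
// Assumes the line format of the report above; other formats are ignored.
double totalCompilationMs(std::istream& report)
{
  const std::string key = "Compilation time:";
  double total = 0.0;
  std::string line;
  while (std::getline(report, line))
  {
    const std::string::size_type pos = line.find(key);
    if (pos == std::string::npos)
      continue; // not a compilation timing line
    std::istringstream value(line.substr(pos + key.size()));
    double ms = 0.0;
    if (value >> ms) // number right after the key, before " ms"
      total += ms;
  }
  return total;
}
```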

You’re using a GeForce RTX 2080 with OptiX 5.1.0 on display driver 410.48.
That will not use any of the Turing hardware ray tracing features!

You should be using OptiX 6.0.0 with display driver 418.81 or newer (418.30 or newer on Linux).
That also compiles faster and caches the results on disk so that the startup time of consecutive runs of the application is much faster.
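If your application should warn about this at startup, a small check of the driver version string against the minimums quoted above could look like this. It is a hypothetical helper: it assumes a "major.minor" version string like the "Display driver: 410.48" line in the report, and how you obtain that string is up to your application.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Returns true if a driver version string like "410.48" is at least minMajor.minMinor.
bool driverAtLeast(const std::string& version, int minMajor, int minMinor)
{
  assert(minMajor >= 0 && minMinor >= 0);
  int major = 0, minor = 0;
  if (std::sscanf(version.c_str(), "%d.%d", &major, &minor) != 2)
    return false; // unparsable: treat as too old
  return major > minMajor || (major == minMajor && minor >= minMinor);
}

// Minimum display driver for OptiX 6.0.0 as quoted in this thread:
// 418.81, or 418.30 on Linux.
bool driverSupportsOptix6(const std::string& version, bool isLinux)
{
  return isLinux ? driverAtLeast(version, 418, 30) : driverAtLeast(version, 418, 81);
}
```

With the driver from the report, `driverSupportsOptix6("410.48", true)` returns false.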

Read all OptiX 6.0.0 related posts on this forum to learn how to make the best use of it. Search for “GeometryTriangles” and “RTX execution strategy”.
The search field is in the top right corner of this page. Click on “Show” in the results page to limit the search to the OptiX sub-forum.

Why do you have 2760 Geometry nodes but only 1383 GeometryInstances?
There can only be one Geometry per GeometryInstance.

259 MB in buffers plus 81 MB of OptiX internal memory is not really much.
But you created 19299 buffers, and acceleration structures are not small either. Again, use OptiX 6.0.0.

Generally, I would recommend checking whether you can reduce the overall number of scene graph nodes. There are too many buffers in that scene for my taste. If your application uses the OBJ loading code from the OptiX SDK examples, that would explain it. Please have a look at the OptiX Introduction examples I linked to before. I optimized their renderer architecture to minimize the number of scene graph nodes.
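One common way to cut the buffer count (not necessarily what the OptiX Introduction samples do) is to merge meshes that share a material and a transform into a single vertex buffer and a single index buffer. A minimal host-side sketch with a hypothetical `Mesh` struct; each mesh's indices are offset by the number of vertices merged before it.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side mesh: positions plus triangle indices into them.
struct Mesh
{
  std::vector<std::array<float, 3>>         positions;
  std::vector<std::array<std::uint32_t, 3>> triangles;
};

// Concatenates meshes into one. Assumes all inputs already live in the same
// coordinate space and use the same material.
Mesh mergeMeshes(const std::vector<Mesh>& meshes)
{
  Mesh merged;
  for (const Mesh& m : meshes)
  {
    // Vertices already in the merged mesh shift this mesh's indices.
    const std::uint32_t base = static_cast<std::uint32_t>(merged.positions.size());
    merged.positions.insert(merged.positions.end(), m.positions.begin(), m.positions.end());
    for (const std::array<std::uint32_t, 3>& tri : m.triangles)
    {
      assert(tri[0] < m.positions.size() && tri[1] < m.positions.size() && tri[2] < m.positions.size());
      merged.triangles.push_back({base + tri[0], base + tri[1], base + tri[2]});
    }
  }
  return merged;
}
```

The merged mesh then needs one vertex buffer, one index buffer, one Geometry and one acceleration structure instead of one set per input mesh.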

“The rendering time is too long. Can I stop it at any time?”

Performance questions like this require absolute performance numbers to answer.
Try with OptiX 6.0.0 and GeometryTriangles.

If things take too long to render, you could use progressive algorithms: partition the rendering into additive parts (for example, one launch per light), render tile by tile, or accumulate the final frame over many short launches with Monte Carlo algorithms, as path tracers do.
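The accumulation variant is what lets you stop at any time: after every short launch the buffer holds a valid, progressively less noisy image. A minimal sketch of the math as a host-side running average (the `Accumulator` class is hypothetical, and one float per pixel is a simplification; in an OptiX renderer this blending would typically happen in the ray generation program on the GPU).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Cumulative moving average of per-launch sample images.
// After n frames the buffer holds the mean of those n frames.
class Accumulator
{
public:
  explicit Accumulator(std::size_t pixelCount) : m_accum(pixelCount, 0.0f), m_frames(0) {}

  // Blends one new frame of samples (e.g. one sample per pixel) into the average.
  void addFrame(const std::vector<float>& samples)
  {
    assert(samples.size() == m_accum.size());
    ++m_frames;
    const float weight = 1.0f / static_cast<float>(m_frames);
    for (std::size_t i = 0; i < m_accum.size(); ++i)
      m_accum[i] += (samples[i] - m_accum[i]) * weight;
  }

  const std::vector<float>& image() const { return m_accum; }
  unsigned int frames() const { return m_frames; }

private:
  std::vector<float> m_accum;  // current average per pixel
  unsigned int       m_frames; // number of frames blended so far
};
```

Display `image()` after each launch and stop whenever the result is good enough or a time budget runs out.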

Does that mean that using fewer Geometry nodes and more GeometryInstances will improve performance?

No. What I said is that the OptiX usage report lists more RTgeometry objects (2760) than RTgeometryinstance objects (1383). That is uncommon, because one RTgeometryinstance can hold only one RTgeometry (or RTgeometrytriangles), so normally there are no more Geometry objects than GeometryInstances.
http://raytracing-docs.nvidia.com/optix_6.0/api/html/group___geometry_instance.html#ga5a346e03309cff580d2889786e465966

That means at least 2760 - 1383 = 1377 Geometry objects created by your application cannot be part of the scene graph reachable from the one root Group. They are useless and only cost memory, unless they exist because your application exchanges Geometry nodes repeatedly.
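The counts in the report only give a lower bound, because several GeometryInstances may reference the same Geometry. If your application keeps its own record of which Geometry each instance holds, the exact number of unreferenced Geometry objects can be counted like this (integer handles and the `countUnreferencedGeometry` name are hypothetical stand-ins for your own bookkeeping).

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Counts Geometry objects that no GeometryInstance references.
// geometryOfInstance[i] is the geometry handle held by instance i
// (exactly one per instance, matching the OptiX rule).
std::size_t countUnreferencedGeometry(std::size_t geometryCount,
                                      const std::vector<std::size_t>& geometryOfInstance)
{
  std::unordered_set<std::size_t> referenced;
  for (std::size_t g : geometryOfInstance)
  {
    assert(g < geometryCount);
    referenced.insert(g); // shared geometry is only counted once
  }
  return geometryCount - referenced.size();
}
```

With 2760 Geometry objects and 1383 instances that each reference a different Geometry, this returns 1377; any sharing between instances makes the number larger.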