launch time out with Optix Prime 3.7 beta 3

z00 · February 25, 2015, 3:47pm

Hello,

I’m trying to use OptiX Prime 3.7 beta 3 to replace my own OpenCL/CUDA ray tracer.
Unfortunately, when I launch my app, either the display driver times out or my computer crashes with a blue screen (Windows 8.1).

The scene is small: 926120 triangles and 468827 vertices. I create 2 queries of 1048576 rays each (ORIGIN_TMIN_DIR_TMAX), and therefore 1048576 hits each (D_TRIID_U_V).
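
For a sense of scale, the ray and hit buffers described here can be sized with some quick arithmetic. This is only a rough sketch: the per-element layouts are assumptions read off the format names (eight 4-byte floats per ray, four 4-byte values per hit), and it does not count the acceleration structure or any scratch memory the library may allocate internally.

```python
# Back-of-the-envelope sizes for the ray and hit buffers described above.
# Assumed layouts (inferred from the format names, not confirmed in the post):
#   ray = origin(3) + tmin + direction(3) + tmax -> 8 x 4-byte floats
#   hit = distance + triangle id + u + v         -> 4 x 4-byte values
BYTES_PER_RAY = 8 * 4
BYTES_PER_HIT = 4 * 4

rays_per_query = 1048576
num_queries = 2

ray_bytes = rays_per_query * BYTES_PER_RAY
hit_bytes = rays_per_query * BYTES_PER_HIT
total_bytes = num_queries * (ray_bytes + hit_bytes)

MIB = 1024 * 1024
print(f"rays per query : {ray_bytes / MIB:.0f} MiB")
print(f"hits per query : {hit_bytes / MIB:.0f} MiB")
print(f"all queries    : {total_bytes / MIB:.0f} MiB")
```

Under these assumptions the two queries together need about 96 MiB of user-side buffers, which by itself is small next to the memory of a GTX TITAN.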

Configuration: NVIDIA GeForce GTX TITAN, driver 347.52, Intel(R) Xeon(R) CPU E5620 @ 2.40GHz

Are there limits on the number of queries, the query size, or the model size? Where can I find documentation of these limits?

I traced the execution with OPTIX_API_CAPTURE, but the calls seem to be correctly ordered:

4
64
Platform: Windows
Capture time: 2015-02-25 15:57
%%
rtpContextCreate( 257, 000000DBF11F84F0 )
  res = 0
  hdl = 000000DBF11F8670
rtpContextSetCudaDeviceNumbers( 000000DBF11F8670, 1, 000000DBAAC810C0 )
  val = 0
  res = 0
rtpBufferDescCreate( 000000DBF11F8670, 1025, 513, 0000000904740000, 000000DBAAC810C0 )
  res = 0
  hdl = 000000DBF11F8E20
rtpBufferDescSetRange( 000000DBF11F8E20, 0, 926120 )
  res = 0
rtpBufferDescCreate( 000000DBF11F8670, 1056, 513, 0000000905580000, 000000DBAAC9F1C0 )
  res = 0
  hdl = 000000DBF11F8990
rtpBufferDescSetRange( 000000DBF11F8990, 0, 468827 )
  res = 0
rtpBufferDescSetStride( 000000DBF11F8990, 32 )
  res = 0
rtpModelCreate( 000000DBF11F8670, 000000DBAAC94250 )
  res = 0
  hdl = 000000DBF11F8A10
rtpModelSetTriangles( 000000DBF11F8A10, 000000DBF11F8E20, 000000DBF11F8990 )
  file::prime::0000000904740000 = oac.prime.000000.potx // indices
  file::prime::0000000905580000 = oac.prime.000001.potx // vertices
  res = 0
rtpModelUpdate( 000000DBF11F8A10, 8193 )
  res = 0
rtpBufferDescCreate( 000000DBF11F8670, 1089, 513, 00000009148C0000, 000000DBAABD3150 )
  res = 0
  hdl = 000000DBF078C9D0
rtpBufferDescSetRange( 000000DBF078C9D0, 0, 1048576 )
  res = 0
rtpBufferDescCreate( 000000DBF11F8670, 1123, 513, 00000009168C0000, 000000DBAABC7B50 )
  res = 0
  hdl = 000000DBF078CA50
rtpBufferDescSetRange( 000000DBF078CA50, 0, 1048576 )
  res = 0
rtpModelFinish( 000000DBF11F8A10 )
  res = 0
rtpQueryCreate( 000000DBF11F8A10, 4097, 000000DBAABCB350 )
  res = 0
  hdl = 000000DBF078CAD0
rtpQuerySetRays( 000000DBF078CAD0, 000000DBF078C9D0 )
  file::prime::00000009148C0000 = oac.prime.000002.potx // rays_api
  res = 0
rtpQuerySetHits( 000000DBF078CAD0, 000000DBF078CA50 )
  file::prime::00000009168C0000 = oac.prime.000003.potx // hits_api
  res = 0
rtpQueryExecute( 000000DBF078CAD0, 16385 )
  res = 0
rtpBufferDescCreate( 000000DBF11F8670, 1089, 513, 000000091A0C0000, 000000DBAA9E1870 )
  res = 0
  hdl = 000000DBAC3CFB70
rtpBufferDescSetRange( 000000DBAC3CFB70, 0, 1048576 )
  res = 0
rtpBufferDescCreate( 000000DBF11F8670, 1123, 513, 000000091C0C0000, 000000DBAA9B27C0 )
  res = 0
  hdl = 000000DBAC3CFBF0
rtpBufferDescSetRange( 000000DBAC3CFBF0, 0, 1048576 )
  res = 0
rtpModelFinish( 000000DBF11F8A10 )
  res = 0
rtpQueryCreate( 000000DBF11F8A10, 4096, 000000DBAA95ADE0 )
  res = 0
  hdl = 000000DBAC2BB2D0
rtpQuerySetRays( 000000DBAC2BB2D0, 000000DBAC3CFB70 )
  file::prime::000000091A0C0000 = oac.prime.000004.potx // rays_api
  res = 0
rtpQuerySetHits( 000000DBAC2BB2D0, 000000DBAC3CFBF0 )
  file::prime::000000091C0C0000 = oac.prime.000005.potx // hits_api
  res = 0
rtpQueryExecute( 000000DBAC2BB2D0, 16385 )
  res = 0
rtpQuerySetCudaStream( 000000DBF078CAD0, 000000DBB0840D30 )
  res = 0
rtpBufferDescSetRange( 000000DBF078C9D0, 0, 1048576 )
  res = 0
rtpBufferDescSetRange( 000000DBF078CA50, 0, 1048576 )
  res = 0
rtpQueryExecute( 000000DBF078CAD0, 16385 )
  res = 0

Thanks.

Heiko · February 26, 2015, 6:47pm

Hey z00,

Just to get the ball rolling with your app, you could try an experiment: make your query execute calls synchronous, either by removing the async flag or by calling rtpQueryFinish after every execute call. Does that make your app work?
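
One way to check this against the capture posted above is to scan it for rtpQueryExecute calls that are never followed by an rtpQueryFinish on the same query handle. The sketch below is a hypothetical helper, not part of OptiX; it embeds only the query-related call lines copied from the capture and leaves the hint values undecoded, so whether a missing finish matters still depends on the hint that was passed.

```python
import re

# Query-related lines copied from the capture above (result lines omitted).
TRACE = """\
rtpQueryCreate( 000000DBF11F8A10, 4097, 000000DBAABCB350 )
  hdl = 000000DBF078CAD0
rtpQueryExecute( 000000DBF078CAD0, 16385 )
rtpQueryCreate( 000000DBF11F8A10, 4096, 000000DBAA95ADE0 )
  hdl = 000000DBAC2BB2D0
rtpQueryExecute( 000000DBAC2BB2D0, 16385 )
rtpQuerySetCudaStream( 000000DBF078CAD0, 000000DBB0840D30 )
rtpQueryExecute( 000000DBF078CAD0, 16385 )
"""

# Function name, then the first argument (the handle the call acts on).
CALL = re.compile(r"^(rtp\w+)\(\s*([0-9A-F]+)")


def unfinished_executes(trace):
    """Count, per query handle, execute calls with no later rtpQueryFinish."""
    pending = {}
    for line in trace.splitlines():
        match = CALL.match(line)
        if not match:
            continue  # indented result lines such as "res = 0"
        name, handle = match.groups()
        if name == "rtpQueryExecute":
            pending[handle] = pending.get(handle, 0) + 1
        elif name == "rtpQueryFinish":
            pending[handle] = 0
    return {handle: count for handle, count in pending.items() if count}


print(unfinished_executes(TRACE))
```

For the posted capture this reports both query handles, since the capture contains no rtpQueryFinish call at all.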

z00 · February 27, 2015, 1:37pm

Hello Heiko,

I tried removing the asynchronous calls from both OptiX Prime and CUDA by removing the async flags and the streams, but I had the same issue. My app is also multi-threaded, but the documentation says that is supported.

Should I call all rtp functions from the same thread?

I increased the Windows GPU timeout in the registry editor. rtpQueryExecute then completed, but it took more than 20 seconds for just 1 million rays, so execution is very slow. In Process Explorer’s GPU view it looks like memory is being swapped.
I also tried changing the vertex and triangle formats by removing the strides, but that didn’t change the result.
My app needs a lot of device memory.

Is there a prerequisite on available memory before OptiX is initialized?
Should I create the OptiX context before my CUDA buffers (as with OpenGL interop)?

Thanks.

Heiko · February 27, 2015, 5:15pm

If I understand correctly, you are trying to use a single Prime context with multiple queries, and each query is executed in a separate thread, right?

I believe this is not supported at the moment. And yes, I believe you should call all rtp functions from the same thread (when you use only a single device, meaning a single Prime context). If you want to use multiple devices (manually managed), then I guess you should create a separate Prime context for every device (along with its buffer descs, model, and query) and set the context’s device number with rtpContextSetCudaDeviceNumbers. In that case you should be able to use multiple threads (one fixed thread for every Prime context, each using a single GPU).

So, to summarize I believe this should work:
One Prime context with multiple async queries in the same thread
Multiple Prime contexts, each with one or more async queries, where every context is handled in its own thread (and the devices selected per context are mutually exclusive)

This is not supported (I think):
One Prime context, with multiple async queries, and each query handled in a different thread
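
The "one fixed thread per context" arrangement can be sketched independently of OptiX. In the minimal example below, `ContextWorker` and `fake_query` are hypothetical stand-ins: each worker owns one context and runs every call for it on its own thread, while any number of application threads submit work through a queue. No real rtp calls are made, so this only illustrates the threading pattern.

```python
import queue
import threading


class ContextWorker:
    """Owns one (stand-in) context; every call for it runs on one fixed thread."""

    def __init__(self, device_id):
        self.device_id = device_id
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            job = self._jobs.get()
            if job is None:  # shutdown sentinel
                break
            func, done, out = job
            out.append(func(self.device_id))
            done.set()

    def call(self, func):
        """Run func on the worker thread and wait for its result."""
        done, out = threading.Event(), []
        self._jobs.put((func, done, out))
        done.wait()
        return out[0]

    def close(self):
        self._jobs.put(None)
        self._thread.join()


def fake_query(device_id):
    # Hypothetical placeholder for the rtp* calls that make up one query.
    return (device_id, threading.get_ident())


workers = [ContextWorker(d) for d in range(2)]  # one worker per device
results = []
lock = threading.Lock()


def producer(worker):
    # Any application thread may submit work; it still executes on the worker.
    result = worker.call(fake_query)
    with lock:
        results.append(result)


threads = [threading.Thread(target=producer, args=(w,))
           for w in workers for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
for w in workers:
    w.close()

# Collect the set of thread ids that actually ran work for each device.
threads_per_device = {}
for device_id, thread_id in results:
    threads_per_device.setdefault(device_id, set()).add(thread_id)
print({d: len(ids) for d, ids in threads_per_device.items()})
```

Each device ends up with exactly one executing thread even though six application threads submitted work, which is the property the supported configurations above rely on.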

There should not be a prerequisite on available (device) memory before initializing a context.
The order you use for creating OptiX contexts and CUDA buffers should not matter.

z00 · March 10, 2015, 10:37am

In one thread, I initialize a Prime context for all devices (in a loop). I execute 2 queries per device in other, separate threads. So I tried putting all Prime calls in the same thread, but I had the same issue.

I experimented with the number of rays per query and obtained these results:

Device GeForce GTX TITAN.
NVidia OptiX create model...0.058000 s
NVidia OptiX create first query...2.679000 s
thread 0 number of pixels : 1048576
thread 0 number of rays in first query : 23831
NVidia OptiX create second query...0.000000 s
thread 0 number of rays in second query : 23831
thread 0 allocated memory : 76670756

//////////////////////////////////////////////

Device GeForce GTX TITAN.
NVidia OptiX create model...0.057000 s
NVidia OptiX create first query...0.364000 s
thread 0 number of pixels : 1048576
thread 0 number of rays in first query : 47662
NVidia OptiX create second query...0.001000 s
thread 0 number of rays in second query : 47662
thread 0 allocated memory : 80102420

/////////////////////////////////////////////

Device GeForce GTX TITAN.
NVidia OptiX create model...0.055000 s
NVidia OptiX create first query...2.479000 s
thread 0 number of pixels : 1048576
thread 0 number of rays in first query : 95325
NVidia OptiX create second query...0.002000 s
thread 0 number of rays  in second query : 95325
thread 0 allocated memory : 86965892

///////////////////////////////////////////

Device GeForce GTX TITAN.
NVidia OptiX create model...0.059000 s
NVidia OptiX create first query...0.003000 s
thread 0 number of pixels : 1048576
thread 0 number of rays in first query : 190650
NVidia OptiX create second query...0.000000 s
thread 0 number of rays  in second query : 190650
thread 0 allocated memory : 100692692

///////////////////////////////////////////

Device GeForce GTX TITAN.
NVidia OptiX create model...0.057000 s
NVidia OptiX create first query...0.004000 s
thread 0 number of pixels : 1048576
thread 0 number of rays in first query : 381300
NVidia OptiX create second query...0.002000 s
thread 0 number of rays  in second query : 381300
thread 0 allocated memory : 128146292

///////////////////////////////////////////

Device GeForce GTX TITAN.
NVidia OptiX create model...0.061000 s
NVidia OptiX create first query...0.005000 s
thread 0 number of pixels : 1048576
thread 0 number of rays in first query : 762600
NVidia OptiX create second query...6.486000 s
thread 0 number of rays  in second query : 762600
thread 0 allocated memory : 183053492

///////////////////////////////////////////

It is very strange that the number of rays in the queries gives seemingly random creation times. I thought it was a synchronization issue, but this is the result with a single thread making all OptiX calls.
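
The figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers from the six runs; what the application counts as "allocated memory" is not stated, so the interpretation of the per-ray slope is left open.

```python
# (rays per query, "allocated memory" in bytes, first-query creation time in s)
runs = [
    (23831, 76670756, 2.679),
    (47662, 80102420, 0.364),
    (95325, 86965892, 2.479),
    (190650, 100692692, 0.003),
    (381300, 128146292, 0.004),
    (762600, 183053492, 0.005),
]

# Bytes added per extra ray between consecutive runs.
slopes = [
    (m1 - m0) / (r1 - r0)
    for (r0, m0, _), (r1, m1, _) in zip(runs, runs[1:])
]
print("bytes per ray:", slopes)

# Memory left over once the per-ray part is subtracted.
base = runs[0][1] - int(slopes[0]) * runs[0][0]
print("fixed part   :", base)

# Does the creation time grow with the ray count?
times = [t for _, _, t in runs]
print("monotonic    :", times == sorted(times))
```

The reported memory grows by exactly 144 bytes per ray in every step, on top of a fixed 73239092 bytes, so the memory side scales linearly. The creation times do not follow the ray count at all, and the largest run additionally shows 6.486 s for the second query.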
