Minimum size of element in rtContextLaunch2D

imanolooo · July 4, 2014, 10:53am

Hello All,

I’m a new OptiX user, and I have to reuse code written by a developer who can’t be reached right now.

I would like to know whether there is a minimum size for the window that is launched with rtContextLaunch2D, because the code launches a window of Nx1 elements. With N = 100000 everything works, but with N = 10000 rtContextLaunch2D throws a standard C++ exception… The buffers I use to communicate between the context and the C++ side have N elements, so I can’t understand this…

The current configuration is CUDA 4.1, OptiX SDK 2.5.0, Windows 7 64-bit, Visual Studio 2010 with a 32-bit project, and a GTX 460.

Thanks a lot!

Imanol.

droettger · July 4, 2014, 1:50pm

The first thing to find out is why the exception was thrown, by decoding its error code.
Look for rtContextGetErrorString() in the OptiX SDK.

imanolooo · July 4, 2014, 4:06pm

Thanks for replying so fast!

The error string returned is:
“Unknown error (Details: Function “_rtContextLaunch2D” caught C++ standard exception: unknown error)”

I should mention that I launch the context like this:

_context->launch( PROGRAM_PHOTON, static_cast<unsigned int>(_numPhotons), static_cast<unsigned int>(1) );

Thanks!

over0219 · July 5, 2014, 3:59pm

Although it shouldn’t be a problem, is there a reason for the 2D launch with all of the work in the x dimension? I can understand launching it in ( 1, N ) to distribute work to multiple GPUs, but ( N, 1 ) could simply be launched as 1D:

_context->launch( PROGRAM_PHOTON, static_cast<unsigned int>(_numPhotons) );

That likely won’t solve your problem, though.

But to answer your other question:

No, there is no minimum launch size.

But if I had to guess, the kernels themselves assume a certain size somewhere. Without knowing more about what you’re doing, I can only recommend revisiting all buffer reads and writes.
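One way to start that review on the host side is to list every buffer together with its element count and compare it against the size the kernels expect. The following is only a sketch in plain C++: `BufferInfo` and `undersizedBuffers` are hypothetical helpers, not part of the OptiX API, and the element counts would have to be filled in by hand or queried from your own buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one buffer bound to the context: its variable
// name and the number of elements it was allocated with.
struct BufferInfo {
    std::string name;
    std::size_t element_count;
};

// Returns the names of all buffers holding fewer than `required_elements`
// elements. `required_elements` is whatever size the device code assumes,
// for example the launch width or a size hard-coded in a kernel.
std::vector<std::string> undersizedBuffers(const std::vector<BufferInfo>& buffers,
                                           std::size_t required_elements) {
    std::vector<std::string> result;
    for (const BufferInfo& b : buffers) {
        if (b.element_count < required_elements) {
            result.push_back(b.name);
        }
    }
    return result;
}
```

If a kernel was written against a fixed size such as 100000, calling `undersizedBuffers(buffers, 100000)` after shrinking N to 10000 would name every buffer that the kernel can now overrun.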

imanolooo · July 8, 2014, 11:55am

The code is from another developer who is not currently available… I’m only implementing new functionality on top of his code base, and I don’t know why he did this… So I suppose I can use a 1D launch… But from your comment, should I understand that a 2D launch as (1,N) will reduce the computation time compared with a 2D launch as (N,1) or a 1D launch?

I don’t understand what you mean by revisiting all buffer reads/writes. Do you mean looking for out-of-bounds memory accesses and that kind of thing?

Thanks!

nljones · July 8, 2014, 2:49pm

Not necessarily. NVIDIA GPUs run a maximum of 32 threads per warp, but depending on the dimensions of your launch, some warps may run at less than maximum capacity. Also, depending on the amount of coherence in your work (that is, the extent to which all threads do the same thing at the same time), it may or may not be beneficial to use all 32 threads. See http://www.cs.berkeley.edu/~volkov/volkov10-GTC.pdf for details.
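As a rough illustration of the warp arithmetic, the sketch below counts how many warps a launch needs and how many threads are active in the last one. It assumes threads are packed linearly into warps of 32; how OptiX actually maps launch indices to warps is not something this thread establishes, and `WarpUsage` and `warpUsage` are made-up names.

```cpp
#include <cassert>
#include <cstddef>

struct WarpUsage {
    std::size_t warps;             // number of warps needed for the launch
    std::size_t last_warp_active;  // active threads in the final warp
};

// Assumption: threads fill warps of `warp_size` one after another.
WarpUsage warpUsage(std::size_t thread_count, std::size_t warp_size = 32) {
    assert(warp_size > 0);
    WarpUsage u;
    // Ceiling division: a partially filled warp still counts as a warp.
    u.warps = (thread_count + warp_size - 1) / warp_size;
    const std::size_t remainder = thread_count % warp_size;
    if (thread_count == 0) {
        u.last_warp_active = 0;
    } else {
        u.last_warp_active = (remainder == 0) ? warp_size : remainder;
    }
    return u;
}
```

Under that assumption, N = 100000 fills 3125 warps exactly, while N = 10000 needs 313 warps with only 16 threads active in the last one.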

So to answer your first question, without testing each option ((N,1), (1,N), and 1D), you’ll never know which one is faster for your particular application.
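A small timing helper makes that comparison straightforward. This is a minimal sketch using only the C++ standard library; `meanLaunchMillis` is a hypothetical name, and it assumes the launch call you wrap returns only after the GPU work has finished.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `launch` `repetitions` times and returns the mean wall-clock
// time per run in milliseconds.
double meanLaunchMillis(const std::function<void()>& launch, int repetitions) {
    assert(repetitions > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < repetitions; ++i) {
        launch();
    }
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count() / repetitions;
}
```

You would call it once per layout, passing a lambda that wraps the launch, for example `[&] { _context->launch( PROGRAM_PHOTON, n, 1u ); }` for (N,1) and the corresponding calls for (1,N) and 1D. It is probably worth doing one untimed warm-up launch first so that any one-time setup cost does not skew the first measurement.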

imanolooo · July 8, 2014, 2:59pm

Mmmmmmm,

ok I understand!

I should do some tests then!

Thanks!!!

over0219 · July 8, 2014, 7:17pm

Yes.

That’s not really what I meant. I forget the exact number, but when you do a 2D launch as (N,1), the first ~65k threads or so will be launched on the first device. So, you’ll get very low utilization on the second device. Launching it as (1, N) will more evenly distribute the workload to both devices.

This only matters in a multi-GPU environment, though.
