Freeze on Sync after launch

I’ve been on this for a couple of days. Basically I used the cutouts example to build the basis for our little optix framework. In the same project the cutouts example worked just fine before.

However now I get a freeze after launch when it calls cudaDeviceSynchronize() (there is a plan to do this async later, but for now I am stuck here). I set the logCallback and the level to 4, pipeline debugLevel to FULL, as well as the module compile level and optimization to 0.

I tried a few things, like reducing the geometry, ray count etc., all to no avail.

So my question is…
Is there anything else I am missing to figure this out or to get more information?
And does this maybe sound like a particular kind of problem?

Again, the strange thing is that it just worked in the same project with the cutouts example.

Hard to say anything concrete without a minimal reproducer. If it’s hanging and not crashing, then it’s important to figure out whether your threads are launching, whether they’re making progress, or whether you have bad data or an infinite loop somewhere, or whether maybe things got stuck before your raygen program due to perhaps some kind of bad launch configuration.

If you’re using OptiX 7.2, turn on validation mode. You could also enable exceptions and see if that turns up anything. Other than that, I don’t know if there are any tools that will easily reveal the issue.

When this kind of thing happens to me, I usually try some very simple and manual things, such as:

  • make the launch size very small. narrow to a single pixel that repros, if possible
  • print something at the top of raygen, does it appear?
  • comment the call to optixTrace(), does that resolve the hang?
  • if so, dive into closest hit and repeat - does a printf show up?
  • if not, then is the culprit in the SBT?
  • if so (printf works), then start commenting things and bisect until you’ve narrowed it down.
  • are there any loops? is it possible for the loops to get stuck if any assumptions are broken?
  • etc.
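
For the loop item in the list above, one cheap safeguard is a hard iteration cap, so that a termination flag that never gets set cannot hang the launch. The following is only a host-side C++ sketch of the idea: `PathState`, `kMaxDepth` and `runPath` are hypothetical names, and `step` merely stands in for one optixTrace() round trip.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-ray state; in device code this would live in the payload.
struct PathState
{
    uint32_t depth = 0;
    uint32_t done  = 0;  // non-zero once the path has terminated
};

// Assumed cap on the number of iterations; tune it for your renderer.
constexpr uint32_t kMaxDepth = 8;

// Runs `step` until it sets `done` or the cap is reached.
// Returns the number of iterations that were executed.
template <typename Step>
uint32_t runPath(PathState& state, Step step)
{
    while (state.done == 0 && state.depth < kMaxDepth)
    {
        step(state);   // stands in for optixTrace() plus hit/miss programs
        ++state.depth;
    }
    return state.depth;
}
```

If the cap is what ends the loop, that is a strong hint that some program along the way is not writing the termination flag.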

Sorry, I know you’ve already been doing some if not all of these things, there’s no magic bullet I know. If you exhaust all options and you can set up a minimal way to reproduce the issue, we can take a look. Ideally, if you can modify the SDK sample and cause it to happen, that’s by far the easiest way to demonstrate whether there’s a bug.


David.

Thanks a lot David!

The validation mode was exactly the kind of thing I was looking for. It doesn’t appear to produce any different output though.

I commented out most of the CUDA code and get a successful launch now. Thanks for the good pointers, pretty sure I’ll figure it out from here.

Again, thanks!

Some other general OptiX device code tips and tricks.

  • Always align your fields inside structures to offsets which match their native CUDA alignment restrictions.
    I normally place them from bigger to smaller alignments. That avoids padding by the compiler (potentially smaller struct!) and inadvertent misaligned address errors.
    https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/nvlink_shared/shaders/per_ray_data.h#L86
  • Avoid the C++ bool type in structures or payloads.
    Use native CUDA data types instead or combine multiple booleans into an unsigned int as bitfield.
  • Make sure to always initialize all per ray payload values correctly.
    Always think about what values you expect when programs aren’t actually called. (You might be able to avoid miss programs when initializing the payloads for the miss case, e.g. shadow/visibility rays don’t need a miss program.)
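
To make the layout tips above concrete, here is a small host-side C++ sketch. `Float4` and `Float2` are stand-ins I made up to mimic the 16-byte and 8-byte alignment of CUDA's float4 and float2; the payload structs, flag names and helpers are hypothetical as well. The offsets in the comments assume a typical ABI with a 1-byte bool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-ins for CUDA vector types (assumption: same alignment as float4/float2).
struct alignas(16) Float4 { float x, y, z, w; };
struct alignas(8)  Float2 { float x, y; };

// Fields in arbitrary order: the compiler has to insert padding.
struct PayloadUnordered
{
    bool   done;      // offset 0, followed by 15 bytes of padding
    Float4 radiance;  // offset 16
    float  pdf;       // offset 32, followed by 4 bytes of padding
    Float2 uv;        // offset 40
    bool   hit;       // offset 48, struct is padded up to 64 bytes
};

// Bit flags packed into one unsigned int instead of separate bools.
constexpr uint32_t FLAG_DONE = 1u << 0;
constexpr uint32_t FLAG_HIT  = 1u << 1;

// Fields ordered from largest to smallest alignment: no padding needed.
struct PayloadOrdered
{
    Float4   radiance;  // offset 0
    Float2   uv;        // offset 16
    float    pdf;       // offset 24
    uint32_t flags;     // offset 28, total size 32 bytes
};

static_assert(sizeof(PayloadOrdered) == 32, "ordered layout has no padding");
static_assert(sizeof(PayloadOrdered) < sizeof(PayloadUnordered),
              "reordering should shrink the struct");

inline void setFlag(PayloadOrdered& p, uint32_t flag) { p.flags |= flag; }
inline bool hasFlag(const PayloadOrdered& p, uint32_t flag) { return (p.flags & flag) != 0; }
```

Value-initializing the struct (`PayloadOrdered prd{};`) zeroes every field, which also gives all payload values a defined state before the first trace.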

@droettger Thanks for the tips!

Ok, I tried some more and it fails in __raygen_rg (in optixCutouts.cu) at

params.accum_buffer[ image_index ] = make_float4( accum_color, 1.0f);

with

[2][ERROR] Error recording event to prevent concurrent launches on the same OptixPipeline (CUDA error string: unspecified launch failure, CUDA error code: 719)
Error recording resource event on user stream (CUDA error string: unspecified launch failure, CUDA error code: 719)

And then when I destroy the context

OPTIX_ERROR_CUDA_ERROR: Optix call 'optixLaunch( state.debug_pipeline, state.stream, reinterpret_cast<CUdeviceptr>(state.d_params), sizeof(Params), &state.debug_sbt, state.params.width, state.params.height, 1 )' failed: e:\[...]debugoptixquery.cpp:340)


[2][ERROR] Error releasing namedConstant's internal resources (CUDA error string: unspecified launch failure, CUDA error code: 719)
Error synching on OptixPipeline event (CUDA error string: unspecified launch failure, CUDA error code: 719)
Error destroying OptixPipeline event (CUDA error string: unspecified launch failure, CUDA error code: 719)

Though I do get illegal memory access most of the times.

After some more testing, params appears to be set, as in I can printf(params.width). However, it crashes every time I try to change a params value (like an element of the accum buffer; I also tested with make_float4(1.0f, 1.0f, 1.0f, 1.0f)).

Still wrapping my head around C++ so this could be basic? I keep checking where something went wrong “converting” the cutouts example, but can’t find anything.

Error recording event to prevent concurrent launches on the same OptixPipeline

Are you calling optixLaunch() with the same pipeline from different CPU threads?
That is not supported. See the yellow box in OptiX 7.2.0 Programming Guide Chapter 9.

But that is probably just the first chance to report the generic CUDA launch failure from the optixLaunch which failed.

I would be wary of ray generation programs which can launch an arbitrary number of rays, like in the optixCutouts example. If that breaks with samples_per_launch = 1, then it’s something about the code you changed.

Again, it’s not possible to say what is going wrong. You changed a working example and it’s not working anymore. Nothing to do about that without knowing what you changed.

What is your system setup?
OS version, installed GPU(s), VRAM amount, display driver version, OptiX version, CUDA toolkit version, host compiler version?

Still wrapping my head around C++ so this could be basic? I keep checking where something went wrong “converting” the cutouts example, but can’t find anything.

You could send me the complete source code inside an archive attached to a private message or send to OptiX-Help at nvidia.com (max. 10 MB and no *.zip attachments to e-mails, those get blocked, renamed *.zi_ or *.7z will do.)

Thanks for the offer to look into this. I’ll keep trying for a bit more.
For stats: Windows 10 1909 (Build 18363.1139), a single 2080, 8 GB VRAM, OptiX 7.2, CUDA 11.1, driver 457.30

[2][ERROR] Error recording event to prevent concurrent launches on the same OptixPipeline (CUDA error string: an illegal memory access was encountered, CUDA error code: 700)
Error recording resource event on user stream (CUDA error string: an illegal memory access was encountered, CUDA error code: 700)

This threw me off as well. I am wrapping this in a DLL that I call from Unity’s main thread, so right now I don’t have much control over that. Before moving this to a dedicated thread in Unity, I tried making all the calls within a single call from Unity to make sure they run on the same thread (they should anyway), and I get the same result.

So multi threading shouldn’t be the issue here.

Well, this is kind of obvious and you have probably done that already, but if there is an invalid memory access inside the device kernel when just writing into an index of the accumulation buffer, I would check

  • if the CUdeviceptr inside the launch parameters is actually set to the correct value,
  • that the buffer has the correct size,
  • that the index into it is not accessing it out of bounds.
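
The size and index checks in the list above can be done on the host before the launch. This is a minimal sketch, not code from the SDK: `LaunchDims` and the three helpers are hypothetical names, and the row-major index formula (y * width + x) is my assumption about how the raygen program computes image_index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side copy of the launch dimensions passed to optixLaunch().
struct LaunchDims
{
    uint32_t width;
    uint32_t height;
};

// Linear pixel index, assuming a row-major layout: y * width + x.
inline size_t imageIndex(const LaunchDims& dims, uint32_t x, uint32_t y)
{
    return static_cast<size_t>(y) * dims.width + x;
}

// True if a buffer of `bufferBytes` bytes holds one element of
// `elementBytes` bytes for every launch index.
inline bool bufferCoversLaunch(const LaunchDims& dims, size_t bufferBytes, size_t elementBytes)
{
    const size_t needed = static_cast<size_t>(dims.width) * dims.height * elementBytes;
    return bufferBytes >= needed;
}

// True if `index` addresses a complete element inside the buffer.
inline bool indexInBounds(size_t index, size_t bufferBytes, size_t elementBytes)
{
    return elementBytes != 0 && index < bufferBytes / elementBytes;
}
```

A typical mistake this would catch is allocating the accumulation buffer with sizeof(float) per pixel while the device code writes a float4 (16 bytes), or launching with dimensions larger than the ones used for the allocation.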

I am wrapping this in a DLL that I call from Unity’s main thread.

Do you have a standalone (non-Unity based) application which can load this DLL and do unit tests on the functionality?
If this is only happening in the Unity environment, then this wouldn’t be visible by looking at the code.

Again, thanks :)

So, in the end I threw out everything from the cutouts sample and started anew with the optixTriangle sample. The idea is to simplify and then slowly increase complexity. I am sure it was some mix-up with some of the pointers/references going somewhere they shouldn’t.

This is working now and I’ll take it from here.