Memory corruption in rtBufferCreate

Hey

While using Application Verifier to debug what looked like heap corruption, I came across what appears to be a ‘use after free’ bug in the first call to rtBufferCreate or rtBufferCreateFromGLBO, and possibly other rtBufferCreateFoo functions. It’s easily reproducible in the OptiX samples: enable Application Verifier on one of the samples (I used optixPathTracer) and run it with a debugger attached.

The first exception I get is
Exception thrown at 0x00007FFEED576048 (KernelBase.dll) in optixPathTracer.exe:
0xC0000005: Access violation reading location 0x00000244CF694FF0.

with call stack

optix.51.dll!00007ffe62020aac()
optix.51.dll!00007ffe61f3cf32()
sutil_sdk.dll!optix::ContextObj::createBufferFromGLBO(unsigned int type, unsigned int vbo) Line 2196

A total of 3 to 5 exceptions are thrown, but only when the first buffer is created. All subsequent calls work as expected and the initial buffer also contains the expected data, but once in a while OptiX will crash in rtContextLaunch2D.

In Application Verifier I’ve enabled all the tests under Basics and DirtyStacks in Miscellaneous.

Usual info
Windows 10, 1803
NVIDIA Quadro P2000
Driver 411.63
OptiX 5.1
CUDA 9.0
Visual Studio 2015 Update 3

I have a GTX 1080 and an RTX 2080 that I plan to test it on as well.

Cheers
Asger Hoedt

Reproducible on a desktop system with
Windows 10, 1803
GeForce 1080 GTX
Driver 398.36
OptiX 5.1
CUDA 9.1 (I know it’s not officially supported, but I just use it for looking for issues with 9.1)
Visual Studio 2015 Update 3

I have also run into this

System:
Ubuntu 18.04
Nvidia GTX 1080 TI
Driver 410
OptiX 5.1.1
CUDA 10.0
clang 6

I tried running with asan and got one message “Memory allocation failed” and a lot of “Invalid value”. Digging into it more, it seems like my call to rtContextCreate is failing with error 1282, “Memory allocation failed”
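The numeric code can be decoded by hand: 1282 is 0x502 in hexadecimal. Below is a minimal sketch of that lookup. The `rt_result_name` and `describe_rt_result` helpers are hypothetical (not part of OptiX), and the numeric values are an assumption based on the `RTresult` enum in OptiX 5.x's optix_declarations.h, so check them against the header in your SDK. In a real program, `rtContextGetErrorString` is probably the better source for the message text.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of OptiX: maps a few RTresult values to names.
   The numeric values are an assumption taken from optix_declarations.h in
   OptiX 5.x; verify them against your SDK's header. */
static const char *rt_result_name(unsigned code)
{
    switch (code) {
    case 0x000: return "RT_SUCCESS";
    case 0x500: return "RT_ERROR_INVALID_CONTEXT";
    case 0x501: return "RT_ERROR_INVALID_VALUE";
    case 0x502: return "RT_ERROR_MEMORY_ALLOCATION_FAILED";
    default:    return "(not in this table)";
    }
}

/* Formats "<decimal> = 0x<hex> <name>" into out; returns snprintf's result. */
static int describe_rt_result(unsigned code, char *out, size_t out_size)
{
    return snprintf(out, out_size, "%u = 0x%03X %s",
                    code, code, rt_result_name(code));
}
```

Under the same assumption, the many “Invalid value” messages would correspond to 0x501 (1281), which is at least consistent with later calls being made on a context that was never created.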

Without asan, I get the message malloc(): memory corruption

I really have no idea what might be causing this. Any help with debugging would be vastly appreciated

I’ve done a bit more digging and it seems like OptiX is allocating some fixed amount of memory for all its objects, and eventually it runs out of memory, which manifests itself as a failed call to rtBufferCreate. That’s the only explanation I have - the more minimal I make my code, the more buffers I can create before one fails

Some more context, in case it’s helpful:

Here’s the basic flow of how I use OptiX:

  1. Create OptiX Context, enabling debugging through the print buffer
  2. Create an acceleration structure for the scene root and an RTgroup to represent the scene root. Set the group's child count to the number of objects in my scene (about nineteen for my current testing scene), then set its acceleration structure to the one I just created. Declare an RTvariable on the context for the scene root, and set the group I just created as that variable's value
  3. Create buffers for the input and output data of the raytracing I need to do. I set their format and element size, but don't set their size - the size gets set when the context is launched. Create context variables for those buffers, and set the buffers as the variables' values
  4. Create an RTmaterial for my standard material. Load my raygen and closest hit programs. The raygen program is set on the context, while the closest hit program is set on the material I just created
  5. Load my bounds and intersection programs. I'm using the bounds and intersection programs from the OptiX samples. I save these to variables in my program so I can attach them to triangular meshes as they're loaded in
  6. Load my meshes into the OptiX context. Each mesh has its own RTgeometry, which has a vertex buffer and index buffer variable.
  7. Load my scene objects into the OptiX context. For each scene object I create both a texture and a geometry instance. The way everything is set up I don't yet have any kind of material system, but I need to test texture sampling performance. I call `rtBufferCreate` to create a 2D buffer for my texture... and this is where I get the error `malloc(): memory corruption`

    After creating the texture, I create an RTgeometryinstance, RTgeometrygroup, and RTtransform for the scene object, and make the scene object a child of the scene root. I set the RTgeometrygroup’s acceleration structure to the acceleration structure of the RTgeometry that it uses and set the RTgeometryinstance’s material to the standard material I created earlier
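One way to find out which of the steps above is the first to report a failure is to check the result of every API call and stop at the first non-zero code. The sketch below shows only the pattern. `fake_api_call`, `CHECK_STEP`, `build_scene` and `steps_completed` are all hypothetical names, and `fake_api_call` is a stand-in that fails on its third call so the example runs without the OptiX SDK.

```c
#include <stdio.h>

/* Stand-in for an OptiX call such as rtBufferCreate (hypothetical).
   Returns 0 on success; fails with 0x502 on the third call so the
   checking pattern can be exercised without the SDK. */
static int calls_until_failure = 3;
static int fake_api_call(void)
{
    return (--calls_until_failure <= 0) ? 0x502 : 0;
}

/* Reports file, line, expression and code for a failing call, then makes
   the enclosing function return that code so no later step runs. */
#define CHECK_STEP(call)                                              \
    do {                                                              \
        int code_ = (call);                                           \
        if (code_ != 0) {                                             \
            fprintf(stderr, "%s:%d: %s failed with 0x%X\n",           \
                    __FILE__, __LINE__, #call, (unsigned)code_);      \
            return code_;                                             \
        }                                                             \
    } while (0)

static int steps_completed = 0;

/* Each line stands for one step of the setup flow. */
static int build_scene(void)
{
    CHECK_STEP(fake_api_call());   /* e.g. create the context */
    steps_completed++;
    CHECK_STEP(fake_api_call());   /* e.g. create the scene root */
    steps_completed++;
    CHECK_STEP(fake_api_call());   /* e.g. create a buffer: fails here */
    steps_completed++;
    return 0;
}
```

With the real API the wrapped expression would be the OptiX call itself, for example `CHECK_STEP(rtBufferCreate(context, RT_BUFFER_INPUT, &buffer))`. This only catches failures that come back as return codes; an abort inside `malloc` would still need a heap checker to locate.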

I’ve done the usual debugging thing of “delete all possible code and add things in until it starts breaking”. When I don’t do step three - when I don’t create the input and output buffers - I’m able to load eighteen out of nineteen textures before getting the error

munmap_chunk(): invalid pointer

When I do perform step three, even if I don’t do anything else, I get the original error -

malloc(): memory corruption

It feels kinda like OptiX has a limited set of buffer memory, or a limited number of buffers, and I’m running into that limit. However, nothing I saw in the OptiX programming guide suggested that this was the case

My project is completely dead in the water unless I can get this resolved

Hi @david.dubois,

OptiX doesn’t have a limit on buffer memory aside from the available memory on your GPU. The number of buffers is only limited by the number of bits we use to identify buffers; many thousands of buffers is not uncommon.

Could you outline how much memory you’re expecting to use? How big are your geometries & textures?

Are you using multi-threading?

Are you setting your stack size?

Will you check the GPU free memory after creating 18 textures, and report back how much is available? You can query free memory (for example, rtContextGetAttribute(RT_CONTEXT_ATTRIBUTE_AVAILABLE_DEVICE_MEMORY) https://raytracing-docs.nvidia.com/optix/guide/index.html#host#3127) and/or use nvidia-smi to monitor memory usage.

Have you tried using rtContextValidate() and rtContextSetUsageReportCallback()? I assume you’re crashing before your first launch, so you might not be able to use those immediately, but do you think it’d be possible to do some empty launches while you’re building your scene graph in order to get validation and usage reporting?
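For reference, a usage report callback could be sketched roughly as follows. The four-argument shape (verbosity level, tag, message, user pointer) and the registration call in the closing comment are assumptions based on the OptiX 5.1 headers as I remember them. `struct report_log` and `usage_report` are hypothetical names introduced for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record of what the callback has seen so far. */
struct report_log {
    int  count;
    char last[256];
};

/* Assumed callback shape: verbosity level, short tag, message text, and the
   user pointer that was passed at registration time. */
static void usage_report(int level, const char *tag, const char *msg, void *cbdata)
{
    struct report_log *log = (struct report_log *)cbdata;

    /* Messages are expected to carry their own trailing newline. */
    fprintf(stderr, "[%d][%-12s] %s", level, tag, msg);

    if (log != NULL) {
        log->count++;
        strncpy(log->last, msg, sizeof log->last - 1);
        log->last[sizeof log->last - 1] = '\0';
    }
}

/* Registration, assuming an existing RTcontext named context and a
   verbosity between 0 (off) and 3 (most detailed):

       struct report_log log = {0, {0}};
       rtContextSetUsageReportCallback(context, usage_report, 3, &log);
*/
```

Keeping a counter in the user data makes it easy to see whether later buffer creations produce any report at all.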

Also, are you using Application Verifier like @papaboo, or are your errors coming directly from OptiX?


David.

Asger,

Are you experiencing errors only when using Application Verifier, or do you get the exceptions and/or crash either way?


David.

My textures are 1x1 pixels, because I wanted to eliminate high device memory usage as a source of this issue. My meshes have anywhere from 432 to 1,951,608 vertices - although most are a few hundred thousand vertices or less. I get the same error regardless of whether or not I upload any meshes, however

I am not using any multi-threading

I am not setting the stack size

Querying RT_CONTEXT_ATTRIBUTE_AVAILABLE_DEVICE_MEMORY shows that there’s 10,371,465,216 bytes of device memory available. Querying RT_CONTEXT_ATTRIBUTE_USED_HOST_MEMORY shows 34,012 bytes of host memory are currently used.
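To put those figures in more familiar units, here is a small conversion sketch. `bytes_to_mib` and `bytes_to_gib` are hypothetical helpers, and the commented query assumes the usual `rtContextGetAttribute(context, attribute, size, pointer)` form with an `RTsize` result.

```c
#include <stdint.h>

/* Whole mebibytes in a byte count (1 MiB = 1024 * 1024 bytes). */
static uint64_t bytes_to_mib(uint64_t bytes)
{
    return bytes / (1024u * 1024u);
}

/* The same quantity in gibibytes, as a floating-point value. */
static double bytes_to_gib(uint64_t bytes)
{
    return (double)bytes / (1024.0 * 1024.0 * 1024.0);
}

/* With OptiX the byte count would come from something like (assumed form):

       RTsize bytes = 0;
       rtContextGetAttribute(context,
                             RT_CONTEXT_ATTRIBUTE_AVAILABLE_DEVICE_MEMORY,
                             sizeof bytes, &bytes);
*/
```

The reported 10,371,465,216 bytes is exactly 9891 MiB, or about 9.66 GiB, so most of the GTX 1080 Ti’s 11 GB is still free and running out of device memory looks unlikely.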

You’re correct that I’m crashing before my first launch.

When I call rtContextValidate before uploading each texture, I get the error message

munmap_chunk(): invalid pointer

on the second call to rtContextValidate

When I set the usage report callback, I get a message “CUDA context memory (CUDA device: 0): 149.0 MBytes” after uploading the first texture. However, subsequent texture uploads don’t cause any output

I am not using Application Verifier. My errors are coming directly from OptiX. Looks like Application Verifier is a Windows tool, but I’m on Linux (I might be able to get a Windows computer to test on, but it’ll likely be a bit because my company’s IT department would have to set it up)

I did a little further testing and found that when I upload my meshes, but don’t create my raytracing input and output buffers, I get the error

malloc(): memory corruption

Hey David.

It’s a half-year-old report, so I don’t exactly remember which bug I was hunting in my own renderer, but you could reproduce it in the optixPathTracer sample. The sample works as expected without Application Verifier attached, but with Application Verifier I get the exception. It may be something innocent or it may be something that’ll blow up the heap once in a million. :)

@David.dubois CUDA 10 isn’t supported by OptiX 5.1.1, perhaps downgrading to 9.0 will solve your issue. Or upgrade to OptiX 6.0.

/Asger

I can run the optixPathTracer sample without issue.

I tried installing CUDA 9, then I had to install clang-3.9, then it wasn’t finding the right standard library so I went back to CUDA 10 and clang 6, except that I had installed cuda-10.0, which apt interpreted as a regex so it installed a few packages… and now my program works as expected

Maybe I just had a bad CUDA install?

Yeah, maybe. Also possible that CUDA 10 is still a problem, but your install order or toolchain has changed enough to let things work for now.

I’m glad that you’re unblocked for the moment, but it’s unnerving to have the problem mysteriously go away without knowing why. I guess if any random crashes come back, getting the older clang & CUDA is the first step. Or, if you can, get the newer OptiX & a newer driver as Asger mentioned. Good luck!


David.

It’s actually not that mysterious - my program has a few build targets for different kinds of tests, and I was unwittingly running a simpler test because CLion didn’t auto-select the build target I thought it did

Back to the drawing board…

Re OptiX 6: I’m in a corporate environment where we have to use driver 410, and OptiX 6 requires driver 418 or better. I may or may not be able to get around that for testing this

I’ve switched to CUDA 9, and downgraded my version of clang to 3.9 so it’ll play nicely with nvcc… but I’m getting the same error about

malloc(): memory corruption

When I don’t upload any meshes I get a crash in my call to rtContextLaunch. When I do upload meshes, I get the malloc error when creating the first texture’s buffer

Meshes are uploaded before any textures

I’m assuming you’ve already triple-checked all your mem copies and types & sizes to make sure you’re not running off the end of a buffer. How small of a code sample does it take to reproduce, and can you send us a copy?
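As an illustration of the kind of check meant here, a copy helper can refuse to write past the end of the destination. This is a generic sketch, not OptiX code; `checked_copy` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies count elements of elem_size bytes from src into dst, where dst has
   room for dst_bytes bytes. Returns 0 on success, or -1 without copying if a
   pointer is NULL, the byte count overflows, or the copy would overrun dst. */
static int checked_copy(void *dst, size_t dst_bytes,
                        const void *src, size_t count, size_t elem_size)
{
    if (dst == NULL || src == NULL)
        return -1;
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return -1;                      /* count * elem_size would overflow */

    size_t n = count * elem_size;
    if (n > dst_bytes)
        return -1;                      /* would run off the end of dst */

    memcpy(dst, src, n);
    return 0;
}
```

With OptiX, dst would typically be the pointer returned by `rtBufferMap`, and dst_bytes could be derived from the buffer’s own size and element size (for example via `rtBufferGetSize1D` and `rtBufferGetElementSize`) rather than from the size the host code believes it set.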


David.

Unfortunately I can’t send a minimal repro case - it seems like this bug only happens for me with my test scene, but I can’t share that for legal reasons. Additionally, priorities have shifted on my end and I’m focusing on other projects for the time being (which seems to be how most of my support tickets end :( )

I’m getting the same since upgrading to OptiX 6, and only on RTX 2060 cards (not on non-RTX cards).

10 or so…

Exception thrown at 0x000007FEFD92BDFD (KernelBase.dll) in wave.exe: 0x000006C6: The array bounds are invalid.

Driver version 430.86