Should OptiX context destruction lead to deallocation of used VRAM?

Hi,

I noticed that the amount of VRAM available before OptiX context creation and after context destruction is not the same. I’m using OptiX 4.1 on Windows 10, compiled with VS 2015, with driver 384.76 and CUDA 8.0.61; my GPU is a GTX 980 Ti.

I tried the optixMeshViewer sample from the OptiX SDK, where I call cudaMemGetInfo before the OptiX context is created, before it is destroyed, and after it is destroyed. With an .obj model of ~60k vertices, normals, and faces I get the following numbers:

before context is created free VRAM: 5324912230 total VRAM: 6442450944
before context is destroyed free VRAM: 5011125862 total VRAM: 6442450944
after context is destroyed free VRAM: 5035636326 total VRAM: 6442450944

275 MB of VRAM got lost in the process…
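
All measurements come from cudaMemGetInfo, wrapped in a small helper. Roughly, the helper looks like this (a sketch, not necessarily the exact code; the delta tracking via a static variable is just one possible approach):

#include <cuda_runtime.h>
#include <iostream>
#include <string>

// Prints the free/total VRAM reported by cudaMemGetInfo and, unless the
// delta is being reset, the change since the previous call.
void printVramInfo( const std::string& label, bool resetDelta = false )
{
    static size_t lastFree = 0;

    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo( &freeBytes, &totalBytes );

    std::cout << label << ": free VRAM: " << freeBytes
              << " total VRAM: " << totalBytes;
    if( !resetDelta )
        std::cout << ", memory delta: "
                  << static_cast<long long>( lastFree ) - static_cast<long long>( freeBytes )
                  << " bytes";
    std::cout << std::endl;

    lastFree = freeBytes;
}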

In my own application, where I allocate many more buffers with OptiX, I see the same behavior, but the “leak” is much larger (which is why I reproduced it with the OptiX SDK sample above in the first place). My app shows these numbers:

Before Optix initialization: free VRAM: 5073585766 total VRAM: 6442450944
Before Optix is deallocated: free VRAM: 3999958630 total VRAM: 6442450944
After Optix is deallocated: free VRAM: 4157507174 total VRAM: 6442450944

(There might still be some memory used by regular OpenGL in my application, but that cannot account for ~870 MB.)

So my question is: is this normal behavior or a known issue of OptiX, or am I doing something wrong?

Yes, OptiX should essentially call cudaFree on every buffer created with the given context. One question here is whether the CUDA driver is being lazy about cleaning up the allocations until it decides to do so. Do you see the memory go down as soon as you exit your application completely?
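
One way to narrow that down on your side: destroy the buffers explicitly before destroying the context and check the free VRAM after each step, something like this (a sketch using the optixpp wrappers; 'buffer' stands for whatever buffer handles your app keeps around):

// Destroy API objects explicitly instead of relying on context teardown,
// checking free VRAM after each step.
size_t freeBytes = 0, totalBytes = 0;

buffer->destroy();                            // e.g. the output buffer
cudaMemGetInfo( &freeBytes, &totalBytes );
std::cout << "after buffer destroy, free: " << freeBytes << std::endl;

context->destroy();
context = 0;
cudaMemGetInfo( &freeBytes, &totalBytes );
std::cout << "after context destroy, free: " << freeBytes << std::endl;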

Hi,
Yes, the test is repeatable: I get the same numbers every time I run the program, so everything must be de-allocated when the program quits, i.e. at least the driver keeps a correct count. A memory leak inside OptiX still wouldn’t show up in this kind of test, though, would it?

What I also tried is to manually allocate some CUDA memory after the OptiX context is created and then de-allocate it before the OptiX context is destroyed; it seems that all memory I allocate manually gets freed correctly.
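
In code, that check is essentially the following, using the printVramInfo sketch from above (the 128 MB size is arbitrary):

// Manual CUDA allocation/deallocation while the OptiX context is alive.
const size_t testSize = 128ull * 1024 * 1024;    // arbitrary test size
void* devPtr = 0;

printVramInfo( "before manual cudaMalloc" );
cudaMalloc( &devPtr, testSize );
printVramInfo( "after  manual cudaMalloc" );     // free VRAM drops by testSize

cudaFree( devPtr );
printVramInfo( "after  manual cudaFree" );       // free VRAM returns to the previous value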

I then went on and placed more allocation/de-allocation checks, and it seems that OptiX allocates quite a bit of memory for the programs which is not de-allocated afterwards (or at least garbage collection in the driver is slow).

I use the following createContext function from the optixMeshViewer example (with the printVramInfo calls added):

void createContext()
{
    printVramInfo("just before OptiX context is created", true);
    context = Context::create();
    printVramInfo("just after Context::create() call");

    context->setRayTypeCount( 2 );
    context->setEntryPointCount( 1 );

    context["radiance_ray_type"]->setUint( 0u );
    context["shadow_ray_type"  ]->setUint( 1u );
    context["scene_epsilon"    ]->setFloat( 1.e-4f );

    std::cout << "width: " << width << " height: " << height << std::endl;
    Buffer buffer = sutil::createOutputBuffer( context, RT_FORMAT_UNSIGNED_BYTE4, width, height, use_pbo );
    context["output_buffer"]->set( buffer );

    // Ray generation program
    std::string ptx_path( ptxPath( "pinhole_camera.cu" ) );
    Program ray_gen_program = context->createProgramFromPTXFile( ptx_path, "pinhole_camera" );
    context->setRayGenerationProgram( 0, ray_gen_program );

    // Exception program
    Program exception_program = context->createProgramFromPTXFile( ptx_path, "exception" );
    context->setExceptionProgram( 0, exception_program );
    context["bad_color"]->setFloat( 1.0f, 0.0f, 1.0f );

    // Miss program
    ptx_path = ptxPath( "constantbg.cu" );
    context->setMissProgram( 0, context->createProgramFromPTXFile( ptx_path, "miss" ) );
    context["bg_color"]->setFloat( 0.34f, 0.55f, 0.85f );

    printVramInfo("just after OptiX context is initialized");
}

All the programs are the standard ones from the SDK. The memory usage is then as follows:

just before OptiX context is created: free VRAM: 5324912230 total VRAM: 6442450944
just after Context::create() call: free VRAM: 5324912230, memory delta: 0 bytes
width: 1024 height: 768
just after OptiX context is initialized: free VRAM: 5046777446, memory delta: 278134784 bytes

So it allocates some 265.25 MB of VRAM already here (this happens regardless of whether I load a mesh afterwards or not).

Then, during mesh loading, the amount of free VRAM doesn’t change, so I guess OptiX pre-allocates and manages some of the VRAM by itself.

before mesh loading: free VRAM: 5046777446, memory delta: 0 bytes
after  mesh loading: free VRAM: 5046777446, memory delta: 0 bytes
before context validation: free VRAM: 5046777446, memory delta: 0 bytes
after  context validation: free VRAM: 5046777446, memory delta: 0 bytes

The first two context launches then consume some more VRAM:

just before first context launch: free VRAM: 5046777446, memory delta: 0 bytes
just after  first context launch: free VRAM: 5018072678, memory delta: 28704768 bytes
just after 2 context launches: free VRAM: 5011125862, memory delta: 6946816 bytes
just after 3 context launches: free VRAM: 5011125862, memory delta: 0 bytes
just after 4 context launches: free VRAM: 5011125862, memory delta: 0 bytes
...

and then it stabilizes.

I then manually allocate some CUDA memory (128 MB after every 10th launch):

after context launch 10:
before 134217728 bytes allocating: free VRAM: 5011125862, memory delta: 0 bytes
after  134217728 bytes allocating: free VRAM: 4876908134, memory delta: 134217728 bytes
after context launch 20:
before 134217728 bytes allocating: free VRAM: 4876908134, memory delta: 0 bytes
after  134217728 bytes allocating: free VRAM: 4742690406, memory delta: 134217728 bytes
after context launch 30:
before 134217728 bytes allocating: free VRAM: 4742690406, memory delta: 0 bytes
after  134217728 bytes allocating: free VRAM: 4608472678, memory delta: 134217728 bytes
after context launch 40:
before 134217728 bytes allocating: free VRAM: 4608472678, memory delta: 0 bytes
after  134217728 bytes allocating: free VRAM: 4474254950, memory delta: 134217728 bytes
after context launch 50:
before 134217728 bytes allocating: free VRAM: 4474254950, memory delta: 0 bytes
after  134217728 bytes allocating: free VRAM: 4340037222, memory delta: 134217728 bytes
after context launch 60:
before 134217728 bytes allocating: free VRAM: 4340037222, memory delta: 0 bytes
after  134217728 bytes allocating: free VRAM: 4205819494, memory delta: 134217728 bytes
after context launch 70:
before 134217728 bytes allocating: free VRAM: 4205819494, memory delta: 0 bytes
after  134217728 bytes allocating: free VRAM: 4071601766, memory delta: 134217728 bytes

and finally I clean up before the OptiX context is destroyed:

before CUDA memory deallocated: free VRAM: 4071601766, memory delta: 0 bytes
Total manually allocated 939524096
freeing all manually allocated VRAM...
before context is destroyed: free VRAM: 5011125862, memory delta: -939524096 bytes
after  context is destroyed: free VRAM: 5035636326, memory delta: -24510464 bytes
Total VRAM balance: 289275904 bytes

This shows that I allocated 939524096 bytes manually, all of which were properly released; OptiX then released 24510464 more bytes while still keeping 289275904 bytes to itself.
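
For completeness, the cleanup behind those numbers is essentially this (a sketch; manualAllocations is a hypothetical std::vector<void*> holding the pointers from the test allocations above):

// Cleanup order used for the numbers above.
printVramInfo( "before CUDA memory deallocated" );

for( size_t i = 0; i < manualAllocations.size(); ++i )   // the 7 x 128 MB test blocks
    cudaFree( manualAllocations[i] );
manualAllocations.clear();

printVramInfo( "before context is destroyed" );          // all manually allocated VRAM is back
context->destroy();
context = 0;
printVramInfo( "after  context is destroyed" );          // ~275 MB still not returned here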

I would be grateful for any explanation of this behavior and for any further tests I could do to show whether OptiX is handling memory correctly, e.g. how can one be sure that the driver’s garbage collection is doing its job and that this is not an OptiX issue?