Hi,
Yes, the test is repeatable: I get the same numbers every time I run the program, so everything must be de-allocated when the program quits, and at least the driver keeps the correct count. There still might be a memory leak in OptiX, but that wouldn’t show up in this kind of test, would it?
What I also tried is to manually allocate some CUDA memory after the OptiX context is created and then de-allocate it before the OptiX context is destroyed, and it seems that all the memory I allocate manually gets freed correctly.
I then went on and placed more allocation/de-allocation checks, and it seems that OptiX allocates quite a lot of memory for the programs that is not de-allocated afterwards (or at least the garbage collection in the driver is slow).
I have the following createContext function from the optixMeshViewer example:
void createContext()
{
    printVramInfo("just before OptiX context is created", true);
    context = Context::create();
    printVramInfo("just after Context::create() call");

    context->setRayTypeCount( 2 );
    context->setEntryPointCount( 1 );
    context["radiance_ray_type"]->setUint( 0u );
    context["shadow_ray_type" ]->setUint( 1u );
    context["scene_epsilon" ]->setFloat( 1.e-4f );

    std::cout << "width: " << width << " height: " << height << std::endl;
    Buffer buffer = sutil::createOutputBuffer( context, RT_FORMAT_UNSIGNED_BYTE4, width, height, use_pbo );
    context["output_buffer"]->set( buffer );

    // Ray generation program
    std::string ptx_path( ptxPath( "pinhole_camera.cu" ) );
    Program ray_gen_program = context->createProgramFromPTXFile( ptx_path, "pinhole_camera" );
    context->setRayGenerationProgram( 0, ray_gen_program );

    // Exception program
    Program exception_program = context->createProgramFromPTXFile( ptx_path, "exception" );
    context->setExceptionProgram( 0, exception_program );
    context["bad_color"]->setFloat( 1.0f, 0.0f, 1.0f );

    // Miss program
    ptx_path = ptxPath( "constantbg.cu" );
    context->setMissProgram( 0, context->createProgramFromPTXFile( ptx_path, "miss" ) );
    context["bg_color"]->setFloat( 0.34f, 0.55f, 0.85f );

    printVramInfo("just after OptiX context is initialized");
}
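For reference, printVramInfo is not part of the SDK. The sketch below is a simplified stand-in for that kind of helper, not my exact code: the names VramInfo and VramTracker are made up for this post, and the memory query is passed in as a callback so the bookkeeping can be checked without a GPU. In the real program the callback would presumably just wrap cudaMemGetInfo from the CUDA runtime.

```cpp
#include <cstdint>
#include <functional>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for a printVramInfo-style helper.
// Free and total device memory in bytes, as a query would report them.
struct VramInfo {
    std::uint64_t free_bytes;
    std::uint64_t total_bytes;
};

class VramTracker {
public:
    // The callback returns the current {free, total}. Assumption: in a real
    // program it wraps cudaMemGetInfo(&free, &total).
    explicit VramTracker(std::function<VramInfo()> query) : query_(std::move(query)) {}

    // Builds one log line. With reset == true (or on the very first call) the
    // total is reported; otherwise the delta against the previous call is
    // reported. A positive delta means memory was consumed since the last call,
    // a negative delta means memory was released.
    std::string report(const std::string& label, bool reset = false) {
        const VramInfo now = query_();
        std::ostringstream line;
        line << label << ": free VRAM: " << now.free_bytes;
        if (reset || !has_previous_) {
            line << " total VRAM: " << now.total_bytes;
        } else {
            const std::int64_t delta = static_cast<std::int64_t>(previous_free_)
                                     - static_cast<std::int64_t>(now.free_bytes);
            line << ", memory delta: " << delta << " bytes";
        }
        previous_free_ = now.free_bytes;
        has_previous_ = true;
        return line.str();
    }

private:
    std::function<VramInfo()> query_;
    std::uint64_t previous_free_ = 0;
    bool has_previous_ = false;
};
```

The delta is always taken against the previous call, which is why the log lines below show 0 bytes whenever nothing changed between two checkpoints.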
All the programs are the standard ones from the SDK. The memory usage is then as follows:
just before OptiX context is created: free VRAM: 5324912230 total VRAM: 6442450944
just after Context::create() call: free VRAM: 5324912230, memory delta: 0 bytes
width: 1024 height: 768
just after OptiX context is initialized: free VRAM: 5046777446, memory delta: 278134784 bytes
So it already allocates some 265.25 MB of VRAM here (this happens regardless of whether I load a mesh afterwards or not).
Then, during mesh loading, the amount of free VRAM doesn’t change, so I guess OptiX preallocates and manages some of the VRAM by itself:
before mesh loading: free VRAM: 5046777446, memory delta: 0 bytes
after mesh loading: free VRAM: 5046777446, memory delta: 0 bytes
before context validation: free VRAM: 5046777446, memory delta: 0 bytes
after context validation: free VRAM: 5046777446, memory delta: 0 bytes
Then the first two context launches consume some more VRAM:
just before first context launch: free VRAM: 5046777446, memory delta: 0 bytes
just after first context launch: free VRAM: 5018072678, memory delta: 28704768 bytes
just after 2 context launches: free VRAM: 5011125862, memory delta: 6946816 bytes
just after 3 context launches: free VRAM: 5011125862, memory delta: 0 bytes
just after 4 context launches: free VRAM: 5011125862, memory delta: 0 bytes
...
and then it stabilizes.
I then allocate some CUDA memory manually:
after contex launch 10:
before 134217728 bytes allocating: free VRAM: 5011125862, memory delta: 0 bytes
after 134217728 bytes allocating: free VRAM: 4876908134, memory delta: 134217728 bytes
after contex launch 20:
before 134217728 bytes allocating: free VRAM: 4876908134, memory delta: 0 bytes
after 134217728 bytes allocating: free VRAM: 4742690406, memory delta: 134217728 bytes
after contex launch 30:
before 134217728 bytes allocating: free VRAM: 4742690406, memory delta: 0 bytes
after 134217728 bytes allocating: free VRAM: 4608472678, memory delta: 134217728 bytes
after contex launch 40:
before 134217728 bytes allocating: free VRAM: 4608472678, memory delta: 0 bytes
after 134217728 bytes allocating: free VRAM: 4474254950, memory delta: 134217728 bytes
after contex launch 50:
before 134217728 bytes allocating: free VRAM: 4474254950, memory delta: 0 bytes
after 134217728 bytes allocating: free VRAM: 4340037222, memory delta: 134217728 bytes
after contex launch 60:
before 134217728 bytes allocating: free VRAM: 4340037222, memory delta: 0 bytes
after 134217728 bytes allocating: free VRAM: 4205819494, memory delta: 134217728 bytes
after contex launch 70:
before 134217728 bytes allocating: free VRAM: 4205819494, memory delta: 0 bytes
after 134217728 bytes allocating: free VRAM: 4071601766, memory delta: 134217728 bytes
and finally I do a cleanup before the OptiX context is destroyed:
before CUDA memory deallocated: free VRAM: 4071601766, memory delta: 0 bytes
Total manually allocated 939524096
freeing all manually allocated VRAM...
before context is destroyed: free VRAM: 5011125862, memory delta: -939524096 bytes
after context is destroyed: free VRAM: 5035636326, memory delta: -24510464 bytes
Total VRAM balance: 289275904 bytes
This shows that I allocated 939524096 bytes manually and that all of it was properly released. OptiX then released 24510464 more bytes when the context was destroyed, while still keeping 289275904 bytes to itself.
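To double-check that the numbers add up, here is the arithmetic as a small compile-time check. All values are copied from the log output above; the constant and function names are made up for this post. It only confirms that the final balance equals what OptiX allocated (context initialization plus the first two launches) minus what it gave back on destruction.

```cpp
#include <cstdint>

// All values in bytes, copied from the log output in this post.
constexpr std::int64_t kFreeAtStart       = 5324912230;  // before context creation
constexpr std::int64_t kFreeAtEnd         = 5035636326;  // after context destruction
constexpr std::int64_t kContextInit       = 278134784;   // delta after context setup
constexpr std::int64_t kFirstLaunch       = 28704768;    // delta after 1st launch
constexpr std::int64_t kSecondLaunch      = 6946816;     // delta after 2nd launch
constexpr std::int64_t kManualChunk       = 134217728;   // one manual allocation
constexpr std::int64_t kManualChunkCount  = 7;           // manual allocations made
constexpr std::int64_t kReleasedOnDestroy = 24510464;    // freed by context destruction

// Memory that disappeared while only OptiX was allocating.
constexpr std::int64_t optixAllocated() {
    return kContextInit + kFirstLaunch + kSecondLaunch;
}

// Memory I allocated (and later freed) myself.
constexpr std::int64_t manualAllocated() {
    return kManualChunk * kManualChunkCount;
}

// Memory that was not returned by the time the context was destroyed.
constexpr std::int64_t stillHeld() {
    return optixAllocated() - kReleasedOnDestroy;
}

static_assert(manualAllocated() == 939524096, "manual total matches the log");
static_assert(stillHeld() == kFreeAtStart - kFreeAtEnd, "balance matches free VRAM");
static_assert(stillHeld() == 289275904, "balance matches the reported total");
```

So nothing is unaccounted for in my own allocations; the whole remaining balance comes from the checkpoints where only OptiX was active.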
I would be grateful for any explanation of this behavior and for any further tests that I could run to prove whether OptiX is handling memory correctly or not. For example, how can one be sure that the garbage collector is doing its job and that this is not an OptiX issue?
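One further test I could imagine, in case it is useful: create and destroy the context several times within the same process, record the free VRAM after each cycle, and check whether it keeps shrinking. The sketch below is only the evaluation part, and the function name looksLikeLeak and the warm-up parameter are my own invention. The assumption is that a one-time cache or pool would show up as a drop followed by a plateau, whereas a real leak would lose memory on every cycle.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: free_after_cycle[i] is the free VRAM in bytes measured
// after the i-th create/destroy cycle of the context. The first warmup_cycles
// samples are ignored, since one-time allocations may still be settling.
// Returns true only if free memory strictly decreases on every later cycle.
bool looksLikeLeak(const std::vector<std::uint64_t>& free_after_cycle,
                   std::size_t warmup_cycles = 2)
{
    // Need at least two samples after the warm-up to see any trend.
    if (free_after_cycle.size() < warmup_cycles + 2) {
        return false;
    }
    for (std::size_t i = warmup_cycles + 1; i < free_after_cycle.size(); ++i) {
        // A plateau or a recovery means memory is being reused, not leaked.
        if (free_after_cycle[i] >= free_after_cycle[i - 1]) {
            return false;
        }
    }
    return true;
}
```

If the free VRAM levels off after the first cycle or two, I would read the retained 289275904 bytes as caching rather than a leak, but I would be happy to be corrected on whether this is a valid test.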