The overall GPU load in the TaskManager might not show the compute workload by default.
Please select the TaskManager’s Performance tab and click on the GPU icon to show the individual engines and change one of the graphs to CUDA.
That should run at over 90% load when running the optixPathTracer. The 3D load inside the other graph there is the OpenGL texture blit of the rendered image.
That the OptiX 6.5.0 optixPathTracer example is only running around 60 fps in its default windows size and much slower in full screen is normal. That is implementing an iterative global illumination path tracer which shoots multiple paths per pixel at once. It’s expensive.
The final display in the OptiX SDK examples is normally done with an OpenGL texture blit to the back buffer and a swap buffers. By default that swap is synchronized to the monitor refresh rate, so giving a maximum of 60 fps if your monitor is running with 60 Hz.
For benchmarking it’s recommended to disable the vertical sync inside the NVIDIA Control Panel.
Right-click on the desktop, select NVIDIA Control Panel, go to 3D Settings → Manage 3D Settings → Settings → Vertical Sync and change the value to Off.
Then run the simple OptiX examples like the optixMeshViewer again. In a small window that should run well in the hundreds of frames per second. (On my Quadro RTX 6000 that runs at >650 fps at default window size.)
The memory usage on the device is dependent on what CUDA allocates alone for the CUDA context and then all OptiX buffers, textures, acceleration structures, shader programs, etc.
Performance analyses with Nsight need to happen with the standalone programs Nsight Systems for the overall application behaviour and Nsight Compute for the individual CUDA device kernels you programmed in OptiX. (Nsight Graphics won’t help with CUDA compute workloads.)
Use the latest Nsight versions and display drivers and make sure your PTX code was compiled with --generate-lineinfo (-lineinfo) to be able to match the CUDA source code to the PTX and SASS assembly.
When starting new projects with OptiX, I would recommend to use OptiX 7.0.0 which has a completely different host API which is a lot more modern and generally faster. Everything around the actual OptiX calls is handled in native CUDA Runtime or Driver API calls which gives you much better control and flexibility.
OptiX 7 based examples implementing rather fast and flexible unidirectional path tracers can be found here.
https://forums.developer.nvidia.com/t/optix-advanced-samples-on-github/48410
They should generally be more interactive than the OptiX 6.5.0 optixPathTracer example.