The issue with your problem descriptions is that only you know what you programmed. It’s simply not possible to analyze why the debug sutil DLL is not working in your new project without a reproducer.
If CUDAOutputBuffer.h is all you’re using sutil for, then there shouldn’t be a need to link against the sutil DLL at all.
There isn’t necessarily a need to rewrite that CUDAOutputBuffer class either. It’s a header-only implementation, which means you could simply copy that header (and the Exception.h it includes) into your project and use it directly, which removes the sutil library dependency.
The most important thing is to learn which CUDA runtime or driver API calls you need to make to set up the CUDA resources for OptiX inside your application. That means you should be able to implement this yourself in the end.
Whether you do that inside a class or directly inside your application doesn’t really matter for understanding the underlying concepts of the CUDA resource management required inside an OptiX application.
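To make that concrete, here is a minimal sketch of the CUDA and OptiX initialization an application needs before any launches. This is an illustrative outline with reduced error handling, not code taken from the examples; in a real application you would wrap every call in proper error-check macros.

```cuda
#include <cuda_runtime.h>
#include <optix.h>
#include <optix_function_table_definition.h> // Defines the OptiX function table; include in exactly one translation unit.
#include <optix_stubs.h>
#include <iostream>

int main()
{
    // Force CUDA runtime initialization and primary context creation on the current device.
    cudaFree(0);

    // Load the OptiX entry points from the display driver.
    if (optixInit() != OPTIX_SUCCESS)
    {
        std::cerr << "optixInit() failed\n";
        return 1;
    }

    // Create the OptiX device context from the current CUDA context (0 means "use current").
    OptixDeviceContext context = nullptr;
    OptixDeviceContextOptions options = {};
    if (optixDeviceContextCreate(0, &options, &context) != OPTIX_SUCCESS)
    {
        std::cerr << "optixDeviceContextCreate() failed\n";
        return 1;
    }

    // A CUDA stream for asynchronous launches and copies.
    cudaStream_t stream = nullptr;
    cudaStreamCreate(&stream);

    // ... build modules, program groups, pipeline, SBT; allocate buffers; optixLaunch ...

    cudaStreamDestroy(stream);
    optixDeviceContextDestroy(context);
    return 0;
}
```

Once you understand these few calls, there is nothing inside the sutil helper library you actually depend on for the resource setup.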
That output buffer is special in that it can be device-only memory, a pointer to a mapped OpenGL pixel buffer object on the device, pinned host memory, or even memory accessed via CUDA peer-to-peer on another GPU connected with NVLink.
It’s used in different OpenGL examples showing these things. For a first “Hello OptiX” program most of that flexibility isn’t needed.
It uses GLAD as the OpenGL function loader for the CUDA-OpenGL interoperability, which is one of the main points of that output buffer abstraction. If you use Vulkan for the display part in the future, that wouldn’t be required either.
For the other buffers you use inside OptiX programs, that abstraction is usually not required; you can use plain CUDA device buffers via their CUdeviceptr directly.
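A hedged sketch of what that looks like, assuming an RGBA float image as an example: allocate the buffer with cudaMalloc, pass its address as a CUdeviceptr (e.g. inside your launch parameter block), and copy the result back with cudaMemcpy after the launch.

```cuda
#include <cuda.h>          // CUdeviceptr
#include <cuda_runtime.h>  // cudaMalloc, cudaMemcpy, float4
#include <vector>

int main()
{
    const size_t count       = 1920 * 1080;          // Example image size (assumption).
    const size_t sizeInBytes = count * sizeof(float4);

    // Plain device allocation, no output-buffer abstraction needed.
    float4* d_buffer = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&d_buffer), sizeInBytes);

    // OptiX API functions and launch parameters take device addresses as CUdeviceptr.
    CUdeviceptr outputBuffer = reinterpret_cast<CUdeviceptr>(d_buffer);
    (void) outputBuffer; // e.g. store this inside your launch parameters before optixLaunch.

    // After optixLaunch and stream synchronization, read the result back to the host.
    std::vector<float4> h_buffer(count);
    cudaMemcpy(h_buffer.data(), d_buffer, sizeInBytes, cudaMemcpyDeviceToHost);

    cudaFree(d_buffer);
    return 0;
}
```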
As an alternative example implementation, my OptiX 7 examples handle the device-only and CUDA-OpenGL interop cases for the displayed output buffer directly. It’s the only buffer needing that special handling.
If you search for m_systemParameter.outputBuffer inside the Application.cpp file of the first introductory example (which uses the CUDA runtime API), you can see how that buffer is either allocated as a device buffer directly (cudaMalloc/cudaFree) and then copied to host memory (m_outputBuffer), or set from an OpenGL pixel buffer object pointer when using CUDA-OpenGL interop (all code inside the m_interop cases).
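The interop case boils down to three CUDA graphics calls. The following sketch uses illustrative names, not the ones from the example code: the PBO is registered with CUDA once, mapped each frame to get a device pointer usable as the OptiX output buffer, and unmapped after the launch so OpenGL can read it again.

```cuda
#include <cuda_gl_interop.h> // cudaGraphicsGLRegisterBuffer and friends
#include <cuda_runtime.h>

cudaGraphicsResource* g_pboResource = nullptr; // Illustrative global for brevity.

void registerPbo(unsigned int pbo) // pbo: OpenGL buffer id created with glGenBuffers.
{
    // Write-discard because OptiX overwrites the full image every launch.
    cudaGraphicsGLRegisterBuffer(&g_pboResource, pbo, cudaGraphicsRegisterFlagsWriteDiscard);
}

CUdeviceptr mapPbo(cudaStream_t stream)
{
    void*  ptr  = nullptr;
    size_t size = 0;
    cudaGraphicsMapResources(1, &g_pboResource, stream);
    cudaGraphicsResourceGetMappedPointer(&ptr, &size, g_pboResource);
    return reinterpret_cast<CUdeviceptr>(ptr); // Use as the output buffer in the launch parameters.
}

void unmapPbo(cudaStream_t stream)
{
    // Unmap after optixLaunch; the PBO contents are then visible to OpenGL.
    cudaGraphicsUnmapResources(1, &g_pboResource, stream);
}
```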
The image is then finally uploaded to an OpenGL texture from either the host memory or the PBO (the faster path).
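That last step can be sketched like this, assuming an RGBA float image; the texture id, dimensions, and formats are illustrative. With a PBO bound to GL_PIXEL_UNPACK_BUFFER, the pixel pointer argument of glTexImage2D is interpreted as an offset into the buffer, so the copy stays on the device.

```cuda
// Requires an OpenGL function loader (e.g. GLAD) providing GL 3.0+ entry points and enums.
#include <glad/glad.h>

// Faster path: source the texture upload from the PBO (device-to-device copy).
void updateTextureFromPbo(GLuint texture, GLuint pbo, int width, int height)
{
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glBindTexture(GL_TEXTURE_2D, texture);
    // Last argument is a byte offset into the bound PBO, not a host pointer.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, width, height, 0, GL_RGBA, GL_FLOAT, (const void*) 0);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
}

// Fallback path: upload from host memory (after the cudaMemcpy device-to-host).
void updateTextureFromHost(GLuint texture, int width, int height, const void* pixels)
{
    glBindTexture(GL_TEXTURE_2D, texture);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, width, height, 0, GL_RGBA, GL_FLOAT, pixels);
}
```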