Correct placement of cudaDeviceReset() for large C++ application

Is atexit() the preferred method as mentioned in this thread?

The application is large, with many classes that hold both host and device pointers.

The atexit() approach works correctly in the large application. Anywhere else I put the cudaDeviceReset() call, it seems to reset the device before the class destructors free the device memory.

Originally the application had no call to cudaDeviceReset(). I added the atexit() call for profiling, and since I have to justify that addition, I need to make sure this is the correct approach.

Another alternative is to call cudaDeviceSynchronize() followed by cudaProfilerStop() to halt profiling.

I usually use cudaDeviceReset(), but it sounds like you have a tricky shutdown situation.
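
For what it's worth, here is a rough sketch of why the placement matters. It is not the real application and it does not call CUDA at all: DeviceBuffer, reset_device(), g_events and the two test functions are hypothetical stand-ins I made up so the ordering can be seen in plain C++. In real code the destructor would call cudaFree() and reset_device() would be cudaDeviceReset().

```cpp
#include <string>
#include <utility>
#include <vector>

// Records shutdown events in the order they happen.
static std::vector<std::string> g_events;

// Hypothetical stand-in for a class that owns device memory.
// A real destructor would call cudaFree() on its device pointer.
class DeviceBuffer {
public:
    explicit DeviceBuffer(std::string name) : name_(std::move(name)) {}
    ~DeviceBuffer() { g_events.push_back("free " + name_); }
    DeviceBuffer(const DeviceBuffer&) = delete;
    DeviceBuffer& operator=(const DeviceBuffer&) = delete;
private:
    std::string name_;
};

// Hypothetical stand-in for cudaDeviceReset().
void reset_device() { g_events.push_back("reset"); }

// Problem case: the reset runs while the buffers are still alive,
// so their destructors run against an already-reset device.
void reset_inside_scope() {
    DeviceBuffer a("a");
    DeviceBuffer b("b");
    reset_device();
}   // b, then a, are destroyed here, after the reset

// Safe case: the buffers live in an inner scope that ends before the reset.
void reset_after_scope() {
    {
        DeviceBuffer a("a");
        DeviceBuffer b("b");
    }   // b, then a, are destroyed here
    reset_device();
}
```

Registering the reset with atexit() from main() behaves like the second function for objects local to main(), because the handler runs only after main() has returned and its locals are gone. Objects with static storage duration that were constructed before the atexit() call are still destroyed after the handler runs, so any of those that own device memory would need separate care.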