OptiX 5 setStackSize appears to multiply by 5

I’m using OptiX 5 with CUDA 9.1 in Visual Studio 2015, running Windows 10 with a Quadro K1100M and driver 391.03.
If I call setStackSize, then getStackSize, the returned stack size is 5 times the argument to setStackSize (or sometimes a bit more, presumably due to alignment rules). This did not occur with my previous configuration (OptiX 3.9.1, CUDA 7.5, VS2013). Is this a bug, or am I misunderstanding the new usage somehow?
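For reference, a minimal repro of what I'm describing (assuming the OptiX 5 C++ wrapper header `optixu/optixpp_namespace.h`; requires the OptiX SDK and a supported GPU to actually run):

```cpp
#include <optixu/optixpp_namespace.h>
#include <iostream>

int main()
{
    optix::Context context = optix::Context::create();

    context->setStackSize(1024); // request 1024 bytes per thread

    // On OptiX 5 this reports roughly 5x the requested value
    // (e.g. ~5120, possibly rounded up for alignment).
    std::cout << "stack size: " << context->getStackSize() << std::endl;

    context->destroy();
    return 0;
}
```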
Thanks

Yes, that’s an unfortunate change from OptiX 3 to 4.
See explanations here: https://devtalk.nvidia.com/default/topic/1010533

Thanks for the information. Is the 5x multiplier consistent for OptiX 5.0 executables across supported GPU architectures?
While troubleshooting intermittent out-of-memory issues, I’ve been logging getAvailableDeviceMemory() results at various points in my code, and I’m seeing some odd results. If nvidia-smi reports ~1GB of free device memory, getAvailableDeviceMemory after context creation typically reports ~1.7GB free device memory. A particular test case sometimes runs with ~900MB free, and sometimes fails with an out-of-memory error. Sometimes it reports ~100MB free after the first OptiX launch, then ~900MB after subsequent launches. I’ve even seen some runs where OptiX reports 0 bytes free device memory, then my application allocates a few more buffers and runs without issues. I know that other processes impact the device memory available, but during these tests I’m not doing anything that should generate 900MB of variability.
How is getAvailableDeviceMemory different from nvidia-smi’s device memory usage? Is it trying to account for device memory which could become available by swapping device-resident buffers to system memory? Any ideas why I’m seeing such large variation in memory availability, and/or how I could better manage it (besides using less memory in general or dropping support for smaller GPUs)?
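For what it’s worth, the logging I described is essentially this (a sketch assuming the OptiX 5 C++ wrapper; `logFreeMemory` is just a hypothetical helper name, and device ordinal 0 is assumed):

```cpp
#include <optixu/optixpp_namespace.h>
#include <iostream>

// Hypothetical helper: log the free device memory on device 0
// at a labeled point in the program.
static void logFreeMemory(optix::Context context, const char* label)
{
    const RTsize freeBytes = context->getAvailableDeviceMemory(0);
    std::cout << label << ": "
              << freeBytes / (1024ull * 1024ull) << " MB free" << std::endl;
}

// Usage, e.g.:
//   logFreeMemory(context, "after context creation");
//   context->launch(0, width, height);
//   logFreeMemory(context, "after first launch");
```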
Thanks

“Is the 5x multiplier consistent for OptiX 5.0 executables across supported GPU architectures?”

Yes, which makes it all the more important to determine the minimum required stack size up front, because GPUs with more cores will need more memory.

I cannot tell what would cause the differences in available memory in your case. That is a state which is constantly in flux, partly due to the underlying OS. For OptiX in particular, the acceleration structure build itself temporarily consumes quite a lot of memory, so depending on when you read the free memory amount, it can vary drastically.

I would expect that nvidia-smi and the rtDeviceGetAttribute() function in OptiX use the same underlying CUDA interface to read the device information.

I sometimes use a small nvidia-smi command to dump device information in a command prompt while running applications. It looks like this and prints the information for each installed device every 500 ms.

"C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe" --format=csv,noheader --query-gpu=timestamp,name,pstate,memory.total,memory.used,utilization.memory,utilization.gpu --loop-ms=500

You could also use the rtContextSetUsageReportCallback() function introduced in OptiX 5.0.0 to dump human-readable information about what OptiX did internally.
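A sketch of hooking that up (assuming the OptiX 5 C++ wrapper; the callback parameters follow the RTusagereportcallback typedef of verbosity level, tag, message, and user data):

```cpp
#include <optixu/optixpp_namespace.h>
#include <cstdio>

// Callback invoked by OptiX with internal usage information.
static void usageReportCallback(int lvl, const char* tag, const char* msg, void* /*cbdata*/)
{
    std::printf("[%d][%s] %s", lvl, tag, msg);
}

int main()
{
    optix::Context context = optix::Context::create();

    // Verbosity levels 1-3 produce increasingly detailed reports; 0 disables them.
    context->setUsageReportCallback(usageReportCallback, 3, nullptr);

    // ... set up the scene and launch; OptiX emits reports during these calls.

    context->destroy();
    return 0;
}
```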