I have a large model whose weights are stored in a separate .onnx.data external data file. The ONNX file itself loads properly, but when I try to run profiling, it reports that it cannot find the external data file, even though that file is in the same folder as the ONNX file. It is looking in a temporary folder that gets deleted almost immediately, so I cannot just copy the file there to make it work. Is there some additional configuration I need for this kind of model?
Hi,
Thank you for reporting this issue. From the screenshot, ONNX Runtime appears to be looking in the correct location for the external data, “C:\Users\joseperezcano\AppData\Local\Temp\NVIDIA Nsight Deep Learning Designer-hWlmIH\waft_dav2s4_i2_dyn512_1024_2048.onnx.data”, which is the temporary folder DL Designer creates to perform profiling. However, either the “waft_dav2s4_i2_dyn512_1024_2048.onnx.data” file was not created successfully there, or it is corrupted.
To investigate further, please put the attached “nvlog.config.txt” file next to your locally installed nsight-dl.exe and rename it to “nvlog.config” (that is, remove the .txt extension). On Windows, that location would be something like “C:\Program Files\NVIDIA Corporation\Nsight DL Designer XXX\host\windows-desktop-dl-x64”. This enables DL Designer to write logs to a file called “nvlog_output.txt”, created in the same location. After reproducing the error, please share the “nvlog_output.txt” file here so we can investigate its contents.
It is also worth noting that DL Designer expects ONNX external data to be referenced using a path relative to the ONNX model's location, and that the referenced external data files must be readable by the user account running DL Designer.
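To check both conditions before profiling, here is a minimal sketch in Python. It assumes the `onnx` package is installed, and the helper names `external_data_locations` and `find_external_data_problems` are hypothetical, not part of DL Designer or ONNX. It reads the external data "location" entries recorded in the model without loading the weights, then verifies that each one is a relative path and that the file it resolves to next to the model exists and is readable:

```python
import os
from typing import Iterable, List


def external_data_locations(model_path: str) -> List[str]:
    """Collect the 'location' entries of initializers stored as external data.

    Assumes the onnx package is installed; the weights themselves are not loaded.
    """
    import onnx
    from onnx.external_data_helper import uses_external_data

    model = onnx.load(model_path, load_external_data=False)
    locations = []
    for tensor in model.graph.initializer:
        if uses_external_data(tensor):
            for entry in tensor.external_data:
                if entry.key == "location":
                    locations.append(entry.value)
    return locations


def find_external_data_problems(model_path: str, locations: Iterable[str]) -> List[str]:
    """Return one human-readable problem string per offending location.

    A location is flagged if it is an absolute path, if no file exists at the
    path relative to the model's folder, or if that file is not readable.
    """
    model_dir = os.path.dirname(os.path.abspath(model_path))
    problems = []
    for loc in sorted(set(locations)):  # many tensors usually share one file
        if os.path.isabs(loc):
            problems.append(f"{loc}: absolute path, expected a path relative to the model")
            continue
        resolved = os.path.join(model_dir, loc)
        if not os.path.isfile(resolved):
            problems.append(f"{loc}: not found at {resolved}")
        elif not os.access(resolved, os.R_OK):
            problems.append(f"{loc}: not readable by the current user")
    return problems
```

For example, an empty list from `find_external_data_problems("model.onnx", external_data_locations("model.onnx"))` (with a hypothetical model path) means both conditions hold for your local copy. It says nothing about whether the file was copied correctly into DL Designer's temporary folder, which is what the logs are meant to show.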
nvlog.config.txt (197 Bytes)
So, I tried what you suggested, but this time it worked perfectly fine, so I guess there is no issue now. Nothing changed between yesterday and today; this may just be Windows doing Windows things. The other features, such as model sanitization and conversion to FP16, also work properly. If I manage to reproduce the error again, I will try to capture the logs.
