The profiler encountered an out-of-memory error

I’m using the latest PyTorch on Windows 10 with the CUDA 9.1 runtime and tools installed. In the Visual Profiler session settings, I pointed File to python.exe, set the arguments to the name of my Python file, and set the working directory to the directory containing that file.

I hit Run -> Generate Timeline. The Python code (which uses a machine learning model to do something) runs, and I see its output in the console. I let it run for a few seconds and then hit Stop. I immediately get:

“Unable to profile application”: the profiler has encountered an out-of-memory error. The model I’m using is trained on MNIST, so it is not particularly big…

Any ideas?

Some more info and questions:

I was able to use the command-line tool nvprof to run the Python script and save the output to a .nvvp file, which I was then able to load in the Visual Profiler. However, the profiler is not very intuitive. I just want to know overall stats, for instance how much GPU memory I am using, which would tell me whether I can increase the batch size or am already near the capacity of the GPU. The profiler shows a timeline view, but no overall stats… I’m not a CUDA expert and not interested in very low-level details. Since I’m using deep learning libraries, I can’t really optimize the low-level CUDA implementation; I’m only interested in higher-level stats. What’s the right way to approach this?

There also seems to be yet another profiler called Nsight, which appears to be integrated with Visual Studio… Is it possible to use it to profile Python code? What are the trade-offs between Nsight and the other profiling tools?

Hi, ankur6ue

Try http://docs.nvidia.com/cuda/profiler-users-guide/index.html#large-data to see if you can solve the out-of-memory error.
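
For the higher-level question about overall GPU memory usage, one option that avoids the profiler entirely is to poll nvidia-smi while the script is running. The following is only a rough sketch: it assumes nvidia-smi is on your PATH (on Windows it may live in the driver install directory instead), and the helper names `parse_memory_csv`, `query_gpu_memory` and `headroom_fraction` are made up for illustration. The query flags themselves are standard nvidia-smi options.

```python
import subprocess

# Standard nvidia-smi query flags; values are reported in MiB when
# "nounits" is requested. Assumes nvidia-smi is reachable on PATH.
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Parse lines like '1024, 8192' into a list of (used, total) ints.

    Hypothetical helper: it does not handle '[N/A]' fields, which some
    GPUs or driver modes may report.
    """
    rows = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Run nvidia-smi once and return one (used, total) tuple per GPU."""
    result = subprocess.run(
        QUERY_CMD, capture_output=True, text=True, check=True
    )
    return parse_memory_csv(result.stdout)


def headroom_fraction(used, total):
    """Fraction of device memory still free, e.g. 0.25 means 25% free."""
    return (total - used) / total
```

Calling `query_gpu_memory()` from a second console while training is running gives a used/total figure per GPU. Keep in mind that this is whole-device usage, and that PyTorch caches GPU memory it has freed, so the number reflects what the process is holding rather than what the model strictly needs. It should still give a rough idea of whether there is room for a larger batch size.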