Hi,
We are using the Video Codec SDK to play and loop video, and we have noticed that memory usage escalates while looping and appears to be uncapped. Additionally, when closing the CUDA context associated with the decode, there is a very minor memory leak.
Files
The files mentioned below (AppDec.cpp, watchmem and windows.csv) are found in the attached zip file:
appdec.zip (7.3 KB)
AppDec
To isolate the issues, I have made the following amendments to the AppDec.cpp sample:
- introduced a means to loop the input by way of -loop n switch (default 1)
- added a cuCtxDestroy call to avoid leaks
- introduced a means to sleep after the loops are complete with -delay ms (default 0)
- introduced a means to register multiple inputs using the existing -input switch
- introduced a means to iterate over the inputs, loops and sleeps using -iterations n (default 1)
By default, its behaviour is unchanged, but the additional switches make it easier to create long running tests which can be tracked with external profiling and other tools (such as valgrind’s memcheck and massif tools).
To build, follow the instructions in the Video_Codec_SDK directory (this is based on 12.2.72), overwriting Samples/AppDecode/AppDec/AppDec.cpp before building the samples.
watchmem
I have also attached a Python script called watchmem. It takes as arguments the name of the process you want to track (in this case, AppDec on Linux, AppDec.exe on Windows), followed by a polling interval and a snapshot interval in seconds, and writes a csv file to stdout. An example of its use in Cygwin with the Windows version of Python is:
$ python watchmem AppDec.exe 0.1 5.0 | tee windows.csv
Note that this will simply block until AppDec is started (it will poll for a new process every 0.1s and when found, will create a snapshot every 5 seconds).
To start AppDec, run this in another terminal (replacing video.mp4 with a local file of course):
$ Release/AppDec -i video.mp4 -gpu 0 -o nul -loop 250 -delay 10000 -iterations 10 > /dev/null
and watchmem will produce a windows.csv file with the following type of content:
snapshot time memory average min max cpu threads
#header: Release\AppDec.exe -i video.mp4 -gpu 0 -o nul -loop 250 -delay 10000 -iterations 10
#footer: started at Wed Oct 9 19:31:53 2024
1 0.30111 0.30842 0.30842 0.30842 0.30842 0.00000 6
2 5.30339 0.31139 0.30990 0.30842 0.31139 89.70000 6
3 10.30786 0.31311 0.31097 0.30842 0.31311 92.20000 6
4 15.31287 0.31911 0.31301 0.30842 0.31911 90.60000 6
5 20.31754 0.32123 0.31465 0.30842 0.32123 93.10000 6
6 25.32295 0.32334 0.31610 0.30842 0.32334 92.20000 6
7 30.32749 0.33063 0.31818 0.30842 0.33063 92.80000 3
etc
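If you would rather post-process the output in Python than in gnuplot, a minimal sketch of a parser follows. It assumes the layout is exactly as in the excerpt above (whitespace-separated columns, a first row of column names, and comment lines starting with #); parse_watchmem is a name I made up for illustration and is not part of the attached script.

```python
def parse_watchmem(text):
    """Parse watchmem-style output into a list of dicts keyed by column name."""
    columns = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#header:' / '#footer:' comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if columns is None:
            columns = fields  # the first non-comment line names the columns
            continue
        if len(fields) != len(columns):
            continue  # ignore malformed or truncated lines
        rows.append({name: float(value) for name, value in zip(columns, fields)})
    return rows


# The first three snapshots from the excerpt above.
SAMPLE = """\
snapshot time memory average min max cpu threads
#header: Release\\AppDec.exe -i video.mp4 -gpu 0 -o nul -loop 250 -delay 10000 -iterations 10
#footer: started at Wed Oct 9 19:31:53 2024
1 0.30111 0.30842 0.30842 0.30842 0.30842 0.00000 6
2 5.30339 0.31139 0.30990 0.30842 0.31139 89.70000 6
3 10.30786 0.31311 0.31097 0.30842 0.31311 92.20000 6
"""

rows = parse_watchmem(SAMPLE)
running_mean = sum(r["memory"] for r in rows) / len(rows)
print(len(rows), rows[-1]["max"], round(running_mean, 5))
```

Judging by these rows, the average column looks like the running mean of the memory column and min/max are the running extremes, but that is inferred from the numbers rather than from the script itself.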
Using the Output
After or during the run, you can render graphs from the generated windows.csv file using gnuplot (or similar tools such as numpy) to get various views of the memory consumption while the process is running - for example:
gnuplot -p -e 'set key autotitle columnhead; plot "windows.csv" using 2:6 with lines, "" using 2:3 with lines'
In this graph, we can see 10 periods of memory escalation (matching the 10 iterations), each followed by a sharp drop where most of the memory is freed up (matching the 10,000 ms delay). If you look closely at the max line, however, you can see that it is slowly increasing, which indicates a very small leak. (I checked our app under Linux with valgrind’s memcheck, which indicated that there are leaks in libcuda. There is certainly a fixed leak too, but I’m fine with that.) My main concern is the unchecked memory escalation during the repeated playout.
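To put rough numbers on the two effects described here (the climb within each iteration and the slow creep across iterations), something like the following sketch could be applied to the memory column. The sample values are made up for illustration, and the drop_ratio threshold and both function names are my own assumptions; they would need tuning against a real run.

```python
def split_on_drops(samples, drop_ratio=0.5):
    """Split memory samples into runs, starting a new run whenever a sample
    falls below drop_ratio times the previous sample (a sharp drop)."""
    runs = [[samples[0]]]
    for prev, cur in zip(samples, samples[1:]):
        if cur < prev * drop_ratio:
            runs.append([])
        runs[-1].append(cur)
    return runs


def slope(values):
    """Least-squares slope of values against their index (needs 2+ values)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Made-up data: three iterations that climb, then drop back to a floor
# which itself creeps upwards slightly each time.
memory = [0.30, 0.50, 0.70, 0.31, 0.51, 0.72, 0.32, 0.52, 0.74]

runs = split_on_drops(memory)
floors = [run[0] for run in runs]    # first sample after each drop
peaks = [max(run) for run in runs]   # highest sample within each iteration

print("floor growth per iteration:", slope(floors))
print("peak growth per iteration:", slope(peaks))
```

A floor slope that stays positive over many iterations would be consistent with a small leak, while the gap between each floor and peak measures the escalation during looping.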
For this sample, the max memory use is more easily detectable in isolation:
Summary
I thought the issue worth reporting and others may find the modified AppDec sample useful for stress testing the decode functionality.
Note that the behaviour is more or less the same on linux (and I for one find it easier to test things there).
Hope that is of use to somebody - feel free to reply if more information is required.
Cheers,
Charlie