Memory Leak When Sleeping on TK1

Our product puts the system to sleep periodically to save energy.
We noticed that after a while the system becomes unresponsive.
Investigation showed that memory leaks with every sleep cycle.
After about 70 sleep/wakeup cycles, the system exhausts its roughly 1400MB of free memory (so it leaks about 20MB per sleep cycle).

I tried it on a “virgin” TK1 system with none of our software installed and I can reproduce it.
The missing memory is labeled as “used” by memory reporting utilities such as “free” and “top”, but nowhere else (not reported in cached, buffers, shared).

You can see the memory consumption pattern here: https://docs.google.com/spreadsheets/d/1i_freQ2b1dtkpmAsew_1itfNT7E8FZXmSxd5ZMglDus/edit#gid=1712061298

To put the system to sleep for 10 seconds I’m using this script:

echo +10 > /sys/class/rtc/rtc0/wakealarm; echo lp0 > /sys/power/suspend/mode; echo mem > /sys/power/state

Note that when summing up all the resident memory for all processes for all users, the missing memory is not accounted for. I used slabtop to check kernel buffers, and it is not there either.
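For anyone who wants to repeat that comparison, here is a minimal sketch of the check described above. The function names (sum_rss_kb, used_kb, report_gap) are made up for illustration and are not from the original test. Note that summing VmRSS counts shared pages once per process, so if anything this overstates process usage and understates the gap.

```shell
#!/bin/bash
# Sketch: compare summed per-process resident memory with kernel-reported used memory.
# Function names are illustrative, not part of the original test setup.

# Sum the VmRSS fields (in kB) of every process under a proc root (default /proc).
sum_rss_kb() {
    local proc_root="${1:-/proc}"
    # Kernel threads have no VmRSS line; processes that exit mid-scan are skipped.
    cat "$proc_root"/[0-9]*/status 2>/dev/null |
        awk '/^VmRSS:/ { total += $2 } END { print total + 0 }'
}

# Compute MemTotal - MemFree - Buffers - Cached (in kB) from a meminfo file.
used_kb() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^MemTotal:/ { t = $2 }
         /^MemFree:/  { f = $2 }
         /^Buffers:/  { b = $2 }
         /^Cached:/   { c = $2 }
         END { print t - f - b - c }' "$meminfo"
}

# Print both figures and the unaccounted difference.
report_gap() {
    local root="${1:-/proc}"
    local rss used
    rss=$(sum_rss_kb "$root")
    used=$(used_kb "$root/meminfo")
    echo "used=${used}kB rss_sum=${rss}kB gap=$((used - rss))kB"
}
```

Running report_gap (as root, so every process is readable) before and after a few sleep cycles should show whether the gap grows with each cycle.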

Questions:

  1. Any idea how to figure out where that memory is going? In general, summing up resident memory for all processes does not add up to total used memory (minus cached). There is a gap that widens as memory consumption gets higher, regardless of sleep.
  2. Any idea how to reclaim that leaked memory or how to prevent the leak?

I haven’t used this, but it looks like the default kernel needs CONFIG_DEBUG_KMEMLEAK enabled to use the kernel memory leak detector (after which “/sys/kernel/debug/kmemleak” should show up). See:
https://www.kernel.org/doc/html/v4.10/dev-tools/kmemleak.html
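Going by that documentation, a scan could be triggered roughly like this. This is an untested sketch: it assumes a kernel built with CONFIG_DEBUG_KMEMLEAK=y and debugfs mounted at /sys/kernel/debug, and the kmemleak_scan wrapper is just an illustrative name.

```shell
#!/bin/bash
# Sketch of a kmemleak scan. Requires root and CONFIG_DEBUG_KMEMLEAK=y.
# If debugfs is not mounted yet: mount -t debugfs nodev /sys/kernel/debug/
KMEMLEAK="${KMEMLEAK:-/sys/kernel/debug/kmemleak}"

# Trigger a scan and print suspected leaks; the wrapper name is illustrative.
kmemleak_scan() {
    local ctl="${1:-$KMEMLEAK}"
    if [ ! -e "$ctl" ]; then
        echo "kmemleak not available at $ctl (kernel option missing or debugfs not mounted)" >&2
        return 1
    fi
    echo scan > "$ctl"   # ask the kernel for an immediate scan
    cat "$ctl"           # list suspected leaks with their backtraces
}
```

Calling kmemleak_scan after a handful of sleep cycles would list whatever kmemleak suspects. It only tracks allocations made through kmalloc, vmalloc, kmem_cache_alloc and similar, so an empty report would not rule out a leak elsewhere.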

I have verified that the very same scenario does not cause memory leaks on stock Ubuntu 14.04 running on two separate machines. It seems like this is an issue with L4T. It is still getting in our way, since sleeping is part of our strategy for saving battery, and without it we cannot run for long on battery power.

Just for reference, is this running L4T R21.5? (You can use “head -n 1 /etc/nv_tegra_release” to see.) I am wondering if it would be possible to create a similar spreadsheet, but showing the content of “/proc/meminfo”? This would have a lot of columns.

Here’s the spreadsheet as requested with full /proc/meminfo: https://docs.google.com/spreadsheets/d/16dkvYNNAvlJRdSYojCX6HZBELLyuLmJdci6ece5XQ4U/edit?usp=sharing

This represents a sampling of /proc/meminfo every second across 100 sleep cycles, each cycle sleeping for 10 seconds, with 6-second intervals between sleeps.
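A sampler along these lines can produce that kind of CSV. This is a sketch, not the exact script used for the spreadsheet; the function names and the leading time column are assumptions.

```shell
#!/bin/bash
# Sketch: log /proc/meminfo as CSV, one row per sample.
# Function names and the "time" column are illustrative assumptions.

# Print the meminfo field names as one comma-separated line.
meminfo_header() {
    awk -F: '{ printf "%s%s", (NR > 1 ? "," : ""), $1 } END { print "" }' "${1:-/proc/meminfo}"
}

# Print the meminfo values (units dropped) as one comma-separated line.
meminfo_row() {
    awk '{ printf "%s%s", (NR > 1 ? "," : ""), $2 } END { print "" }' "${1:-/proc/meminfo}"
}

# sample_meminfo [interval_seconds] [sample_count] [meminfo_path]
sample_meminfo() {
    local interval="${1:-1}"
    local count="${2:-10}"
    local src="${3:-/proc/meminfo}"
    local i=0
    echo "time,$(meminfo_header "$src")"
    while [ "$i" -lt "$count" ]; do
        echo "$(date +%T),$(meminfo_row "$src")"
        sleep "$interval"
        i=$((i + 1))
    done
}
```

For example, "sample_meminfo 1 1600 > meminfo.csv" running in a second terminal would cover 100 cycles of 10 seconds asleep plus 6 seconds awake. No samples are taken while the system is suspended, so the actual row count will be lower.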

/etc/nv_tegra_release reads:

# R21 (release), REVISION: 5.0, GCID: 7273100, BOARD: ardbeg, EABI: hard, DATE: Wed Jun  8 04:19:09 UTC 2016

The code I used to test this:

#!/bin/bash
# Arguments: [sleep_seconds] [seconds_between_sleeps] [lp_mode] [cycles]
SLEEP_DURATION=${1:-10}
BETWEEN_SLEEPS_INTERVAL=${2:-6}
SLEEP_MODE=${3:-1}
NCYCLES=${4:-100}
echo "starting..."
for i in $(seq 1 "$NCYCLES")
do
   echo "sleeping at $(date +"%T")"
   SECONDS=0   # bash built-in timer: counts seconds since this assignment
   sudo ./sleep "$SLEEP_DURATION" "$SLEEP_MODE"
   duration=$SECONDS
   echo "waking up after ${duration} seconds at $(date +"%T")"
   echo "in step ${i}"
   sleep "$BETWEEN_SLEEPS_INTERVAL"
done
echo "done."

and ./sleep is:

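#!/bin/bash
# Must run as root: all three writes go to sysfs.
SLEEP_TIME=${1:-20}
SLEEP_MODE=${2:-0}
echo "+${SLEEP_TIME}" > /sys/class/rtc/rtc0/wakealarm   # RTC wake alarm, in seconds from now
echo "lp${SLEEP_MODE}" > /sys/power/suspend/mode        # suspend mode, e.g. lp0 or lp1
echo mem > /sys/power/state                             # suspend to RAM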

What I did was import the CSV into LibreOffice and find a row approximately where memory started dropping. I gave that row a bright background color so I could spot it when scrolling fast. I then scrolled up and down looking for columns which did not appear to change much at the moment MemFree started going down, and marked those as “hide column”.

As MemFree drops, so does HighFree. No other meminfo column dropped at a rate similar to MemFree at the same moment that MemFree drops. NvMapMemUsed goes up. No other column goes up, many remain constant until memory is mostly gone. So it looks like NvMapMemUsed marks the memory leak. Whatever the leak detail is, HighFree is probably where most of the memory is being consumed as NvMapMemUsed takes ownership.
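The same column comparison can be roughed out without a spreadsheet. This sketch assumes a CSV with a header row and a timestamp in the first column (the layout and the column_deltas name are my assumptions, not necessarily what the posted spreadsheet uses). It prints the change of each column between the first and last sample.

```shell
#!/bin/bash
# Sketch: per-column change between the first and last data row of a meminfo CSV.
# Assumes row 1 is a header and column 1 is a timestamp; column_deltas is an illustrative name.
column_deltas() {
    awk -F, '
        NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; next }   # header row
        NR == 2 { for (i = 2; i <= NF; i++) first[i] = $i }        # first sample
        { n = NF; for (i = 2; i <= NF; i++) last[i] = $i }         # ends up holding the last sample
        END { for (i = 2; i <= n; i++) print name[i], last[i] - first[i] }
    ' "$1" | sort -k2,2n   # largest drops first, largest gains last
}
```

With "column_deltas meminfo.csv", columns that fall together sort to the top of the output and columns that grow sort to the bottom. That makes it easy to pick out pairs that move in opposite directions by similar amounts.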

Can someone answer what happens with NvMapMemUsed during sleep and restore?