Model runs on DLA, but most of the time is spent in "XX from nvm"

I’m trying to understand what takes most of the time in a network running on the DLA, and I found that most of the time is spent in “XX from nvm”.
For example, here is the profile result from running resnet50 with trtexec (trtexec --deploy=resnet50.prototxt --fp16 --output=pool1 --dumpProfile --useDLACore=0, so all layers of the model run on the DLA without the GPU):

Q1: Why is most of the time spent in “XX from nvm”, and what is nvm?
Q2: What can I do to reduce this cost?


“nvm” only indicates where the data is located.
In general, the time reported for that entry corresponds to the pooling layer running on the DLA.