Hi,
I’m trying to understand what takes most of the time in a network running on the DLA, and I found that most of the time is spent in “XX from nvm”.
For example, here is the trtexec profile result for ResNet-50 (trtexec --deploy=resnet50.prototxt --fp16 --output=pool1 --dumpProfile --useDLACore=0, so all layers of the model run on the DLA, with none on the GPU):
Q1: Why is most of the time spent in “XX from nvm”, and what is nvm?
Q2: What can I do to reduce this cost?