When I run this slightly modified version of your script on a Nano, using the l4t-pytorch:r32.5.0-pth1.7-py3 container:
import time

import torch
from torch import nn
import torchvision

torch.cuda.empty_cache()
DEVICE = torch.device('cuda')
print(DEVICE, torch.__version__, sep=" | ")

model = torchvision.models.resnet18(pretrained=True).to(DEVICE)
model.eval()
inp = torch.rand(1, 3, 224, 224).to(DEVICE)

for i in range(50):
    start = time.time()
    out = model(inp)
    stop = time.time() - start
    print(out.shape, stop, sep=" ")
This is what I get:
cuda | 1.7.0
torch.Size([1, 1000]) 6.532195329666138
torch.Size([1, 1000]) 0.04456067085266113
torch.Size([1, 1000]) 0.028676748275756836
torch.Size([1, 1000]) 0.032257795333862305
torch.Size([1, 1000]) 0.03734779357910156
torch.Size([1, 1000]) 0.021605968475341797
torch.Size([1, 1000]) 0.02182316780090332
torch.Size([1, 1000]) 0.022113561630249023
torch.Size([1, 1000]) 0.021986007690429688
torch.Size([1, 1000]) 0.02157902717590332
torch.Size([1, 1000]) 0.02269721031188965
torch.Size([1, 1000]) 0.02176690101623535
torch.Size([1, 1000]) 0.021875619888305664
torch.Size([1, 1000]) 0.021648883819580078
torch.Size([1, 1000]) 0.021985530853271484
torch.Size([1, 1000]) 0.021978378295898438
torch.Size([1, 1000]) 0.023291587829589844
torch.Size([1, 1000]) 0.02150726318359375
torch.Size([1, 1000]) 0.022224903106689453
torch.Size([1, 1000]) 0.021436214447021484
torch.Size([1, 1000]) 0.02248215675354004
torch.Size([1, 1000]) 0.021712541580200195
torch.Size([1, 1000]) 0.021942615509033203
torch.Size([1, 1000]) 0.02127361297607422
torch.Size([1, 1000]) 0.02282261848449707
torch.Size([1, 1000]) 0.0217134952545166
torch.Size([1, 1000]) 0.021761655807495117
torch.Size([1, 1000]) 0.021343708038330078
torch.Size([1, 1000]) 0.02223944664001465
torch.Size([1, 1000]) 0.022092103958129883
torch.Size([1, 1000]) 0.03938460350036621
torch.Size([1, 1000]) 0.034061431884765625
torch.Size([1, 1000]) 0.03392529487609863
torch.Size([1, 1000]) 0.03402352333068848
torch.Size([1, 1000]) 0.0338597297668457
torch.Size([1, 1000]) 0.03396439552307129
torch.Size([1, 1000]) 0.03467988967895508
torch.Size([1, 1000]) 0.03416728973388672
torch.Size([1, 1000]) 0.03405308723449707
torch.Size([1, 1000]) 0.03409099578857422
torch.Size([1, 1000]) 0.02503824234008789
torch.Size([1, 1000]) 0.04316258430480957
torch.Size([1, 1000]) 0.03392148017883301
torch.Size([1, 1000]) 0.03392672538757324
torch.Size([1, 1000]) 0.033841609954833984
torch.Size([1, 1000]) 0.03403735160827637
torch.Size([1, 1000]) 0.034006357192993164
torch.Size([1, 1000]) 0.0339808464050293
torch.Size([1, 1000]) 0.03402423858642578
torch.Size([1, 1000]) 0.034269094467163086
As expected, the first iteration takes much longer (about 6.5 s), but after that the time quickly drops to roughly 22-34 ms per iteration. Are you using the 5W or the 10W (MAXN) power profile? Mine reports MAXN:
$ sudo nvpmodel -q
NVPM WARN: fan mode is not set!
NV Power Mode: MAXN
0
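
If you want numbers that are easier to compare, here is a rough sketch of a timing helper that throws away the warm-up iterations and reports summary statistics for the rest. The names `benchmark` and `summarize` are made up for this example and are not part of PyTorch. The optional `sync` argument is there because CUDA kernels are launched asynchronously, so wall-clock timing around `model(inp)` alone may not capture the full GPU execution time; treat this as an assumption-laden sketch, not a definitive benchmark.

```python
import statistics
import time


def benchmark(fn, warmup=5, iters=50, sync=None):
    """Time fn() per call, in seconds, after discarding warm-up calls.

    sync is an optional zero-argument callable invoked before starting and
    after finishing each timed call (hypothetical hook; for a CUDA model
    one would pass torch.cuda.synchronize here).
    """
    def timed_call():
        if sync is not None:
            sync()  # make sure earlier queued work is finished
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the work launched by fn() to finish
        return time.perf_counter() - start

    # Warm-up calls are executed but their timings are not kept.
    for _ in range(warmup):
        timed_call()

    return [timed_call() for _ in range(iters)]


def summarize(times):
    """Return basic statistics for a list of per-call timings."""
    return {
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "min": min(times),
        "max": max(times),
    }


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a GPU.
    results = benchmark(lambda: sum(range(10000)), warmup=3, iters=20)
    print(summarize(results))
```

With the script above, a call along the lines of `benchmark(lambda: model(inp), sync=torch.cuda.synchronize)` should work, ideally inside a `torch.no_grad()` block; I have not run this variant on the Nano, so the numbers may differ from the ones posted here.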