I’m trying to profile inference of a tiny model with dlprof, but I can’t seem to capture per-iteration information when I run it for multiple iterations. This is what the code does:

```
import torch
import torch.nn as nn


class SmallModel(nn.Module):
    def __init__(self):
        super(SmallModel, self).__init__()
        # Two linear layers: 784 -> 512 -> 256
        self.layer1 = nn.Linear(784, 512)
        self.layer2 = nn.Linear(512, 256)

    def forward(self, x):
        x = torch.relu(self.layer1(x))
        x = torch.relu(self.layer2(x))
        return x
```

```
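import argparse
import nvidia_dlprof_pytorch_nvtx

# FP16 model and a fixed batch of 64 inputs, both on the GPU
model = SmallModel().cuda().half()
input_data = torch.randn(64, 784).cuda().half()
# Enable the DLProf NVTX annotations for PyTorch
nvidia_dlprof_pytorch_nvtx.init(enable_function_stack=True)
parser = argparse.ArgumentParser("Nvidia Profiler")
parser.add_argument("--num_iter", dest="num_iter", help="number of iterations to perform", type=int, required=True)
args = parser.parse_args()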
```

```
with torch.no_grad():
    with torch.autograd.profiler.emit_nvtx():
        # One forward pass per iteration, no backward pass
        for i in range(args.num_iter):
            _ = model(input_data)
```

This is the command I’m running: **dlprof --mode=pytorch --key_node=LINEAR_1 -f true --reports=summary,detail,iteration --iter_start=5 --iter_stop=8 python profile_sample_model.py --num_iter 10**

This is what the dlprof log shows:

**Found 2 iterations using key_op “LINEAR_1”**

**Iterations: [12495162999, 12520617892]**

**Aggregating data over 1 iterations: iteration 1 start (12495162999 ns) to iteration 1 end (12520617892 ns)**
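To make those numbers concrete, here is a small sketch of how I read the log. It assumes (my interpretation, not something the dlprof docs confirm here) that the two timestamps in the Iterations list are boundaries, so two boundaries delimit a single aggregation window. The helper name `iteration_windows` is made up for illustration.

```python
# Boundary timestamps (ns) copied from the "Iterations" line of the dlprof log.
boundaries_ns = [12495162999, 12520617892]


def iteration_windows(boundaries):
    """Pair consecutive boundaries into (start, end) windows.

    Assumption: N boundary timestamps delimit N - 1 windows.
    """
    return list(zip(boundaries[:-1], boundaries[1:]))


windows = iteration_windows(boundaries_ns)

# Length of each window in milliseconds.
durations_ms = [(end - start) / 1e6 for start, end in windows]

print(f"windows found: {len(windows)}")
for (start, end), ms in zip(windows, durations_ms):
    print(f"{start} ns -> {end} ns  ({ms:.3f} ms)")
```

Under that reading there is only one window of roughly 25 ms, which matches the "Aggregating data over 1 iterations" line and would mean there is no fifth through eighth iteration for dlprof to select.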

I want dlprof to capture iterations 5 through 8 individually. Instead, it skips everything up to the first occurrence of the specified key_node and then aggregates the remaining 9 iterations as a single iteration. --iter_start=5 --iter_stop=8 doesn’t seem to have any effect. What am I doing wrong here? I’d really appreciate any guidance on this.