I ran Nsight Systems on a PyTorch CNN (e.g., a ResNet model) and noticed that the code does not use any asynchronous execution. Does this omission affect the model's performance?
Hi @CisMine ,
Apologies for the delay.
Can you please share more details about the issue?
Thanks
Hi, here is the actual code:
import torch
import torch.nn as nn
import torchvision.models as models
device = 'cuda:0'
model = models.resnet18().to(device)
data = torch.randn(64, 3, 224, 224, device=device)
target = torch.randint(0, 1000, (64,), device=device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
nb_iters = 10
warmup_iters = 5
for i in range(nb_iters):
    optimizer.zero_grad()
    # start profiling after the warmup iterations
    if i == warmup_iters: torch.cuda.cudart().cudaProfilerStart()
    # push range for current iteration
    if i >= warmup_iters: torch.cuda.nvtx.range_push("iteration{}".format(i))
    # push range for forward
    if i >= warmup_iters: torch.cuda.nvtx.range_push("forward")
    output = model(data)
    if i >= warmup_iters: torch.cuda.nvtx.range_pop()
    loss = criterion(output, target)
    if i >= warmup_iters: torch.cuda.nvtx.range_push("backward")
    loss.backward()
    if i >= warmup_iters: torch.cuda.nvtx.range_pop()
    if i >= warmup_iters: torch.cuda.nvtx.range_push("opt.step()")
    optimizer.step()
    if i >= warmup_iters: torch.cuda.nvtx.range_pop()
    # pop iteration range
    if i >= warmup_iters: torch.cuda.nvtx.range_pop()
torch.cuda.cudart().cudaProfilerStop()
I inspected this code in Nsight Systems and observed that it does not use the asynchronous technique (overlapping data transfers with kernel computation). Is there a specific reason why it is not used?
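For reference, this is the kind of overlap I was expecting to see: prefetching the next batch on a side CUDA stream while the current batch is being computed on the default stream. It is only a minimal sketch assuming the batches start in pinned host memory; the names copy_stream, host_batches, and host_targets are illustrative, and the four synthetic batches stand in for a real data loader.

import torch
import torch.nn as nn
import torchvision.models as models

device = 'cuda:0'
model = models.resnet18().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Pinned (page-locked) host memory is required for truly asynchronous
# host-to-device copies.
host_batches = [torch.randn(64, 3, 224, 224).pin_memory() for _ in range(4)]
host_targets = [torch.randint(0, 1000, (64,)).pin_memory() for _ in range(4)]

copy_stream = torch.cuda.Stream(device=device)

# Prefetch the first batch on the side stream.
with torch.cuda.stream(copy_stream):
    data = host_batches[0].to(device, non_blocking=True)
    target = host_targets[0].to(device, non_blocking=True)

for i in range(len(host_batches)):
    # Block the compute (default) stream until the prefetched copy is done.
    torch.cuda.current_stream(device).wait_stream(copy_stream)
    cur_data, cur_target = data, target
    # Tell the caching allocator these tensors are used on the compute
    # stream, so their memory is not reused by the copy stream too early.
    cur_data.record_stream(torch.cuda.current_stream(device))
    cur_target.record_stream(torch.cuda.current_stream(device))

    # Start copying the next batch while the current one is computed.
    if i + 1 < len(host_batches):
        with torch.cuda.stream(copy_stream):
            data = host_batches[i + 1].to(device, non_blocking=True)
            target = host_targets[i + 1].to(device, non_blocking=True)

    optimizer.zero_grad()
    loss = criterion(model(cur_data), cur_target)
    loss.backward()
    optimizer.step()

With this pattern, Nsight Systems should show the HtoD memcpys on a second stream overlapping the compute kernels. In my code above there is nothing to overlap in the first place, since data and target are created on the device once, before the loop.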
Hi @CisMine ,
You may get better help on one of the deep learning forums for this question. We suggest raising it on the PyTorch forums.
Thanks