Why not employ asynchronous techniques in deep learning models?

I ran Nsight Systems on a PyTorch CNN (e.g., a ResNet model) and observed that the code makes no use of asynchronous techniques. Does this omission matter for the model's performance?

Hi @CisMine ,
Apologies for the delay.
Can you please share more details about the issue?
Thanks

Hi, this is the actual code:

import torch
import torch.nn as nn
import torchvision.models as models

device = 'cuda:0'
model = models.resnet18().to(device)
data = torch.randn(64, 3, 224, 224, device=device)
target = torch.randint(0, 1000, (64,), device=device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

nb_iters = 10  # must exceed warmup_iters, or the profiled iterations never run
warmup_iters = 5
for i in range(nb_iters):
    optimizer.zero_grad()

    # start profiling after the warmup iterations
    if i == warmup_iters: torch.cuda.cudart().cudaProfilerStart()

    # push range for current iteration
    if i >= warmup_iters: torch.cuda.nvtx.range_push("iteration{}".format(i))

    # push range for forward
    if i >= warmup_iters: torch.cuda.nvtx.range_push("forward")
    output = model(data)
    if i >= warmup_iters: torch.cuda.nvtx.range_pop()

    loss = criterion(output, target)

    if i >= warmup_iters: torch.cuda.nvtx.range_push("backward")
    loss.backward()
    if i >= warmup_iters: torch.cuda.nvtx.range_pop()

    if i >= warmup_iters: torch.cuda.nvtx.range_push("opt.step()")
    optimizer.step()
    if i >= warmup_iters: torch.cuda.nvtx.range_pop()

    # pop iteration range
    if i >= warmup_iters: torch.cuda.nvtx.range_pop()

torch.cuda.cudart().cudaProfilerStop()
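
For reference, cudaProfilerStart()/cudaProfilerStop() only mark the capture region; assuming a recent Nsight Systems release, the script has to be launched with a matching capture range for them to take effect (train.py below is a placeholder for the script name):

nsys profile -t cuda,nvtx --capture-range=cudaProfilerApi python train.py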

I inspected the timeline in Nsight Systems and this code does not appear to overlap data transfers with kernel computation (the asynchronous copy/compute technique). Is there a specific reason it is not used here?
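To make the question concrete, here is a minimal sketch of the kind of overlap I mean: prefetching the next batch on a side CUDA stream, using pinned host memory and non_blocking=True so the host-to-device copy of batch i+1 runs while batch i is being computed. The batches list, copy_stream, and the batch count are illustrative, not part of the original code:

import torch
import torch.nn as nn
import torchvision.models as models

device = 'cuda:0'
model = models.resnet18().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Host batches in pinned (page-locked) memory, a prerequisite for
# truly asynchronous host-to-device copies.
batches = [(torch.randn(64, 3, 224, 224).pin_memory(),
            torch.randint(0, 1000, (64,)).pin_memory())
           for _ in range(4)]

copy_stream = torch.cuda.Stream()

# Prefetch the first batch on the side stream.
with torch.cuda.stream(copy_stream):
    next_data = batches[0][0].to(device, non_blocking=True)
    next_target = batches[0][1].to(device, non_blocking=True)

for i in range(len(batches)):
    # The compute (default) stream waits until the copy of batch i is done.
    torch.cuda.current_stream().wait_stream(copy_stream)
    data, target = next_data, next_target
    # Tell the caching allocator these tensors are now used on the
    # compute stream, so their memory is not reused too early.
    data.record_stream(torch.cuda.current_stream())
    target.record_stream(torch.cuda.current_stream())

    # Kick off the copy of batch i+1 while batch i is being computed.
    if i + 1 < len(batches):
        with torch.cuda.stream(copy_stream):
            next_data = batches[i + 1][0].to(device, non_blocking=True)
            next_target = batches[i + 1][1].to(device, non_blocking=True)

    optimizer.zero_grad()
    loss = criterion(model(data), target)
    loss.backward()
    optimizer.step()

With this pattern, an Nsight Systems timeline should show the async memcpy rows overlapping the compute kernels instead of serializing with them.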

Hi @CisMine ,
you may get better help on one of the deep-learning forums for this. We suggest raising it on the PyTorch forums.

Thanks