Nvidia TensorRT PyTorch Docker :: Resnet50 Model running issues :: Jupyter

Dear Team,

I have set up Docker and created a container by following the steps below:
$ sudo git clone https://github.com/pytorch/TensorRT.git Torch-TensorRT
$ cd Torch-TensorRT
$ sudo docker build -t torch_tensorrt -f ./docker/Dockerfile .
$ sudo docker run --gpus=all --rm -it -v $PWD:/Torch-TensorRT --net=host --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 torch_tensorrt:latest bash
$ cd /Torch-TensorRT/notebooks
$ jupyter notebook --allow-root --ip 0.0.0.0 --port 8888

I tried to run the model using the Jupyter notebook ([Resnet50-example.ipynb]) but I am facing several issues, listed below (errors marked in bold):

Model Description:
This ResNet-50 model is based on the Deep Residual Learning for Image Recognition paper, which introduces residual ("skip") connections that make very deep networks easier to train. The input size is fixed to 224x224.


  1. Running the model without optimizations
    PyTorch has a model repository called torchvision, which is a source for high-quality implementations of computer vision models. We can get our ResNet-50 model from there via torch.hub, pretrained on ImageNet.

import torch
import torchvision

torch.hub._validate_not_a_forked_repo = lambda a, b, c: True

resnet50_model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=True)
resnet50_model.eval()
Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.10.0
ResNet(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(layer1): Sequential(
(0): Bottleneck(
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(layer2): Sequential(
(0): Bottleneck(
(conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(layer3): Sequential(
(0): Bottleneck(
(conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(4): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(5): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(layer4): Sequential(
(0): Bottleneck(
(conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
(fc): Linear(in_features=2048, out_features=1000, bias=True)
)
With our model loaded, let’s proceed to downloading some images!

!mkdir -p ./data
!wget -O ./data/img0.JPG "https://d17fnq9dkz9hgj.cloudfront.net/breed-uploads/2018/08/siberian-husky-detail.jpg?bust=1535566590&width=630"
!wget -O ./data/img1.JPG "https://www.hakaimagazine.com/wp-content/uploads/header-gulf-birds.jpg"
!wget -O ./data/img2.JPG "https://www.artis.nl/media/filer_public_thumbnails/filer_public/00/f1/00f1b6db-fbed-4fef-9ab0-84e944ff11f8/chimpansee_amber_r_1920x1080.jpg__1920x1080_q85_subject_location-923%2C365_subsampling-2.jpg"
!wget -O ./data/img3.JPG "https://www.familyhandyman.com/wp-content/uploads/2018/09/How-to-Avoid-Snakes-Slithering-Up-Your-Toilet-shutterstock_780480850.jpg"

!wget -O ./data/imagenet_class_index.json "https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json"
--2022-05-27 04:43:58--  https://d17fnq9dkz9hgj.cloudfront.net/breed-uploads/2018/08/siberian-husky-detail.jpg?bust=1535566590&width=630
Resolving d17fnq9dkz9hgj.cloudfront.net (d17fnq9dkz9hgj.cloudfront.net)… 18.66.40.187, 18.66.40.52, 18.66.40.216, …
Connecting to d17fnq9dkz9hgj.cloudfront.net (d17fnq9dkz9hgj.cloudfront.net)|18.66.40.187|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 24112 (24K) [image/jpeg]
Saving to: ‘./data/img0.JPG’

./data/img0.JPG 100%[===================>] 23.55K --.-KB/s in 0.006s

2022-05-27 04:43:59 (3.98 MB/s) - ‘./data/img0.JPG’ saved [24112/24112]

--2022-05-27 04:43:59--  https://www.hakaimagazine.com/wp-content/uploads/header-gulf-birds.jpg
Resolving www.hakaimagazine.com (www.hakaimagazine.com)… 164.92.73.117, 64:ff9b::a45c:4975
Connecting to www.hakaimagazine.com (www.hakaimagazine.com)|164.92.73.117|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 452718 (442K) [image/jpeg]
Saving to: ‘./data/img1.JPG’

./data/img1.JPG 100%[===================>] 442.11K 425KB/s in 1.0s

2022-05-27 04:44:02 (425 KB/s) - ‘./data/img1.JPG’ saved [452718/452718]

--2022-05-27 04:44:02--  https://www.artis.nl/media/filer_public_thumbnails/filer_public/00/f1/00f1b6db-fbed-4fef-9ab0-84e944ff11f8/chimpansee_amber_r_1920x1080.jpg__1920x1080_q85_subject_location-923%2C365_subsampling-2.jpg
Resolving www.artis.nl (www.artis.nl)… 94.75.225.20
Connecting to www.artis.nl (www.artis.nl)|94.75.225.20|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 361413 (353K) [image/jpeg]
Saving to: ‘./data/img2.JPG’

./data/img2.JPG 100%[===================>] 352.94K 334KB/s in 1.1s

2022-05-27 04:44:05 (334 KB/s) - ‘./data/img2.JPG’ saved [361413/361413]

--2022-05-27 04:44:05--  https://www.familyhandyman.com/wp-content/uploads/2018/09/How-to-Avoid-Snakes-Slithering-Up-Your-Toilet-shutterstock_780480850.jpg
Resolving www.familyhandyman.com (www.familyhandyman.com)… 104.18.202.107, 104.18.201.107, 2606:4700::6812:ca6b, …
Connecting to www.familyhandyman.com (www.familyhandyman.com)|104.18.202.107|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 90994 (89K) [image/jpeg]
Saving to: ‘./data/img3.JPG’

./data/img3.JPG 100%[===================>] 88.86K --.-KB/s in 0.03s

2022-05-27 04:44:05 (2.71 MB/s) - ‘./data/img3.JPG’ saved [90994/90994]

--2022-05-27 04:44:06--  https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json
Resolving s3.amazonaws.com (s3.amazonaws.com)… 52.217.172.24, 64:ff9b::34d8:1c86
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.217.172.24|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 35363 (35K) [application/octet-stream]
Saving to: ‘./data/imagenet_class_index.json’

./data/imagenet_cla 100%[===================>] 34.53K 102KB/s in 0.3s

2022-05-27 04:44:07 (102 KB/s) - ‘./data/imagenet_class_index.json’ saved [35363/35363]

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225].
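Concretely, the normalization maps each channel value x in [0, 1] to (x - mean_c) / std_c. A minimal sketch of that arithmetic in plain Python (no torchvision; the helper name normalize_pixel is made up for illustration):

```python
# ImageNet per-channel statistics used by torchvision's pretrained models
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Normalize one RGB pixel with values in [0, 1], channel-wise."""
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, STD)]

# A pixel equal to the channel means normalizes to zero in every channel
print(normalize_pixel([0.485, 0.456, 0.406]))  # [0.0, 0.0, 0.0]
```

transforms.Normalize in the code below applies exactly this per-channel affine map, just vectorized over the whole tensor.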

Here’s a sample execution.

from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt
import json

fig, axes = plt.subplots(nrows=2, ncols=2)

for i in range(4):
    img_path = './data/img%d.JPG' % i
    img = Image.open(img_path)
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    input_tensor = preprocess(img)
    plt.subplot(2, 2, i + 1)
    plt.imshow(img)
    plt.axis('off')

Loading the labels:

with open("./data/imagenet_class_index.json") as json_file:
    d = json.load(json_file)

Throughout this tutorial, we will be making use of some utility functions: rn50_preprocess for preprocessing input images, predict to use the model for prediction, and benchmark to benchmark the inference. You do not need to go through these utilities to make use of Torch-TensorRT, but you are welcome to do so if you choose.

import numpy as np
import time
import torch.backends.cudnn as cudnn
cudnn.benchmark = True

def rn50_preprocess():
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    return preprocess

decode the results into ([predicted class, description], probability)

def predict(img_path, model):
    img = Image.open(img_path)
    preprocess = rn50_preprocess()
    input_tensor = preprocess(img)
    input_batch = input_tensor.unsqueeze(0)  # create a mini-batch as expected by the model

    # move the input and model to GPU for speed if available
    if torch.cuda.is_available():
        input_batch = input_batch.to('cuda')
        model.to('cuda')

    with torch.no_grad():
        output = model(input_batch)
        # Tensor of shape 1000, with confidence scores over ImageNet's 1000 classes
        sm_output = torch.nn.functional.softmax(output[0], dim=0)

    ind = torch.argmax(sm_output)
    return d[str(ind.item())], sm_output[ind]  # ([predicted class, description], probability)


def benchmark(model, input_shape=(1024, 1, 224, 224), dtype='fp32', nwarmup=50, nruns=10000):
    input_data = torch.randn(input_shape)
    input_data = input_data.to("cuda")
    if dtype == 'fp16':
        input_data = input_data.half()

    print("Warm up ...")
    with torch.no_grad():
        for _ in range(nwarmup):
            features = model(input_data)
    torch.cuda.synchronize()
    print("Start timing ...")
    timings = []
    with torch.no_grad():
        for i in range(1, nruns + 1):
            start_time = time.time()
            features = model(input_data)
            torch.cuda.synchronize()
            end_time = time.time()
            timings.append(end_time - start_time)
            if i % 10 == 0:
                print('Iteration %d/%d, ave batch time %.2f ms' % (i, nruns, np.mean(timings) * 1000))

    print("Input shape:", input_data.size())
    print("Output features size:", features.size())
    print('Average batch time: %.2f ms' % (np.mean(timings) * 1000))
With the model downloaded and the util functions written, let’s just quickly see some predictions, and benchmark the model in its current un-optimized state.

for i in range(4):
    img_path = './data/img%d.JPG' % i
    img = Image.open(img_path)

    pred, prob = predict(img_path, resnet50_model)
    print('{} - Predicted: {}, Probability: {}'.format(img_path, pred, prob))

    plt.subplot(2, 2, i + 1)
    plt.imshow(img)
    plt.axis('off')
    plt.title(pred[1])
/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py:80: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:112.)
  return torch._C._cuda_getDeviceCount() > 0
./data/img0.JPG - Predicted: ['n02110185', 'Siberian_husky'], Probability: 0.49787387251853943
./data/img1.JPG - Predicted: ['n01820546', 'lorikeet'], Probability: 0.6446995735168457
./data/img2.JPG - Predicted: ['n02481823', 'chimpanzee'], Probability: 0.9899842739105225
./data/img3.JPG - Predicted: ['n01749939', 'green_mamba'], Probability: 0.4564127027988434

Model benchmark without Torch-TensorRT

model = resnet50_model.eval().to("cuda")
benchmark(model, input_shape=(128, 3, 224, 224), nruns=100)

RuntimeError                              Traceback (most recent call last)
Input In [8], in <cell line: 2>()
      1 # Model benchmark without Torch-TensorRT
----> 2 model = resnet50_model.eval().to("cuda")
      3 benchmark(model, input_shape=(128, 3, 224, 224), nruns=100)

File /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:899, in Module.to(self, *args, **kwargs)
    895     return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
    896                 non_blocking, memory_format=convert_to_format)
    897     return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
--> 899 return self._apply(convert)

File /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:570, in Module._apply(self, fn)
    568 def _apply(self, fn):
    569     for module in self.children():
--> 570         module._apply(fn)
    572 def compute_should_use_set_data(tensor, tensor_applied):
    573     if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
    574         # If the new tensor has compatible tensor type as the existing tensor,
    575         # the current behavior is to change the tensor in-place using .data =,
        (...)
    580         # global flag to let the user control whether they want the future
    581         # behavior of overwriting the existing tensor or not.

File /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:593, in Module._apply(self, fn)
    589 # Tensors stored in modules are graph leaves, and we don't want to
    590 # track autograd history of param_applied, so we have to use
    591 # with torch.no_grad():
    592 with torch.no_grad():
--> 593     param_applied = fn(param)
    594 should_use_set_data = compute_should_use_set_data(param, param_applied)
    595 if should_use_set_data:

File /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:897, in Module.to.<locals>.convert(t)
    894 if convert_to_format is not None and t.dim() in (4, 5):
    895     return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
    896                 non_blocking, memory_format=convert_to_format)
--> 897 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)

File /usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py:214, in _lazy_init()
    210     raise AssertionError(
    211         "libcudart functions unavailable. It looks like you have a broken build?")
    212 # This function throws if there's a driver initialization error, no GPUs
    213 # are found or any other error occurs
--> 214 torch._C._cuda_init()
    215 # Some of the queued calls may reentrantly call _lazy_init();
    216 # we need to just return without initializing in that case.
    217 # However, we must not let any other threads in!
    218 _tls.is_initializing = True

RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.

Kindly help me resolve this issue.

Thanks and Regards,
Vyom Mishra

Hi there Vyom!

I am sorry to hear you are running into problems with TensorRT.

Did you try to act on the main error that is repeated above?
"CUDA unknown error - this may be due to an incorrectly set up environment"
Did you check your setup and verify the driver installation?
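One quick way to narrow this down is to check whether each layer of the stack can see the GPU at all. A minimal sketch (the CUDA image tag is just an example; it assumes the NVIDIA Container Toolkit is installed on the host):

```shell
# On the host: confirm the driver is loaded and the GPU is visible
nvidia-smi

# Confirm Docker can pass the GPU through to a container
sudo docker run --rm --gpus=all nvidia/cuda:11.6.2-base-ubuntu18.04 nvidia-smi

# Inside your torch_tensorrt container: confirm PyTorch can initialize CUDA
python3 -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
```

If the first two commands work but the third prints False, the problem is usually inside the container environment rather than the driver itself.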

I will move your post to the correct category on our forums which is dedicated to TensorRT.

What would help others to understand your issue better is:

  • Try to put your log outputs like the ones above into code blocks; right now your post is very long and hard to read.
  • Please share details about your setup, like GPU type, OS, and driver version.
  • Share the output of the command nvidia-smi.
  • If you are on Linux, please run nvidia-bug-report.sh and attach the output.

Thank you!


Dear MarkusHoho,

Thanks for the reply!

I have pulled the pre-built container from NGC (PyTorch | NVIDIA NGC), version 22.04-py3.

GPU Type: Quadro P2000
OS : Ubuntu 18.04
nvidia-smi:
root@65002d456bdc:/workspace# nvidia-smi
Mon May 30 03:06:21 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.05    Driver Version: 510.73.05    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P2000        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   35C    P5    N/A /  N/A |   3362MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Error faced is :
RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero

Can you please let me know the standard procedure to run any model with the TensorRT runtime on PC (Linux) GPUs using the PyTorch Docker environment?

Thanks and Regards,
Vyom Mishra

Hi,

There are known issues similar to this error. It looks like there is a problem with your CUDA and driver setup. Please set up the driver correctly, restart the machine, and try again as mentioned in the following issues.
If you are still facing this issue, please share with us the steps you have tried and the error logs.
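Once CUDA initializes correctly, the usual Torch-TensorRT flow inside the container looks roughly like the sketch below (a minimal example, not a complete recipe; it assumes torch and torch_tensorrt are importable and a GPU is visible):

```python
import torch
import torch_tensorrt

# Load a pretrained model and move it to the GPU in eval mode
model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=True)
model = model.eval().to("cuda")

# Compile with Torch-TensorRT for a fixed input shape and FP32 precision
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.float32},
)

# Run inference with the optimized module
x = torch.randn(1, 3, 224, 224, device="cuda")
with torch.no_grad():
    out = trt_model(x)
print(out.shape)  # torch.Size([1, 1000])
```

Until the "CUDA unknown error" is resolved, though, even the plain .to("cuda") step will fail, so the driver setup has to come first.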

Thank you.