Error during inference

We migrated our PyTorch model to a TensorRT model. When running inference with the newly generated model, we get the error below:

[TensorRT] ERROR: …/rtSafe/cuda/caskConvolutionRunner.cpp (373) - Cask Error in checkCaskExecError: 7 (Cask Convolution execution)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception
Traceback (most recent call last):
  File "pose_video.py", line 555, in
    paf_info, heatmap_info = get_paf_and_heatmap(model_pose, img_res, scale_param)
  File "/documents/trt/trt_pose/tasks/human_pose/migration/pose_estimation.py", line 175, in get_paf_and_heatmap
    heatmap = nn.UpsamplingBilinear2d((img_raw.shape[0], img_raw.shape[1]))(output2)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/upsampling.py", line 141, in forward
    return F.interpolate(input, self.size, self.scale_factor, self.mode, self.align_corners)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 3163, in interpolate
    return torch._C._nn.upsample_bilinear2d(input, output_size, align_corners, sfl[0], sfl[1])
RuntimeError: CUDA error: unspecified launch failure

It seems to have something to do with how we invoke the TRT model on an image. Can someone please share example code for invoking a TRT model on an image for inference?

Hi,
Can you try running your model with the trtexec command and share the "--verbose" log if the issue persists?
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

You can refer to the list of supported operators. If any operator is not supported, you need to create a custom plugin to support that operation.

Also, please share your model and script, if you have not already, so that we can help you better.

Thanks!

Thanks @NVES for the quick response.

I tried following the steps mentioned in the GH link you shared. The steps seem to be wrong, as this folder does not exist:
cd /samples/trtexec

I then tried navigating to /samples/opensource/trtexec, and when I ran the make command, I got the error below:
make: *** No targets specified and no makefile found. Stop.

I was able to migrate the model to TensorRT by following the GH link.

However, I used my own pose estimation model. I used the code below for inference:

import cv2
import numpy as np
import torch
from torch.autograd import Variable
from torch2trt import TRTModule

# Load the TensorRT-converted model saved by torch2trt.
model_pose = TRTModule()
model_pose.load_state_dict(torch.load('coco_pose.trt'))
model_pose.float()
model_pose.eval()

# Inside get_paf_and_heatmap(): resize, pad, and convert the image to NCHW float16.
img_test = cv2.resize(img_raw, (0, 0), fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)
img_test_pad, pad = pad_right_down_corner(img_test, param_stride, param_stride)
img_test_pad = np.transpose(np.float16(img_test_pad[:, :, :, np.newaxis]), (3, 2, 0, 1)) / 256 - 0.5

# Run the model and copy both outputs back to the CPU.
feed = Variable(torch.from_numpy(img_test_pad))
print(model)
output1, output2 = model(feed)
output1, output2 = output1.detach().cpu(), output2.detach().cpu()

During execution I get the error below:
File "/documents/trt/trt_pose/tasks/human_pose/migration/pose_estimation.py", line 176, in get_paf_and_heatmap
output1, output2 = output1.detach().cpu(), output2.detach().cpu()
RuntimeError: CUDA error: unspecified launch failure

@karunakar.r ,

Which model are you using? Is it a PyTorch model or a TensorRT-converted model?

What is the batch size? Try reducing it. This is a PyTorch error that can occur when GPU memory is not sufficient to hold the model at a particular batch size: RuntimeError: CUDA error: unspecified launch failure · Issue #31702 · pytorch/pytorch · GitHub
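
One way to reduce the effective batch size without touching the rest of the pipeline is to feed the model in smaller chunks. The following is only a minimal sketch, assuming the model can be wrapped as a callable that takes a sequence of inputs and returns a list of outputs. The names run_in_chunks, infer_fn, max_batch, and fake_infer are hypothetical and are not part of PyTorch or torch2trt.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_in_chunks(
    infer_fn: Callable[[Sequence[T]], List[R]],
    inputs: Sequence[T],
    max_batch: int,
) -> List[R]:
    """Call infer_fn on slices of at most max_batch items and concatenate the results."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    outputs: List[R] = []
    for start in range(0, len(inputs), max_batch):
        chunk = inputs[start:start + max_batch]
        outputs.extend(infer_fn(chunk))
    return outputs


# Hypothetical stand-in for a real model call: doubles every input
# and records the batch sizes it was called with.
seen_batch_sizes: List[int] = []


def fake_infer(batch: Sequence[int]) -> List[int]:
    seen_batch_sizes.append(len(batch))
    return [2 * x for x in batch]


results = run_in_chunks(fake_infer, [1, 2, 3, 4, 5], max_batch=2)
```

With a TensorRT engine, the chunk size would presumably also need to stay within the maximum batch size the engine was built for.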

Thanks @bgiddwani
We tried setting the max workspace size to 2GB, but we still get the same error:
TVM_TENSORRT_MAX_WORKSPACE_SIZE=22147483647 python3 pose_video.py
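
As a sanity check on the value in the command above, the arithmetic can be sketched as follows. This assumes the variable is read as a byte count, which the thread does not confirm. Under that assumption the value works out to roughly 20.6 GiB rather than 2 GiB.

```python
GIB = 1024 ** 3  # bytes in one GiB

value_in_command = 22147483647  # value passed in the command above
two_gib = 2 * GIB               # what 2 GiB would be in bytes

# Assumption: the environment variable is interpreted as a number of bytes.
print(f"value in command: {value_in_command / GIB:.2f} GiB")
print(f"2 GiB in bytes:   {two_gib}")
```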

Hi @karunakar.r,

Are you still facing this issue?

Hi,
The issue was resolved once we reduced the batch size and passed the image with the correct size.

Thanks
Karunakar
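
The fix reported above (a smaller batch and an input of the correct size) can be guarded with a shape check before the engine is called. This is a minimal sketch, assuming an NCHW input layout. The function check_input_shape, its parameters, and the example dimensions are hypothetical and are not taken from the thread.

```python
from typing import Tuple


def check_input_shape(
    shape: Tuple[int, ...],
    expected_chw: Tuple[int, int, int],
    max_batch: int,
) -> None:
    """Raise ValueError if an NCHW shape does not match what the engine was built for."""
    if len(shape) != 4:
        raise ValueError(f"expected 4 dimensions (N, C, H, W), got {len(shape)}")
    batch, chw = shape[0], tuple(shape[1:])
    if not 1 <= batch <= max_batch:
        raise ValueError(f"batch size {batch} is outside 1..{max_batch}")
    if chw != tuple(expected_chw):
        raise ValueError(f"input is {chw}, engine expects {tuple(expected_chw)}")


# Hypothetical engine built for one 3x224x224 image at a time.
check_input_shape((1, 3, 224, 224), expected_chw=(3, 224, 224), max_batch=1)

# An image of a different size is rejected before it reaches the engine.
try:
    check_input_shape((1, 3, 368, 496), expected_chw=(3, 224, 224), max_batch=1)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

With a PyTorch tensor, the shape could be passed as tuple(feed.shape) just before the model call.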