Building trt_pose on x86 with a T4 GPU

Description

Goal: convert the PyTorch trt_pose model to TensorRT with torch2trt.
Script:
data = torch.zeros((1, 3, HEIGHT, WIDTH)).cuda()
if not os.path.exists(OPTIMIZED_MODEL):
    print('-- Converting TensorRT models. This may takes several minutes...')
    model.load_state_dict(torch.load(MODEL_WEIGHTS))
    model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1 << 25)
    torch.save(model_trt.state_dict(), OPTIMIZED_MODEL)
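For context, a minimal sketch of the setup this script assumes (following the trt_pose live-demo notebook; the weight filename and 224x224 input size match the model linked below, but the output path and exact names are placeholders):

import json
import os

import torch
import torch2trt
import trt_pose.models

# Topology file shipped with trt_pose (keypoints and skeleton definition)
with open('human_pose.json', 'r') as f:
    human_pose = json.load(f)

num_parts = len(human_pose['keypoints'])
num_links = len(human_pose['skeleton'])

WIDTH, HEIGHT = 224, 224
MODEL_WEIGHTS = 'resnet18_baseline_att_224x224_A_epoch_249.pth'
OPTIMIZED_MODEL = 'resnet18_baseline_att_224x224_A_epoch_249_trt.pth'  # placeholder output path

# ResNet-18 backbone with the attention cmap/paf heads
model = trt_pose.models.resnet18_baseline_att(num_parts, 2 * num_links).cuda().eval()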

(1) The above works on Jetson.
(2) Running the above on an x86 machine with a T4 GPU does not work. I get the following error:
------ model = resnet--------
/home/user/.local/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/user/.local/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=ResNet18_Weights.IMAGENET1K_V1. You can also use weights=ResNet18_Weights.DEFAULT to get the most up-to-date weights.
  warnings.warn(msg)
-- Converting TensorRT models. This may takes several minutes...
HEIGHT WIDTH 224 224
input shape torch.Size([64, 3, 7, 7])
[06/18/2024-02:25:14] [TRT] [E] Error Code: 3: 1.cmap_up.0:0:DECONVOLUTION:GPU:kernel weights has count 2097152 but 4194304 was expected
[06/18/2024-02:25:14] [TRT] [E] Error Code: 4: 1.cmap_up.0:0:DECONVOLUTION:GPU: count of 2097152 weights in kernel, but kernel dimensions (4,4) with 512 input channels, 512 output channels and 1 groups were specified. Expected Weights count is 512 * 4*4 * 512 / 1 = 4194304
[06/18/2024-02:25:14] [TRT] [E] ITensor::getDimensions: Error Code 4: Internal Error (Output shape can not be computed for node 1.cmap_up.0:0:DECONVOLUTION:GPU.)
[06/18/2024-02:25:14] [TRT] [E] INetworkDefinition::addScaleNd: Error Code 3: API Usage Error (Parameter check failed, condition: qdqScale || basicScale. )
Traceback (most recent call last):
  File "/home/user/pose_310/trt_pose/tasks/human_pose/video-pose-3.py", line 228, in <module>
    model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1<<25)
  File "/home/user/.local/lib/python3.10/site-packages/torch2trt-0.5.0-py3.10-linux-x86_64.egg/torch2trt/torch2trt.py", line 643, in torch2trt
    outputs = module(*inputs)
  File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/container.py", line 215, in forward
    input = module(input)
  File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/trt_pose-0.0.1-py3.10-linux-x86_64.egg/trt_pose/models/common.py", line 70, in forward
    xc = self.cmap_up(x)
  File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/container.py", line 215, in forward
    input = module(input)
  File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py", line 171, in forward
    return F.batch_norm(
  File "/home/user/.local/lib/python3.10/site-packages/torch2trt-0.5.0-py3.10-linux-x86_64.egg/torch2trt/torch2trt.py", line 262, in wrapper
    converter["converter"](ctx)
  File "/home/user/.local/lib/python3.10/site-packages/torch2trt-0.5.0-py3.10-linux-x86_64.egg/torch2trt/converters/native_converters.py", line 183, in convert_batch_norm
    output._trt = layer.get_output(0)
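
For reference, the weight counts in the error can be checked against the actual PyTorch layer. A hypothetical diagnostic sketch (using the model built above) that prints every ConvTranspose2d weight shape:

import torch.nn as nn

# PyTorch stores ConvTranspose2d weights as (in_channels, out_channels // groups, kH, kW)
for name, module in model.named_modules():
    if isinstance(module, nn.ConvTranspose2d):
        print(name, tuple(module.weight.shape), module.weight.numel())

# The first cmap_up deconvolution in this model is 512 -> 256 with a 4x4 kernel,
# i.e. 512 * 256 * 4 * 4 = 2,097,152 weights (the "has count 2097152" in the error),
# while TensorRT was told to expect 512 output channels (512 * 512 * 4 * 4 = 4,194,304),
# which suggests the mismatch happens in the torch2trt deconvolution converter.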

Environment

TensorRT Version: 10.1.0
GPU Type: T4
Nvidia Driver Version: 525.125.06
CUDA Version: 12.1
CUDNN Version: 8.9
Operating System + Version: Ubuntu 20.04 LTS
Python Version (if applicable): 3.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.1.2+cu121
Baremetal or Container (if container which image + tag):

Relevant Files

Model: resnet18_baseline_att_224x224_A (resnet18_baseline_att_224x224_A_epoch_249.pth, Google Drive link)


Steps To Reproduce

Run the conversion script above (torch2trt on the trt_pose ResNet-18 model) on an x86 machine with a T4 GPU; the full traceback is included in the description.

I have the same error; this headache has had me stuck for three days!

TensorRT Version : 8.6.1.6
GPU Type : NVIDIA A100-SXM4-40GB
Nvidia Driver Version : 535.161.07
CUDA Version : 12.2
Operating System + Version : Ubuntu 22.04
Python Version (if applicable) : 3.10.12
PyTorch Version (if applicable) : 2.3.1+cu121

Did you get any solution from the NVIDIA team?

I’ve just posted the same issue here: Building trtpose in x86_64 A100-SXM4-40GB

@TomNVIDIA, any answer? Please, we need some information about this issue. I’ve just posted this same issue here: Building trtpose in x86_64 A100-SXM4-40GB

Sorry. Still no answer.


Correct, that’s why I tagged @TomNVIDIA, who is a prominent NVIDIA collaborator on these topics.

Sorry, I am not a technical resource for TensorRT. Maybe @NVES or @spolisetty can assist here.


Thank you very much, @TomNVIDIA. Please, @NVES and @spolisetty, we really need help with this issue. I’ve spent an entire week without finding any solution. I’ve just posted this same issue here: Building trtpose in x86_64 A100-SXM4-40GB, but using an A100 instead of a T4 GPU and Ubuntu 22.04.

Hi @amaunder @mamurillo,
I have reported this issue to Engineering and shall share an update soon.


Thank you so much, @AakankshaS! We really need a lot of help with this issue.

Hi @AakankshaS, good morning. Did you receive any update from Engineering about this issue?

No, I didn’t.


My company is an NVIDIA partner, and management has been asking me for results with this trt_pose model for two weeks. I honestly don’t know what to do anymore, and NVIDIA gives neither a concrete answer nor support for this problem, which effectively blocks any progress; I can see I’m not the only one, since @amaunder has the same problem. I have already tried random library versions, Docker containers with different TensorRT versions, etc., but this is not a trivial error: literally nothing can be done to continue the work. I’m very worried about this issue. Do the model and code work only on Jetson, or what is the problem?

Please, we really need help with this issue dear @AakankshaS

I’ve just posted this same issue here: Building trtpose in x86_64 A100-SXM4-40GB, but using an A100 instead of a T4 GPU and Ubuntu 22.04.

Dear @AakankshaS, good morning. Did you receive any update from Engineering about this issue?

Hello @amaunder @mamurillo,
They responded that this looks like either a user error during conversion or a bug in the torch2trt converter.
Could you please share your model and repro script so we can pass them on to Engineering?


Hi,
Do you experience this error even when calling torch2trt(…, use_onnx=True)?


Will check and respond very soon. I tried both ONNX and .pth, but let me put together the complete data and code that replicate the issue.

Thanks for following up.


Hi, I contacted NVIDIA support directly because my company is a partner, and they gave me the solution:

First, install onnx-graphsurgeon:
pip install onnx-graphsurgeon

Then pass use_onnx=True in the conversion call:
model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1 << 25, use_onnx=True)

It finally works and I have the TensorRT engine.
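
For completeness, loading the saved engine back for inference is unchanged by the workaround. A short sketch using torch2trt's TRTModule (OPTIMIZED_MODEL as in the original script):

import torch
from torch2trt import TRTModule

# Load the serialized TensorRT engine saved by torch.save(model_trt.state_dict(), ...)
model_trt = TRTModule()
model_trt.load_state_dict(torch.load(OPTIMIZED_MODEL))

data = torch.zeros((1, 3, 224, 224)).cuda()
cmap, paf = model_trt(data)  # trt_pose returns a confidence map and part-affinity fields
print(cmap.shape, paf.shape)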

Thank you so much.

Hi @mamurillo,
Yes, setting torch2trt(…, use_onnx=True) seems to resolve the issue.
However, we are checking with Engineering whether this is a bug or the expected setting.
We will update the thread.

Thanks


Thank you very much!

Same here. The ONNX flag resolves the issue. Thanks!
