HRNet (and many other models) fail during TensorRT optimization

Description

I am converting a TensorFlow 2.2 Keras model to ONNX with keras2onnx and then trying to parse the ONNX file with TensorRT.
I narrowed the error down to the following layer type:
The HRNet Fuse layer between stages uses Add to combine the upsampled feature maps with the higher-resolution feature maps. Replacing those Add layers with Concat followed by 1x1 convolutions lets TensorRT optimize the model correctly.
The TensorRT build is the one for CUDA 11.0.
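As a rough illustration of why the Concat + 1x1 convolution workaround can stand in for Add, the sketch below checks the arithmetic at a single pixel in plain Python. It is not taken from the HRNet code: the helper names `concat_1x1_weights` and `conv1x1_pixel` and the toy channel values are made up for this example, and it assumes a bias-free 1x1 convolution whose weights are set by hand rather than trained.

```python
def concat_1x1_weights(num_branches, channels):
    """Weights (out_channels x in_channels) of a 1x1 conv that sums
    `num_branches` concatenated branches channel by channel."""
    return [
        [1.0 if in_c % channels == out_c else 0.0
         for in_c in range(num_branches * channels)]
        for out_c in range(channels)
    ]


def conv1x1_pixel(weights, pixel):
    """Apply a bias-free 1x1 convolution to one pixel (a list of channel values)."""
    return [sum(w * x for w, x in zip(row, pixel)) for row in weights]


# Three branches with two channels each, all at the same spatial location
# (hypothetical values, chosen only to make the sums easy to read).
branches = [[1.0, 2.0], [10.0, 20.0], [100.0, 200.0]]

added = [sum(values) for values in zip(*branches)]          # what Add computes
concatenated = [v for branch in branches for v in branch]   # what Concat produces
fused = conv1x1_pixel(concat_1x1_weights(3, 2), concatenated)

print(added, fused)
```

With these stacked-identity weights the replacement reproduces the Add output exactly, so in principle the workaround does not have to change the network's results; with trained 1x1 weights it becomes a learned fusion instead.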

The exact error happens after upsampling the lower-resolution branches to the highest resolution (240,) and adding all three together.
The error message seems wrong, since the library is loaded correctly and the model works once the mentioned layers are replaced.
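The autotuning line in the log below lists four numbers per tensor. Assuming these are packed strides printed innermost axis first, with the last value being the per-sample volume (an interpretation of the log format, not something stated in this post), the tensor extents can be recovered by dividing consecutive strides. The helper name `extents_from_strides` is made up for this sketch.

```python
def extents_from_strides(strides):
    """Recover per-axis extents from packed strides listed innermost first.

    Each extent is the ratio of the next-outer stride to the current one.
    """
    return [outer // inner for inner, outer in zip(strides, strides[1:])]


# Values copied from the Float(1,240,32640,587520) entries in the log.
strides = (1, 240, 32640, 587520)
extents = extents_from_strides(strides)
print(extents)  # [240, 136, 18]
```

Under that assumption the three inputs and the output of the failing layer each have an innermost extent of 240, which matches the highest-resolution branch mentioned above.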

*************** Autotuning format combination: Float(1,240,32640,587520), Float(1,240,32640,587520), Float(1,240,32640,587520) -> Float(1,240,32640,587520) ***************
nvrtc: error: failed to open libnvrtc-builtins.so.
  Make sure that libnvrtc-builtins.so is installed correctly.
terminate called after throwing an instance of 'pwgen::PwgenException'
  what():  NVRTC error:
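One way to test the NVRTC message independently of TensorRT is to ask the dynamic loader directly whether it can open the library named in the log. The sketch below uses only the Python standard library; the helper name `can_load` is made up for this example, and the check only covers the loader search path of the current process, which may differ from the environment TensorRT runs in.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open `library_name`."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        return False


# Library name taken verbatim from the error message in the log.
name = "libnvrtc-builtins.so"
status = "loadable" if can_load(name) else "NOT found by the loader"
print(f"{name}: {status}")
```

If this reports the library as not found, the directory containing it is probably missing from the loader search path (for example `LD_LIBRARY_PATH`), even when other CUDA libraries load without problems.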

Environment

TensorRT Version: 7.2
GPU Type: RTX 2080
Nvidia Driver Version: 455.45.01
CUDA Version: 11.0
CUDNN Version: 8.0
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8.5
TensorFlow Version (if applicable): 2.2
PyTorch Version (if applicable):

Hi @andre.a.bauer,

This looks like a CUDA installation issue.
Please follow the link below for a proper installation.

Thank you.