TensorRT + TVM + INT8 quantization fails

Description

I am trying to quantize and run a model via Apache TVM with the TensorRT backend and INT8 calibration. A call to cudaMemcpy() in the TensorRTCalibrator fails with a "CUDA: invalid argument" error, which I suspect is caused by mismatching batch sizes/input dimensions.

For debugging, I set the number of calibration runs to 5 via an environment variable. In these 5 runs, batch size and input dimensions match the model's input dimensions. After calibration finishes, I invoke the now-calibrated inference again with an input shaped as specified in the model (by calling module.set_input() and module.run() in TVM), and at this point cudaMemcpy() fails. With print debugging I have learned that the buffer size is computed from incorrect values for the input tensor's dimensions.

Apparently, cudaMemcpy() tries to copy more data than is actually expected and fails.
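
For reference, here is a minimal sketch of the flow I am describing. The toy conv model, input name/shape, and device index are placeholders (not my real network), and the environment variable names are the ones I found in TVM's int8 test, so they may differ between TVM versions:

import os
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor
from tvm.relay.op.contrib.tensorrt import partition_for_tensorrt

# INT8 calibration switches for TVM's TensorRT runtime
# (variable names taken from TVM's int8 test; adjust for your TVM version).
os.environ["TVM_TENSORRT_USE_INT8"] = "1"
os.environ["TENSORRT_NUM_CALI_INT8"] = "5"  # the 5 calibration runs mentioned above

# Placeholder model: a single conv2d standing in for my real network.
input_name, input_shape = "input_1", (1, 3, 224, 224)
x = relay.var(input_name, shape=input_shape, dtype="float32")
w = relay.var("w", shape=(16, 3, 3, 3), dtype="float32")
y = relay.nn.conv2d(x, w, kernel_size=(3, 3), padding=(1, 1), channels=16)
mod = tvm.IRModule.from_expr(relay.Function([x, w], y))
params = {"w": np.random.uniform(-1, 1, (16, 3, 3, 3)).astype("float32")}

# Offload to TensorRT and build (older TVM versions return a (mod, config) tuple here).
mod = partition_for_tensorrt(mod, params)
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="cuda", params=params)

dev = tvm.cuda(0)
module = graph_executor.GraphModule(lib["default"](dev))

# 5 calibration runs; batch size and input dimensions match the model's input.
for _ in range(5):
    module.set_input(input_name, np.random.uniform(size=input_shape).astype("float32"))
    module.run()

# The next run builds the calibrated engine; this is where cudaMemcpy() inside
# TensorRTCalibrator::getBatch() fails for me.
module.set_input(input_name, np.random.uniform(size=input_shape).astype("float32"))
module.run()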

I would be grateful for any help!

I don't know if this helps, but here is TVM's error output:
terminate called after throwing an instance of ‘tvm::runtime::InternalError’
what(): [22:37:29] …/06_tvm_benchmarking/tvm_benchmarking/tvm/src/runtime/contrib/tensorrt/tensorrt_calibrator.h:81: InternalError: Check failed: (e == cudaSuccess || e == cudaErrorCudartUnloading) is false: CUDA: invalid argument
Stack trace:
0: tvm::runtime::TensorRTCalibrator::getBatch(void**, char const**, int)
1: 0x00007fe73e18c7f4
2: 0x00007fe73e2f7126
3: 0x00007fe73e1564e5
4: 0x00007fe73e15b4ee
5: 0x00007fe73e15be20
6: tvm::runtime::contrib::TensorRTBuilder::BuildEngine()
7: tvm::runtime::contrib::TensorRTRuntime::BuildEngineFromJson(int)
8: tvm::runtime::contrib::TensorRTRuntime::GetOrBuildEngine()
9: tvm::runtime::contrib::TensorRTRuntime::Run()
10: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::json::JSONRuntimeBase::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#3}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
11: std::_Function_handler<void (), tvm::runtime::GraphExecutor::CreateTVMOp(tvm::runtime::TVMOpParam const&, std::vector<DLTensor*, std::allocator<DLTensor*> > const&)::{lambda()#3}>::_M_invoke(std::_Any_data const&)
12: tvm::runtime::GraphExecutor::Run()
13: tvm::runtime::LocalSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)
14: tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const

Aborted (core dumped)

Environment

TensorRT Version: 8.6.1.6
GPU Type: A100
Nvidia Driver Version:
CUDA Version: 11.8
CUDNN Version: 8.9.0
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8.10
TensorFlow Version (if applicable): 2.13.1
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Hi,
Could you try running your model with the trtexec command and share the "--verbose" log in case the issue persists?
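
For example, something along these lines (the model path is a placeholder; point it at your exported model):

trtexec --onnx=<your_model>.onnx --int8 --verbose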

You can refer to the link below for the list of supported operators; in case any operator is not supported, you will need to create a custom plugin to support that operation.

Also, we request you to share your model and script, if not shared already, so that we can help you better.

Meanwhile, for some common errors and queries, please refer to the link below:

Thanks!