Description
I am trying to quantize and run a model via Apache TVM with the TensorRT backend and int8 calibration. A call to cudaMemcpy() in the TensorRTCalibrator fails with "CUDA: invalid argument", which I suspect is caused by varying batch sizes/input dimensions.
For debugging, I've set the number of calibration runs to 5 via an environment variable. In these 5 runs, the batch size and input dimensions match the model's input dimensions. After calibration finishes, I invoke inference again on the now-calibrated engine with an input shaped as the model specifies (by calling module.set_input() and module.run() in TVM), and at that point cudaMemcpy() fails. Through print debugging, I've learned that the buffer size is computed from wrong values for the input tensor's dimensions. A minimal sketch of the flow is below.
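For context, here is a rough sketch of the flow described above (Python, TVM Relay API). The network, input name, and shapes are placeholders standing in for my real model, and the environment variable names are the ones TVM's TensorRT integration uses as far as I know; please verify them against your TVM version:

```python
import os
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor
from tvm.relay.op.contrib.tensorrt import partition_for_tensorrt

# Env var names as used by TVM's TensorRT integration (verify for your version).
os.environ["TVM_TENSORRT_USE_INT8"] = "1"    # enable int8 + calibration
os.environ["TENSORRT_NUM_CALI_INT8"] = "5"   # number of calibration runs

# Tiny stand-in network; the real model is what triggers the bug.
x = relay.var("input", shape=(1, 3, 224, 224), dtype="float32")
w = relay.var("w", shape=(16, 3, 3, 3), dtype="float32")
y = relay.nn.conv2d(x, w, kernel_size=(3, 3), padding=(1, 1))
mod = tvm.IRModule.from_expr(relay.Function([x, w], y))
params = {"w": np.random.uniform(size=(16, 3, 3, 3)).astype("float32")}

# Note: older TVM versions return a (mod, config) pair here.
mod = partition_for_tensorrt(mod, params)
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="cuda", params=params)

dev = tvm.cuda(0)
module = graph_executor.GraphModule(lib["default"](dev))

data = np.random.uniform(size=(1, 3, 224, 224)).astype("float32")

# 5 calibration runs; batch size and input dimensions match the model.
for _ in range(5):
    module.set_input("input", data)
    module.run()

# The next run builds the calibrated engine; getBatch() -> cudaMemcpy()
# fails here with "CUDA: invalid argument".
module.set_input("input", data)
module.run()
```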
Apparently, cudaMemcpy() tries to copy more data than the input tensor actually contains, and therefore fails.
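As a cross-check, the expected byte count of the input can be read off the graph executor and compared against the size the calibrator tries to copy (a sketch, reusing `module` from above; the input index is illustrative):

```python
# Sketch: compute how many bytes cudaMemcpy should copy for input 0 and
# compare against the size printed inside TensorRTCalibrator::getBatch().
inp = module.get_input(0)  # NDArray view of input 0
expected_bytes = int(np.prod(inp.shape)) * np.dtype(inp.dtype).itemsize
print("expected input bytes:", expected_bytes)
```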
I would appreciate any help!
I don't know if this helps, but here is TVM's error output:
terminate called after throwing an instance of 'tvm::runtime::InternalError'
what(): [22:37:29] …/06_tvm_benchmarking/tvm_benchmarking/tvm/src/runtime/contrib/tensorrt/tensorrt_calibrator.h:81: InternalError: Check failed: (e == cudaSuccess || e == cudaErrorCudartUnloading) is false: CUDA: invalid argument
Stack trace:
0: tvm::runtime::TensorRTCalibrator::getBatch(void**, char const**, int)
1: 0x00007fe73e18c7f4
2: 0x00007fe73e2f7126
3: 0x00007fe73e1564e5
4: 0x00007fe73e15b4ee
5: 0x00007fe73e15be20
6: tvm::runtime::contrib::TensorRTBuilder::BuildEngine()
7: tvm::runtime::contrib::TensorRTRuntime::BuildEngineFromJson(int)
8: tvm::runtime::contrib::TensorRTRuntime::GetOrBuildEngine()
9: tvm::runtime::contrib::TensorRTRuntime::Run()
10: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::json::JSONRuntimeBase::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#3}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
11: std::_Function_handler<void (), tvm::runtime::GraphExecutor::CreateTVMOp(tvm::runtime::TVMOpParam const&, std::vector<DLTensor*, std::allocator<DLTensor*> > const&)::{lambda()#3}>::_M_invoke(std::_Any_data const&)
12: tvm::runtime::GraphExecutor::Run()
13: tvm::runtime::LocalSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)
14: tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
Aborted (core dumped)
Environment
TensorRT Version: 8.6.1.6
GPU Type: A100
Nvidia Driver Version:
CUDA Version: 11.8
CUDNN Version: 8.9.0
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8.10
TensorFlow Version (if applicable): 2.13.1
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
Steps To Reproduce
Please include:
- Exact steps/commands to build your repro
- Exact steps/commands to run your repro
- Full traceback of errors encountered