TensorRT 3 deserialization issue

The IRuntime->deserializeCudaEngine() call ends up throwing an error:

terminate called after throwing an instance of 'nvinfer1::CaskError'
  what():  std::exception
Aborted (core dumped)

There is no message explaining the issue, and I can't find CaskError in the documentation, in the header files, or in any meaningful Google result.

Does anybody have a hint about the cause of this error?

With an engine created on another host everything runs fine; this error appeared after generating engines on a different host, one that was able to perform FP16 calibration.

Dear martin.stellmacher,

Do the host PCs have the same graphics card?


Dear SteveNV,

no, the target is a DRIVE PX2. Most of the time I'm using a host equipped with GTX Titan GPUs, but on that one I can't perform the FP16 import, and INT8 import seems not to be supported at all for TensorFlow. So I performed the FP16 conversion on a host with Volta GPUs.

My task, in principle, is to run a TensorFlow FP32-trained network on a DRIVE PX2 as an INT8 network. The Python interface isn't supported on the target, so I was using the host. Do I have to use the C++ API for UFF import and calibration on the target?

Best regards,


Dear martin.stellmacher,

Could you please refer to the link and info below for your topic? Thanks.


Optimizing the INT8 Model on DRIVE PX
The TensorRT builder implements a profiling-based optimization called kernel autotuning. This process requires the network to be optimized on the target device. We can use the calibration cache file generated on the host in this on-target optimization phase to generate an INT8 model without requiring the calibration dataset. You need to write a calibrator class that implements the readCalibrationCache function to tell TensorRT to use the cached result, as the following code shows.

class Int8CacheCalibrator : public IInt8EntropyCalibrator {
public:
    Int8CacheCalibrator(std::string cacheFile)
        : mCacheFile(cacheFile) {}
    virtual ~Int8CacheCalibrator() {}

    int getBatchSize() const override { return 1; }

    // No calibration batches are supplied on the target; returning false
    // tells TensorRT to rely on the cached calibration result instead.
    bool getBatch(void* bindings[], const char* names[], int nbBindings) override {
        return false;
    }

    const void* readCalibrationCache(size_t& length) override {
        mCalibrationCache.clear();
        std::ifstream input(mCacheFile, std::ios::binary);
        input >> std::noskipws;
        if (input.good()) {
            std::copy(std::istream_iterator<char>(input),
                      std::istream_iterator<char>(),
                      std::back_inserter(mCalibrationCache));
        }
        length = mCalibrationCache.size();
        return length ? &mCalibrationCache[0] : nullptr;
    }

private:
    std::string mCacheFile;
    std::vector<char> mCalibrationCache;
};

Hi Steve,

https://devblogs.nvidia.com/int8-inference-autonomous-vehicles-tensorrt/ was my starting point, but this article seems to be valid for Caffe only… For TensorFlow you end up like this: https://devtalk.nvidia.com/default/topic/1029476/?comment=5243932. After fixing the simple issues I described there, I get a calibration cache of about 1.1 kB. Is this a valid size for a calibration cache?

Best regards,


^ This workflow works for me with TF

@SteveNV, I am unsure whether the calibration cache is platform-independent, because the code never states which platform it is optimizing for.

Dear dhingratul,
You can generate the calibration table on host and use the same on PX2.

I am now looking for a way to import the generated .engine file (from disk) after this workflow using the C++ API. All the samples I have seen use a GIE stream, but there aren't any generic use cases. The deserializeCudaEngine function requires a pointer to the memory holding the serialized engine and its size, which I am not clear on.