When I use TensorRT to run inference on MobileNet in INT8 mode, I get the following errors. How can I solve this problem?

Description

[TensorRT] VERBOSE: Engine generation completed in 3.27319 seconds.
[TensorRT] VERBOSE: Calculating Maxima
[TensorRT] ERROR: …/rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: …/rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)

These errors occur when the calibrator is activated.
The following is my calibrator code:

class MNISTEntropyCalibrator(trt.IInt8EntropyCalibrator):
    def __init__(self, cache_file, batch_size=1):
        trt.IInt8EntropyCalibrator.__init__(self)
        self.cache_file = cache_file
        # Every time get_batch is called, the next batch of size batch_size will be copied to the device and returned.
        self.data = load_data(data_list)
        self.batch_size = batch_size
        self.current_index = 0
        # Allocate enough memory for a whole batch.
        print(self.data[0].nbytes * self.batch_size)
        self.device_input = cuda.mem_alloc(self.data[0].nbytes * self.batch_size)
        # self.device_input = cuda.mem_alloc(2 << 30)
        print(self.device_input)

    def get_batch(self, names):
        if self.current_index + self.batch_size > self.data.shape[0]:
            return None

        current_batch = int(self.current_index / self.batch_size)
        if current_batch % 10 == 0:
            print("Calibrating batch {:}, containing {:} samples".format(current_batch, self.batch_size))

        batch = self.data[self.current_index:self.current_index + self.batch_size].ravel()
        cuda.memcpy_htod(self.device_input, batch)
        self.current_index += self.batch_size
        return [self.device_input]

    def get_batch_size(self):
        return self.batch_size

    def read_calibration_cache(self):
        # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

Environment

TensorRT Version: 7.0.0.11
GPU Type: RTX 2080
Nvidia Driver Version: 440.82
CUDA Version: 10.0
CUDNN Version: 7.6.4
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 3.7.6
TensorFlow Version (if applicable): 1.14
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Hi @1226291951,
There is a “workspace” parameter that limits the maximum amount of memory TensorRT may use.
This error indicates that the workspace is not large enough for TensorRT to reach optimal performance.
Can you please try increasing the WORKSPACE_SIZE?
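For context, workspace sizes in these scripts are usually written as bit shifts (e.g. `builder.max_workspace_size = 1 << 30` for 1 GiB in the Python API). A minimal sketch of that arithmetic, with a purely illustrative `gib` helper:

```python
def gib(n):
    """Return n GiB expressed in bytes, the unit used for TensorRT's workspace size."""
    return n * (1 << 30)

# Typical values seen in TensorRT sample code:
print(gib(1))  # 1073741824, i.e. 1 << 30
print(gib(8))  # 8589934592, i.e. 1 << 33
```

Note that the workspace is in addition to the memory needed for the engine itself and any calibration buffers, so setting it close to the card's total memory may itself trigger allocation failures while the builder times tactics.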
Thanks!

Hello, I am sorry to bother you again.
I have set the WORKSPACE_SIZE to the maximum value, but another error occurs, as follows:
[TensorRT] VERBOSE: Calculating Maxima
[TensorRT] ERROR: …/rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 11 (invalid argument)
[TensorRT] ERROR: …/rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 11 (invalid argument)
The total memory of my RTX 2080 GPU is 10989 MB.
Is the reason for this error that the WORKSPACE_SIZE still isn’t big enough?
Can you help me solve the problem? Thank you very much!
The following is my complete code; I don’t know whether it is wrong:

class MNISTEntropyCalibrator(trt.IInt8EntropyCalibrator):
    def __init__(self, cache_file, batch_size=1):
        # Whenever you specify a custom constructor for a TensorRT class,
        # you MUST call the constructor of the parent explicitly.
        trt.IInt8EntropyCalibrator.__init__(self)

        self.cache_file = cache_file

        # Every time get_batch is called, the next batch of size batch_size will be copied to the device and returned.
        self.data = load_data(data_list)
        self.batch_size = batch_size
        self.current_index = 0

        # Allocate enough memory for a whole batch.
        print(self.data[0].nbytes * self.batch_size)
        self.device_input = cuda.mem_alloc(self.data[0].nbytes * self.batch_size)
        # self.device_input = cuda.mem_alloc(2 << 30)
        print(self.device_input)

    # TensorRT passes along the names of the engine bindings to the get_batch function.
    # You don't necessarily have to use them, but they can be useful to understand the order of
    # the inputs. The bindings list is expected to have the same ordering as 'names'.
    def get_batch(self, names):
        if self.current_index + self.batch_size > self.data.shape[0]:
            return None

        current_batch = int(self.current_index / self.batch_size)
        if current_batch % 10 == 0:
            print("Calibrating batch {:}, containing {:} images".format(current_batch, self.batch_size))

        batch = self.data[self.current_index:self.current_index + self.batch_size].ravel()
        cuda.memcpy_htod(self.device_input, batch)
        self.current_index += self.batch_size
        return [self.device_input]

    def get_batch_size(self):
        return self.batch_size

    def read_calibration_cache(self):
        # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)


EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
# Building engine
with trt.Builder(TRT_LOGGER) as builder, builder.create_network(EXPLICIT_BATCH) as network, builder.create_builder_config() as config, \
        trt.OnnxParser(network, TRT_LOGGER) as parser:
    builder.max_batch_size = 1
    builder.max_workspace_size = 1 << 33
    builder.int8_mode = True
    calibration_cache = "./mnist_calibration.cache"
    calib = MNISTEntropyCalibrator(cache_file=calibration_cache, batch_size=1)
    config_flags = 1 << int(trt.BuilderFlag.INT8)
    config.flags = config_flags
    config.int8_calibrator = calib
    with open("/home/dm/ATP-Audio-classification-training-pipeline/voice_recognition/checkpoints/mobilenetV2-gvlad28/mobilenetV2.onnx", 'rb') as model:
        if not parser.parse(model.read()):
            for error in range(parser.num_errors):
                print(parser.get_error(error))
    last_layer = network.get_layer(network.num_layers - 1)
    if not last_layer.get_output(0):
        network.mark_output(last_layer.get_output(0))
    print("network layers", network.num_layers)
    inputs = [network.get_input(i) for i in range(network.num_inputs)]
    outputs = [network.get_output(i) for i in range(network.num_outputs)]
    for inp in inputs:
        print(inp.shape[0])
    for oup in outputs:
        print(oup.shape[0])
    profile_input = builder.create_optimization_profile()
    profile_input.set_shape("input", (1, 257, 200, 1), (1, 257, 200, 1), (1, 257, 200, 1))
    config.add_optimization_profile(profile_input)
    config.max_workspace_size = 1 << 33
    engine = builder.build_engine(network, config)
    with open("/home/dm/ATP-Audio-classification-training-pipeline/voice_recognition/checkpoints/mobilenetV2-gvlad28/mobilenetV2_int8.trt", "wb") as f:
        f.write(engine.serialize())

Hi,
Apologies for the delayed response.

Can you please try builder->setMaxWorkspaceSize(2<<10);
Please refer to the link below:
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#build_engine_c

Thanks!