Building a network takes too long


Hi, I am trying to build a U-Net like the one here (GitHub - milesial/Pytorch-UNet: PyTorch implementation of the U-Net for image semantic segmentation with high quality images) by compiling it and saving the serialized TensorRT engine. However, the process is too slow: it takes 45 minutes at 2048x2048 resolution. Is there any way to speed up the network serialization?


TensorRT Version: TensorRT-
GPU Type: NVIDIA GeForce GTX 1660 Ti with Max-Q Design
Nvidia Driver Version:
CUDA Version: 11
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Hi @oelgendy1,

We request you to provide more details. Could you please let us know how you are building the TensorRT engine?

Thank you.

Thanks @spolisetty for your reply. I am using the same method as in sampleOnnxMNIST.cpp.

I am using the same build parameters as in the sample code. Then I save the serialized engine using:

    // Serialize the built engine and write the blob to disk
    nvinfer1::IHostMemory* serializedModel = mEngine->serialize();
    std::ofstream ofs(engineName, std::ios::out | std::ios::binary);
    ofs.write(static_cast<const char*>(serializedModel->data()), serializedModel->size());
    serializedModel->destroy();  // free the host buffer once written (pre-TensorRT 8 API)

I also tried to use trtexec.exe and it is very slow as well. It takes 45 minutes on my machine to build a U-Net engine for a 2048x2048 image!

Does the workspace size affect the network build time? And if I reduce it, will I get a sub-optimal serialized engine?

Hi @oelgendy1,

Could you please share the ONNX model with us so we can try it on our end? Meanwhile, we request you to check the GPU utilization during serialization. Also, please refer to the TensorRT FAQ (section "How do I choose the optimal workspace size").

Thank you.


Thanks @spolisetty for your reply. I tried another GPU and the network build is faster (10 minutes). It highly depends on the GPU.

Hi @oelgendy1,

We also recommend you refer to Best Practices For TensorRT Performance :: NVIDIA Deep Learning TensorRT Documentation.

Thank you.