Only one cpu core used to build TRT engine - why?

How does the TRT builder decide how many CPU threads to use?

I have some code that builds a TRT engine from an ONNX model. I used to run it natively on CUDA 11.0 / TRT 7 / Ubuntu 18.04, and it would build the engine using all CPU cores and max out GPU utilization.

I am now running this same code in a Docker container with CUDA 11.3 / TRT 8 / Ubuntu 18.04. It works without error but only uses one CPU core to build, and therefore takes a very long time. I can't seem to find anything in the builder API about CPU threads, or any environment variables that might set this, so I'm not even sure where to look. There's plenty of available RAM/VRAM.

If it helps, the code is:

void onnxToTRTModel(const std::string &modelFile, // name of the ONNX model
                    const std::string &filename,  // name of the saved engine
                    nvinfer1::ICudaEngine *&engine, const int &BATCH_SIZE) {
    // create the builder
    nvinfer1::IBuilder *builder =
        nvinfer1::createInferBuilder(gLogger.getTRTLogger());
    assert(builder != nullptr);

    const auto explicitBatch =
        1U << static_cast<uint32_t>(
            nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto network = builder->createNetworkV2(explicitBatch);
    auto config = builder->createBuilderConfig();

    auto parser = nvonnxparser::createParser(*network, gLogger.getTRTLogger());
    if (!parser->parseFromFile(
            modelFile.c_str(),
            static_cast<int>(gLogger.getReportableSeverity()))) {
      gLogError << "Failure while parsing ONNX file" << std::endl;
      return; // don't try to build from an incomplete network
    }
    // Build the engine
    builder->setMaxBatchSize(BATCH_SIZE);
    config->setMaxWorkspaceSize(2300_MiB);
    config->setFlag(nvinfer1::BuilderFlag::kFP16);

    std::cout << "start building engine" << std::endl;
    engine = builder->buildEngineWithConfig(*network, *config);
    std::cout << "build engine done" << std::endl;
}

Hi,
Could you share the ONNX model and the script (if not shared already) so that we can assist you better?
In the meantime, you can try a few things:
https://docs.nvidia.com/deeplearning/tensorrt/quick-start-guide/index.html#onnx-export

  1. Validate your model with the snippet below:

check_model.py

import onnx

filename = "your_model.onnx"  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)
  2. Try running your model with the trtexec command:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
In case you are still facing the issue, please share the trtexec --verbose log for further debugging.
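For reference, a trtexec invocation that roughly mirrors the builder settings in the code above (FP16, ~2300 MiB workspace) might look like this; the model and engine file names are placeholders:

```shell
# Build an FP16 engine from an ONNX model with verbose logging.
# File names below are placeholders - substitute your own paths.
trtexec --onnx=your_model.onnx \
        --saveEngine=your_model.engine \
        --fp16 \
        --workspace=2300 \
        --verbose
```

The --verbose output includes the timing of each tactic the builder tries, which can help narrow down where the build time is going.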
Thanks!