How does the TRT builder decide how many CPU threads to use?
I have some code that builds a TRT engine from an ONNX model. I used to run it natively on CUDA 11.0 / TRT 7 / Ubuntu 18.04, and it would build the engine using all CPU cores and max out GPU utilization.
I am now running the same code in a Docker container with CUDA 11.3 / TRT 8 / Ubuntu 18.04. It works without error, but it only uses one CPU core to build and therefore takes a very long time. I can't find anything in the builder API about CPU threads, or any environment variables that might control this, so I'm not even sure where to look. There's plenty of available RAM/VRAM.
If it helps, the code is:
void onnxToTRTModel(const std::string &modelFile, // name of the onnx model
                    const std::string &filename,  // name of saved engine
                    nvinfer1::ICudaEngine *&engine, const int &BATCH_SIZE) {
  // create the builder
  nvinfer1::IBuilder *builder =
      nvinfer1::createInferBuilder(gLogger.getTRTLogger());
  assert(builder != nullptr);

  const auto explicitBatch =
      1U << static_cast<uint32_t>(
          nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
  auto network = builder->createNetworkV2(explicitBatch);
  auto config = builder->createBuilderConfig();
  auto parser = nvonnxparser::createParser(*network, gLogger.getTRTLogger());

  if (!parser->parseFromFile(
          modelFile.c_str(),
          static_cast<int>(gLogger.getReportableSeverity()))) {
    gLogError << "Failure while parsing ONNX file" << std::endl;
  }

  // Build the engine
  builder->setMaxBatchSize(BATCH_SIZE);
  config->setMaxWorkspaceSize(2300_MiB);
  config->setFlag(nvinfer1::BuilderFlag::kFP16);

  std::cout << "start building engine" << std::endl;
  engine = builder->buildEngineWithConfig(*network, *config);
  std::cout << "build engine done" << std::endl;