Hi,
I’m running into an odd issue. I have an ONNX network that I want to load and parse in TensorRT. I previously used the onnx2trt utility, but now I parse it directly.
Some of the code I can’t post here since it’s from work; I’ve modified it to hopefully reflect my issue, but if you need more info please ask:
bool profileOnnx(const std::string& network_path, std::ostream& gie_model_stream) {
  auto builder = tensorUniquePtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(gLogger));
  if (!builder) {
    return false;
  }
  auto network = tensorUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(0U));
  if (!network) {
    return false;
  }
  auto config = tensorUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
  if (!config) {
    return false;
  }
  config->setMinTimingIterations(3);
  config->setMaxWorkspaceSize(16 << 20);
  config->setAvgTimingIterations(2);
  auto parser =
      tensorUniquePtr<nvonnxparser::IParser>(nvonnxparser::createParser(*network, gLogger));
  if (!parser) {
    return false;
  }
  std::stringstream concatenated_path_stream;
  concatenated_path_stream << INSTALL_PREFIX << network_path;
  std::string onnx_path = concatenated_path_stream.str();
  int verbosity = (int)nvinfer1::ILogger::Severity::kERROR;
  if (!parser->parseFromFile(onnx_path.c_str(), verbosity)) {
    return false;
  }
  builder->setMaxBatchSize(params.batch_size);
  if (strToDeviceType(params.inference_device) == deviceType::DEVICE_DLA) {
    // Enabling DLA
    config->setDefaultDeviceType(nvinfer1::DeviceType::kDLA);
    // config->setDLACore(sys.DLACore);
    // Allow GPU fallback
    config->setFlag(nvinfer1::BuilderFlag::kGPU_FALLBACK);
  } else {
    config->setDefaultDeviceType(nvinfer1::DeviceType::kGPU);
  }
  auto engine = std::shared_ptr<nvinfer1::ICudaEngine>(
      builder->buildEngineWithConfig(*network, *config), tensorDeleter());
  if (!engine) {
    return false;
  }
  // Print the output dimensions reported at build time.
  nvinfer1::Dims dimerinos = network->getOutput(0)->getDimensions();
  printf("DIMERINOS %d %d %d %d\n", dimerinos.nbDims, dimerinos.d[0], dimerinos.d[1], dimerinos.d[2]);
  // Serialize the engine into the caller's stream.
  nvinfer1::IHostMemory* net_mem = engine->serialize();
  gie_model_stream.write((const char*)net_mem->data(), net_mem->size());
  net_mem->destroy();
  return true;
}
The issue is that getDimensions() and getBindingDimensions() return different results on the host and on the device. I know my network’s input and output sizes: on the host (meaning my laptop) everything works fine and the dimensions are correct, but on the Xavier (I cross-compile for Xavier, and all installed package versions such as TensorRT, CUDA and cuDNN are the same) the network’s output dimensions are wrong while the input dimensions are fine.
So, for example, on the host my input dimensions are 3x480x720 and my output dimensions are 2x480x720 (batch size of 1).
On the Xavier I get input dimensions of 3x480x720 (ok), but output dimensions of 2x1x1 (which is wrong, although the C channel seems ok). This puzzles me a lot, and of course it raises an error during memory allocation and access, since far too little memory gets allocated on the device.
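For context, this is roughly how I check the bindings on the inference side after deserializing the engine (a minimal sketch, reusing the same tensorUniquePtr/tensorDeleter helpers; engine_data and engine_size are placeholders for the serialized blob, not my actual variable names):
  // Minimal sketch of my runtime-side check (simplified): deserialize the
  // serialized engine and print every binding's dimensions so I can compare
  // them with what the builder reported.
  auto runtime = tensorUniquePtr<nvinfer1::IRuntime>(nvinfer1::createInferRuntime(gLogger));
  auto engine = std::shared_ptr<nvinfer1::ICudaEngine>(
      runtime->deserializeCudaEngine(engine_data, engine_size, nullptr), tensorDeleter());
  for (int b = 0; b < engine->getNbBindings(); ++b) {
    nvinfer1::Dims dims = engine->getBindingDimensions(b);
    printf("binding %d (%s): nbDims=%d %d %d %d\n", b, engine->getBindingName(b),
           dims.nbDims, dims.d[0], dims.d[1], dims.d[2]);
  }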
I’ll write a short reproducible example with a random network and try to update this post, but I wanted to see if you have any insight first. I’ve tried multiple things, and this issue didn’t appear in other versions (TensorRT 5).
Environment
TensorRT Version : 6.0.1-1+cuda10.0
GPU Type : GeForce GTX 1060 and NVIDIA Jetson AGX Xavier
Nvidia Driver Version : 440.64
CUDA Version : 10.0
CUDNN Version : 7.6.5.32-1+cuda10.0
Operating System + Version : Ubuntu 18.04
Baremetal or Container (if container which image + tag) : Baremetal