Engine creation causes Segfault with Jetson AGX Xavier

Description

Hello,

I’m having a new issue when trying to build an engine on a Jetson AGX Xavier. Specifically, it happens when I run the function that is supposed to create a context from an ONNX model:

// explicitBatch is not shown in the original post; this is the usual definition for ONNX models,
// which require an explicit-batch network (assumed).
const auto explicitBatch = 1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);

TRTUniquePtr<nvinfer1::IBuilder> builder{nvinfer1::createInferBuilder(gLogger)}; // Create builder
TRTUniquePtr<nvinfer1::INetworkDefinition> network{builder->createNetworkV2(explicitBatch)}; // Create network
TRTUniquePtr<nvonnxparser::IParser> parser{nvonnxparser::createParser(*network, gLogger)}; // Create parser for the ONNX model
TRTUniquePtr<nvinfer1::IBuilderConfig> config{builder->createBuilderConfig()}; // Create builder config (API depends on TensorRT version)

if (!parser->parseFromFile(model_path.c_str(), static_cast<int>(nvinfer1::ILogger::Severity::kINFO)))
{
    std::cerr << "ERROR: could not parse the model.\n";
    return;
}

// Allow TensorRT to use up to 4 GB of GPU memory for tactic selection.
config->setMaxWorkspaceSize(1ULL << 32);

// Use FP16 mode if the platform supports it natively.
if (builder->platformHasFastFp16())
{
    config->setFlag(nvinfer1::BuilderFlag::kFP16);
    std::cout << "Using FP16\n";
}

engine.reset(builder->buildEngineWithConfig(*network, *config));
if (!engine)
{
    // buildEngineWithConfig returns nullptr on failure; dereferencing it below would segfault.
    std::cerr << "ERROR: engine build failed.\n";
    return;
}
context.reset(engine->createExecutionContext());

TRTUniquePtr<nvinfer1::IHostMemory> engine_plan{engine->serialize()};
writeBuffer(engine_plan->data(), engine_plan->size(), enginePath);
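The snippet calls a `writeBuffer` helper that is not shown in the post. A minimal sketch of what it might look like, assuming it simply dumps the serialized plan bytes to a file and that `enginePath` is a `std::string` (both are assumptions, not the poster's code):

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper matching the call writeBuffer(data, size, path) above:
// writes `size` raw bytes starting at `data` to the file at `path`.
// Returns true on success, false if the file could not be opened or written.
bool writeBuffer(const void* data, std::size_t size, const std::string& path)
{
    std::ofstream out(path, std::ios::binary);
    if (!out)
    {
        return false;
    }
    out.write(static_cast<const char*>(data), static_cast<std::streamsize>(size));
    return static_cast<bool>(out);
}
```

Returning a `bool` lets the caller report a failed write instead of silently producing a truncated or missing engine file.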

The engine build prints some log messages, then displays a warning and segfaults:

[W]onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Using FP16
[W]Detected invalid timing cache, setup a local cache instead
Segmentation fault (core dumped)

The thing is, I also get this warning on my desktop computer, where everything works fine. I also used to get it on this same Jetson Xavier and it worked fine there too, but since then the board has been reflashed to use PLLAON as the clock source (to fix some CAN issues), and I wonder whether that could be related. Either way, how can I solve this?

[EDIT] As explained below, it turns out the segfault was related neither to TensorRT nor to the new clock source, but simply to a stack memory limit.

Environment

TensorRT Version: 8.0.1.6
GPU Type: Jetson AGX Xavier
Nvidia Driver Version:
CUDA Version: 10.2.300
CUDNN Version: 8.2.1.32
Operating System + Version: Ubuntu 18.04

Steps To Reproduce

Hi,

This looks like a Jetson issue. Please refer to the samples below in case they are useful.

For any further assistance, we will move this post to the Jetson-related forum.

Thanks!


Dear @rapha.lorenzolouis,
Did you check whether the model works with trtexec?
Is it possible to upgrade to JetPack 5.0.1 and test?

Hello, thanks for the samples and the answer. It turns out the segfault was caused by a stack memory limit issue elsewhere in the code.
It works fine now.

Dear @rapha.lorenzolouis,
OK. So did you change the stack memory limit using ulimit to fix the issue? Could you share how you fixed it, to help others in the community?
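For reference, the stack limit that `ulimit -s` reports (in KiB) can also be read from inside the process with the POSIX `getrlimit` call. A minimal sketch, assuming a Linux target such as L4T on the Xavier; the function name `softStackLimitBytes` is hypothetical:

```cpp
#include <optional>
#include <sys/resource.h>

// Returns the soft RLIMIT_STACK value of the calling process in bytes
// (the value `ulimit -s` shows in KiB), or std::nullopt when the limit is
// unlimited or cannot be queried.
std::optional<unsigned long long> softStackLimitBytes()
{
    rlimit limit{};
    if (getrlimit(RLIMIT_STACK, &limit) != 0)
    {
        return std::nullopt; // query failed
    }
    if (limit.rlim_cur == RLIM_INFINITY)
    {
        return std::nullopt; // no limit set
    }
    return static_cast<unsigned long long>(limit.rlim_cur);
}
```

Logging this value at startup makes it easy to compare the Jetson against the desktop machine where the same code did not crash.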

Hello,

I can’t share the full code, but I kept the default stack size (which is quite low on the Tegra Xavier AGX) and rewrote a bit of the code to use a pointer to memory properly allocated with malloc for a custom structure, which must have been too large for the stack. The declaration of this structure actually came after the TensorRT engine creation, yet the segfault happened while the engine was being built, which led me to think the two were related.
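The fix described above (moving a large structure off the stack) can be sketched as follows. The structure name and size are hypothetical stand-ins for the poster's custom structure; the poster used `malloc`, while this sketch uses `std::make_unique`, the idiomatic C++ equivalent that also frees the memory automatically. One plausible explanation for the crash appearing during the engine build even though the structure was declared later: compilers usually reserve a function's whole stack frame on entry, so a very large local can exhaust the stack as soon as deeper calls (such as the TensorRT builder) use memory below it.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical stand-in for the poster's custom structure: 16 MiB, which is
// larger than the common 8 MiB Linux default stack, so declaring it as a
// local variable would overflow the stack.
struct LargeBuffers
{
    static constexpr std::size_t kCount = 4 * 1024 * 1024;
    float samples[kCount]; // 4 Mi floats = 16 MiB
};

// Allocates the structure on the heap instead of the stack.
// std::make_unique value-initializes it (all samples zeroed) and the memory is
// released automatically when the returned pointer goes out of scope.
std::unique_ptr<LargeBuffers> makeLargeBuffers()
{
    return std::make_unique<LargeBuffers>();
}
```

With `malloc` instead, the memory is left uninitialized and must be released with `free`, which only works cleanly for trivially constructible structures like this one.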

