[TensorRT Error] DLA validation failed

Hello,
I have a network layer that can run on DLA (validated with config->canRunOnDLA(layer)) and that has been INT8-quantized, but when I assign it to the DLA with config->setDeviceType(layer, DeviceType::kDLA), the builder still returns a “DLA validation failed” error when building the engine.
Here is my code for this part:

        config->setFlag(BuilderFlag::kGPU_FALLBACK);
        config->setDLACore(0);
        ILayer* layer = network->getLayer(10);
        if(config->canRunOnDLA(layer)) {
            config->setDeviceType(layer, DeviceType::kDLA);
        }
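For completeness, here is a hedged sketch of how this assignment generalizes to every DLA-capable layer (the builder, network creation, and INT8/calibrator setup are elided; assumes TensorRT 8.x):

```cpp
// Sketch only: builder, network, and INT8 calibrator setup are elided.
config->setFlag(BuilderFlag::kINT8);          // DLA INT8 requires INT8 mode
config->setFlag(BuilderFlag::kGPU_FALLBACK);  // unsupported layers fall back to GPU
config->setDLACore(0);                        // run on DLA core 0

// Assign every DLA-capable layer to the DLA, not just a single index.
for (int i = 0; i < network->getNbLayers(); ++i)
{
    ILayer* layer = network->getLayer(i);
    if (config->canRunOnDLA(layer))
    {
        config->setDeviceType(layer, DeviceType::kDLA);
    }
}
```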

and the error:

ERROR: 4: [network.cpp::validate::2789] Error Code 4: Internal Error (DLA validation failed)
ERROR: 2: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )

Additionally, this should not be a problem with DLA or TensorRT itself: I can successfully build an engine that runs on DLA using trtexec.

Here is my source code file:
YolotoTensorRT.txt (7.8 KB)

my onnx model is yolov5_trimmed_qat.

Looking forward to your responses!

Hi,
Here are some suggestions for the common issues:

1. Performance

Please run the commands below before benchmarking a deep learning use case:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

2. Installation

Installation guide of deep learning frameworks on Jetson:

3. Tutorial

Startup deep learning tutorial:

4. Report issue

If these suggestions don’t help and you want to report an issue to us, please share the model, the command/steps, and the customized app (if any) so we can reproduce it locally.

Thanks!

Hi,

Could you enable the verbose log and share the output with us?
Thanks.

How can I output the verbose log if my source code fails to build the engine?

The attachment is the verbose log output using trtexec instead of my source code:
trtexec_yolov5.log (315.8 KB)

But I want only some network layers to run on the DLA, not the whole model, so I can’t use trtexec; I need to write the corresponding code with the TensorRT API. That is where I run into the problem described above.

Hi,

You can enable it via setting the logger to VERBOSE level.

https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_logger.html

class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        // Print every message at VERBOSE severity or higher
        if (severity <= Severity::kVERBOSE) std::cout << msg << std::endl;
    }
} logger;
...
nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();
...

Thanks.

Thanks,

Following attached is the full output:
can_run_on_DLA.log (58.5 KB)

The log returns following error:
Network built for DLA requires kENTROPY_CALIBRATION_2 calibrator.

But my calibrator already uses the type it asks for; here is my calibrator class declaration:

class MyCalibrator : public IInt8EntropyCalibrator2
{
private:
    int         nCalibration {0};
    int         nElement {0};
    size_t      bufferSize {0};
    int         nBatch {0};
    int         iBatch {0};
    float      *pData {nullptr};
    void      *bufferD {nullptr};
    cnpy::NpyArray array;
    Dims32      dim;
    std::string cacheFile {""};

public:
    MyCalibrator(const std::string &calibrationDataFile, const int nCalibration, const Dims32 inputShape, const std::string &cacheFile);
    ~MyCalibrator() noexcept;
    int32_t     getBatchSize() const noexcept override;
    bool        getBatch(void *bindings[], char const *names[], int32_t nbBindings) noexcept override;
    void const *readCalibrationCache(std::size_t &length) noexcept override;
    void        writeCalibrationCache(void const *ptr, std::size_t length) noexcept override;
};
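For reference, a hedged sketch of how the batch methods might be implemented (not my full code; the layout assumptions for pData, bufferD, and nElement follow the declaration above, and error handling is elided):

```cpp
// Illustrative sketch: assumes pData holds host calibration data laid out
// batch-by-batch, nElement floats per batch, and bufferD is a device buffer.
int32_t MyCalibrator::getBatchSize() const noexcept
{
    return dim.d[0];  // batch dimension of the input shape
}

bool MyCalibrator::getBatch(void *bindings[], char const *names[], int32_t nbBindings) noexcept
{
    if (iBatch >= nBatch)
        return false;  // no more calibration batches
    cudaMemcpy(bufferD, pData + iBatch * nElement, bufferSize, cudaMemcpyHostToDevice);
    bindings[0] = bufferD;  // assumes a single input binding
    ++iBatch;
    return true;
}
```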

Hi,

Do you want to run calibration on the DLA output?
You should be able to run calibration on the GPU and reuse the same cache for DLA inference.

Could you help to turn off the calibration to see if it can work?

Thanks.
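The suggestion above might be sketched as follows (hypothetical; either drop INT8 for a quick test, or keep INT8 and reuse a cache generated on the GPU so no calibration runs):

```cpp
// Option A: turn off INT8 (and thus calibration) entirely for a quick test.
config->clearFlag(BuilderFlag::kINT8);

// Option B: keep INT8 but reuse an existing calibration cache; when
// readCalibrationCache() returns valid data, TensorRT skips calibration.
config->setFlag(BuilderFlag::kINT8);
config->setInt8Calibrator(&myCalibrator);  // calibrator only serves the cached scales
```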

I finally found the cause: after I changed my header file so the calibrator inherits the DLA-compatible class, the header was not listed in the target file’s dependencies, so the change was never recompiled and the target kept using the old calibrator, which is not applicable to DLA.
My solution: add the header file to the makefile (or CMake) dependencies and recompile, so the engine is built with the calibrator for DLA.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.