Failed to create DLA engine from .etlt model

Hi,
I downloaded the DLA-enabled version of tlt-converter in order to generate an engine that uses the DLAs on the AGX.
I used a MaskRCNN model that I retrained on the COCO dataset using TLT v3.0.
The engine generation process failed with a segmentation fault. I have attached the error log:
log.txt (61.1 KB)

Also, according to this blog here, using the DLAs alongside the GPU should give a performance boost. However, when I attempted the DLA engine creation, a lot of layers were not supported on the DLA and fell back to the GPU, and as far as I understand, that fallback should actually hurt GPU performance. Was the benchmark in the blog done in a particular way?
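
From what I understand of DeepStream, DLA offload is selected per nvinfer instance in its config file, roughly like the two lines below. This is just my assumption about how the benchmark was set up, not the blog's actual config:

[property]
enable-dla=1
use-dla-core=0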

Any help would be appreciated,

• Hardware: AGX Xavier
• Network Type: Mask_rcnn
• TLT Version: docker v3.0-py3

Can you share the command you used to generate the trt engine?

Also, did you ever try to generate the trt engine without DLA? Was that successful?

I used this command:

tlt-converter -k nvidia_tlt -d 3,832,1344 -o generate_detections,mask_fcn_logits/BiasAdd  maskrcnn_v3.etlt -e maskrcnn_DLA.engine -u 0

I usually let DeepStream generate the GPU trt engine from the .etlt, but I just tried generating the trt engine without DLA using tlt-converter, and the resulting engine gets much lower fps in DeepStream than the one DeepStream generates itself.

Not sure why though.

I used this command for the trt engine without DLA:

tlt-converter -k nvidia_tlt -d 3,832,1344 -o generate_detections,mask_fcn_logits/BiasAdd model.step-700000.etlt -e maskrcnn_GPU.engine
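
For reference, when I let DeepStream build the engine, nvinfer reads the .etlt through a config roughly like the sketch below (the values are placeholders, not my exact config). One thing I still need to rule out is precision: as far as I know tlt-converter builds fp32 unless -t fp16 or -t int8 is passed, while a network-mode=2 setting in the DeepStream config would produce an fp16 engine, which alone could explain the fps gap.

[property]
tlt-encoded-model=model.step-700000.etlt
tlt-model-key=nvidia_tlt
model-engine-file=model.step-700000.etlt_b1_gpu0_fp16.engine
network-mode=2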

For fps, I suggest you test the mask-rcnn model mentioned in the blog. It is trained on one class.

The configuration file and label file for the model are provided in the SDK. These files can be used with the generated model as well as your own trained model. A sample Mask R-CNN model trained on a one-class dataset is provided on GitHub: https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps/

cd deepstream_tlt_apps/
wget https://nvidia.box.com/shared/static/8k0zpe9gq837wsr0acoy4oh3fdf476gq.zip -O models.zip
unzip models.zip
rm models.zip

Then use that etlt model to generate the trt engine.
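
For example, assuming the unzipped sample lands at ./models/peopleSegNet/peoplesegnet_resnet50.etlt (adjust the path to whatever the zip actually contains; the engine name is just an example), the conversion with DLA would look like:

./tlt-converter -k nvidia_tlt -d 3,576,960 -o generate_detections,mask_fcn_logits/BiasAdd -e sample_mrcnn_DLA.engine -u 0 ./models/peopleSegNet/peoplesegnet_resnet50.etlt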

I tried converting the model you mentioned to a trt engine with the DLA flag and still got the same segmentation fault.

Any suggestions?

You mentioned that there is a segmentation fault when generating the trt engine with the DLA flag.
How about generating the trt engine without the DLA flag? Is that successful?

Yes, generating the trt engine without the DLA flag is successful. I only have a problem when generating it with the DLA flag.

Can you download the model below and retry? Please share the command and log.

From deepstream_tao_apps/download_models.sh at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub
# For peopleSegNet V2:
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/tlt_peoplesegnet/versions/deployable_v2.0/zip

Also, from the error log:

[ERROR] Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize() if using IBuilder::buildEngineWithConfig, or IBuilder::setMaxWorkspaceSize() if using IBuilder::buildCudaEngine.

Could you please set a larger workspace when you generate the trt engine with the DLA flag?
-w 1000000000
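
Applied to your earlier command, that would be:

tlt-converter -k nvidia_tlt -d 3,832,1344 -o generate_detections,mask_fcn_logits/BiasAdd -e maskrcnn_DLA.engine -u 0 -w 1000000000 maskrcnn_v3.etlt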

I cannot reproduce the error. See the command and log below. I ran it on an NX board.
$ wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/tlt_peoplesegnet/versions/deployable_v2.0/zip

nvidia@nvidia:~/morganh/mrcnn$ ./tlt-converter -k nvidia_tlt -d 3,576,960 -o generate_detections,mask_fcn_logits/BiasAdd -t int8 -c peoplesegnet_resnet50_int8.txt -m 1 -w 100000000 -u 0 peoplesegnet_resnet50.etlt
20210819_mrcnn_trt_engine_with_dla.txt (57.8 KB)