Hi,
I downloaded tlt-converter the DLA enabled version in order to generate an engine to utilize DLAs in the AGX.
I used a Mask R-CNN model that I retrained on the COCO dataset using TLT v3.0.
The engine generation process failed with a segmentation fault. I attached the error log: log.txt (61.1 KB)
Also, according to this blog here, using the DLAs alongside the GPU should give a performance boost. However, when I attempted the DLA engine creation, a lot of layers were not DLA-supported and fell back to the GPU, and as far as I understand, this fallback should actually hurt performance. Is there a specific way the benchmark in the blog was done?
I usually let DeepStream generate the GPU TensorRT engine from the .etlt file, but I just tried generating the TensorRT engine without DLA using tlt-converter, and the resulting engine gets much lower FPS in DeepStream than the one generated by DeepStream itself.
I'm not sure why.
I used this command for the TensorRT engine without DLA:
For FPS, I suggest you test the Mask R-CNN model mentioned in the blog. It is trained on one class.
The configuration file and label file for the model are provided in the SDK. These files can be used with the generated model as well as with your own trained model. A sample Mask R-CNN model trained on a one-class dataset is provided on GitHub: https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps/
cd deepstream_tlt_apps/
wget https://nvidia.box.com/shared/static/8k0zpe9gq837wsr0acoy4oh3fdf476gq.zip -O models.zip
unzip models.zip
rm models.zip
Then use the same .etlt model to generate the TensorRT engine.
You mentioned that there is a segmentation fault when generating the TensorRT engine with the DLA flag.
How about generating the TensorRT engine without the DLA flag? Is that successful?
[ERROR] Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize() if using IBuilder::buildEngineWithConfig, or IBuilder::setMaxWorkspaceSize() if using IBuilder::buildCudaEngine.
Could you please set a larger workspace when you generate the TensorRT engine with the DLA flag? For example: -w 1000000000
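A minimal sketch of what such an invocation might look like, assuming the DLA-enabled tlt-converter build (the key, dimensions, and output node names below are placeholders taken from typical Mask R-CNN exports, not from your attached command, so substitute your own values):

```shell
# Hedged example: tlt-converter with DLA enabled and a ~1 GB workspace.
# -k, -d, -o, and file paths are placeholders; use the values from your own export.
./tlt-converter model.maskrcnn.etlt \
  -k <your_encryption_key> \
  -d 3,832,1344 \
  -o generate_detections,mask_fcn_logits/BiasAdd \
  -t fp16 \
  -u 0 \
  -w 1000000000 \
  -e model.engine
```

Here `-w 1000000000` sets the workspace suggested above, `-u 0` selects DLA core 0 (available only in the DLA-enabled converter build), and `-t fp16` is used because the DLA does not run FP32 layers.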