Build an Enterprise-Scale Multimodal Document Retrieval Pipeline with NVIDIA NIM Agent Blueprint

Originally published at: https://developer.nvidia.com/blog/build-an-enterprise-scale-multimodal-document-retrieval-pipeline-with-nvidia-nim-agent-blueprint/

Trillions of PDF files are generated every year, each file likely consisting of multiple pages filled with various content types, including text, images, charts, and tables. This goldmine of data can only be used as quickly as humans can read and understand it. But with generative AI and retrieval-augmented generation (RAG), this untapped data can…

OK, I am new to AI systems and development, so please be gentle, as my questions may seem simple or wrongly focused. I am currently trying to develop software in which a user uploads a building blueprint and is then given the wall measurements along with the total square footage. I am using HRNet for the walls and CV for the total square footage.

I am currently having issues with detection on multi-plan blueprints and with getting consistent measurements. When I get it producing correct measurements on a 2-plan blueprint, it seems to break on, for instance, 3-plan blueprints. My issues seem to stem from the ROI boxes, but I am not truly sure. Would I benefit from the nv-yolox mentioned above?

Like I said, I am totally new to this, with no formal education, and am just trying to find my way. I would appreciate any help, advice, thoughts, or input, as my search for answers has brought me here. Sorry for the rambling and the erratic shifts in thought; I tried to keep this short. If anyone would like to send any information or advice: turnertyler590@gmail.com