Identify meeting-room tables and annotate images

martin269 · January 20, 2025, 8:20am

Hi,

I’m not sure if this is the correct forum to post in. Please point me in the right direction if it’s not.
I just discovered that NVIDIA has a lot of powerful AI tooling, but I could use some guidance in finding the right tools.

I’m building a meeting room configurator. I want the user journey to be as follows:

1. A user uploads an image of the meeting room’s floor plan.
2. The user uploads a PDF containing the installation instructions for a specific speaker (e.g., Bose).
3. The system I’m building identifies the meeting-room table in the image and annotates where the ceiling speakers should be positioned, according to the information found in the PDF.

Which NVIDIA tooling can I leverage to achieve this goal?

Fiona.Chen · January 20, 2025, 8:53am

Do you mean you want a system that can understand the speaker installation PDF and tell you where the speakers should be positioned in a meeting room?

For which part do you want to use an NVIDIA SDK or project: understanding the PDF, or handling the meeting-room floor plan image?

martin269 · January 20, 2025, 9:40am

Hi,

Yes, I want the system to place speaker icons on the floor plan, above what it identifies as the meeting-room table. If it can do that, we have a guide for the person who installs the room, and we’ll design the room with the correct number of speakers.

Here is an example floor plan of a meeting room.

martin269 · January 20, 2025, 9:46am

This is what’s typically found in the installation instructions:

martin269 · January 21, 2025, 1:24pm

llama3.2-vision correctly identified where I want my speakers and how many I need.
My next step is determining which model I can use to place speaker icons on the floor plan I initially gave the LLM.

Any suggestions for which one I can use?

Fiona.Chen · January 23, 2025, 5:42am

Hi, @martin269

For the PDF document understanding, you may try the Multimodal PDF Data Extraction Blueprint by NVIDIA on NVIDIA NIM.

martin269 · January 30, 2025, 12:50pm

Thanks for sharing the Data Extraction Blueprint, @Fiona.Chen! :-)
Do you have any suggestions for how I can add speaker icons to the floor plan?
They should be evenly distributed over the area covering the table.
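Once the table’s bounding box is known in pixel coordinates, spreading icons evenly over it is plain geometry rather than a model task. A minimal sketch, assuming an axis-aligned box given as `(x0, y0, x1, y1)` (a hypothetical format, not the output of any particular model): divide the box into a rows × cols grid and place one icon at the center of each cell.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in image pixels, x0 < x1 and y0 < y1.
Box = Tuple[float, float, float, float]


def grid_positions(box: Box, rows: int, cols: int) -> List[Tuple[float, float]]:
    """Return the center point of each cell in a rows x cols grid laid over `box`."""
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be positive")
    x0, y0, x1, y1 = box
    cell_w = (x1 - x0) / cols
    cell_h = (y1 - y0) / rows
    # Row-major order: left to right, then top to bottom.
    return [
        (x0 + (c + 0.5) * cell_w, y0 + (r + 0.5) * cell_h)
        for r in range(rows)
        for c in range(cols)
    ]


# Example: a 400 x 200 px table with one row of four speakers.
print(grid_positions((100, 100, 500, 300), rows=1, cols=4))
# [(150.0, 200.0), (250.0, 200.0), (350.0, 200.0), (450.0, 200.0)]
```

Centering each icon in its cell keeps it half a cell away from the table edges, so the outermost speakers still sit over the table.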

In this first version, I would be happy to get a prompt similar to one of these working:

Draw a speaker icon above each chair.

or

Draw a speaker icon for every square meter in the area covering the meeting room table.
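The second prompt reduces to arithmetic once the floor plan’s scale is known. A sketch, assuming a hypothetical `px_per_meter` value (for example derived from a dimension line on the plan, or entered by the user) and a table bounding box already measured in pixels: compute the area in square meters, round the speaker count up, and pick a grid whose shape roughly follows the table’s aspect ratio. The `m2_per_speaker` parameter is an illustrative generalization of the “one per square meter” rule.

```python
import math
from typing import Tuple


def speaker_layout(width_px: float, height_px: float, px_per_meter: float,
                   m2_per_speaker: float = 1.0) -> Tuple[int, int, int]:
    """Return (count, rows, cols) for one speaker per `m2_per_speaker` of table area.

    `px_per_meter` is an assumed, externally supplied floor-plan scale.
    """
    width_m = width_px / px_per_meter
    height_m = height_px / px_per_meter
    area_m2 = width_m * height_m
    # Round up so the whole table area is covered; always place at least one speaker.
    count = max(1, math.ceil(area_m2 / m2_per_speaker))
    # Choose the number of rows so the grid roughly follows the table's proportions.
    rows = max(1, round(math.sqrt(count * height_m / width_m)))
    cols = math.ceil(count / rows)
    return count, rows, cols


# Example: a 3.6 m x 1.2 m table drawn at 100 px per meter.
print(speaker_layout(360, 120, px_per_meter=100))  # (5, 1, 5)
```

Because `rows * cols` can exceed `count` (five speakers on a squarer table may end up in a 2 × 3 grid), you may want to leave the extra cell empty or accept the extra speaker; which is right depends on the installation instructions.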

Do you have any suggestions for models that can handle this kind of instruction?

Fiona.Chen · February 6, 2025, 2:38am

We don’t have any recommendation for the UI design. There may be lots of open source projects.

martin269 · February 6, 2025, 7:23am

Good morning,

You wrote:

We don’t have any recommendation for the UI design. There may be lots of open source projects.

Sorry, I see that my question was a bit unclear.
It’s not about the UI design, but about identifying which LLM I can use to annotate the images.
I’ve tried this prompt using ChatGPT 4o mini:
Here is a floor plan. Can you calculate the size of the meeting room table? I need 1 speaker for every square meter in the area covering the table. Can you please annotate where the speakers should be positioned, in the floor plan image I gave you?

It can do the annotation, but it does not understand precisely where to position the speakers.
Here is the image it returned:

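One way around the imprecise placement is to stop asking the chat model to redraw the image, and instead have it (or a detector) supply only coordinates, then draw the icons yourself. A stdlib-only sketch, assuming the pixel positions are already computed and that an SVG overlay suits the configurator’s front end; `floorplan.png` is a hypothetical file name, and the numbered circle is a stand-in for a real speaker icon.

```python
from typing import Iterable, Tuple
from xml.sax.saxutils import quoteattr


def speaker_overlay_svg(image_href: str, width: int, height: int,
                        positions: Iterable[Tuple[float, float]],
                        radius: float = 12.0) -> str:
    """Return an SVG document showing the floor plan with a numbered marker per position."""
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        # The floor plan itself, drawn at its native pixel size so coordinates line up.
        f'<image href={quoteattr(image_href)} x="0" y="0" width="{width}" height="{height}"/>',
    ]
    for i, (x, y) in enumerate(positions, start=1):
        # A filled circle with a number inside stands in for a speaker icon.
        parts.append(f'<circle cx="{x}" cy="{y}" r="{radius}" fill="orange" stroke="black"/>')
        parts.append(
            f'<text x="{x}" y="{y}" text-anchor="middle" '
            f'dominant-baseline="central" font-size="{radius}">{i}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


svg = speaker_overlay_svg("floorplan.png", 800, 600, [(150.0, 200.0), (250.0, 200.0)])
```

Writing the returned string to an `.svg` file next to the image and opening it in a browser shows the plan with the markers exactly where the coordinates say, independent of how well a model draws.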
Fiona.Chen · February 6, 2025, 7:34am

There are many VLMs available in Try NVIDIA NIM APIs; you may want to try some of them.
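Whichever VLM from the catalog you try, placement is often easier to control if the model is asked only for the table’s bounding box as JSON, and the speaker positions are computed in your own code. The prompt text and the `{"table": [x0, y0, x1, y1]}` shape below are assumptions, not a documented NIM format. Models vary in how strictly they follow such instructions (some wrap the JSON in prose or code fences, and some report normalized rather than pixel coordinates, so check the model card), which is why this sketch extracts the object and sanity-checks it against the image size.

```python
import json
import re
from typing import Tuple

# Hypothetical prompt; adjust the wording and coordinate convention to the model you use.
BBOX_PROMPT = (
    "Find the meeting-room table in this floor plan. "
    'Reply with JSON only, in the form {"table": [x0, y0, x1, y1]}, '
    "using pixel coordinates of the image."
)


def parse_table_bbox(reply: str, width: int, height: int) -> Tuple[float, float, float, float]:
    """Extract the table box from a model reply and check that it fits the image."""
    # Take everything from the first "{" to the last "}", skipping any surrounding prose or fences.
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    data = json.loads(match.group(0))
    x0, y0, x1, y1 = (float(v) for v in data["table"])
    # Normalize corner order in case the model swapped them.
    x0, x1 = sorted((x0, x1))
    y0, y1 = sorted((y0, y1))
    if x0 < 0 or y0 < 0 or x1 > width or y1 > height or x0 == x1 or y0 == y1:
        raise ValueError(f"box {(x0, y0, x1, y1)} is empty or outside the {width}x{height} image")
    return x0, y0, x1, y1
```

The validated box can then feed whatever layout rule the installation instructions call for, so the model’s only job is finding the table.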

system · March 5, 2025, 5:42am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.