Identify meeting-room tables and annotate images

Hi,

I’m not sure if this is the correct forum to post in. Please point me in the right direction if it’s not.
I just discovered that NVIDIA has a lot of powerful AI tooling, but I could use some guidance on finding the right tools to use.

I’m building a meeting room configurator. I want the user journey to be as follows:

  1. A user uploads an image of a floor plan for the meeting room.

  2. The user uploads a PDF that contains the installation instructions for a specific speaker, e.g. one from Bose.

  3. I want the system I’m building to identify the meeting room table in the image and annotate where the ceiling speakers should be positioned, according to the information found in the PDF.

Which NVIDIA tooling can I leverage to achieve this goal?

Do you mean you want a system that can understand the speaker installation instructions in the PDF and indicate where the speakers should be positioned in a meeting room?

For which part do you want to use an NVIDIA SDK or project: understanding the PDF, or handling the meeting room floor plan picture?

Hi,

Yes, I want the system to place speaker icons on the floor plan, above what it identifies as the meeting room table. If it can do so, we have a guide for the person who will install the room, and we’ll design the room with the correct number of speakers.

Here is an example floor plan of a meeting room.

This is what’s typically found in the installation instructions:

llama3.2-vision correctly identified where I want my speakers and how many I need.
My next step is to determine which model I can use to position speaker icons on the floor plan I initially provided to the LLM.

Any suggestions as to which one I can leverage?

Hi, @martin269

For understanding the PDF document, you could try the Multimodal PDF Data Extraction Blueprint by NVIDIA (NVIDIA NIM).

Thanks for sharing the Data Extraction Blueprint, @Fiona.Chen! :-)
Do you have any suggestions for how I can apply speaker icons to the floor plan?
They should be evenly distributed over the area covering the table.

For this first version, I would be happy if I could get a prompt similar to one of these to work:

Draw a speaker icon above each chair.

or

Draw a speaker icon for every square meter in the area covering the meeting room table.

Do you have any suggestions as to which models can handle this kind of instruction?
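
To make the "one speaker per square meter, evenly distributed" rule concrete, here is a minimal sketch of how the positions could be computed once the table's rectangle is known. The function name `speaker_grid` and its parameters are hypothetical, and it assumes the table is an axis-aligned rectangle measured in meters; a real speaker's installation guide may prescribe different spacing.

```python
import math

def speaker_grid(x0, y0, width_m, depth_m, area_per_speaker_m2=1.0):
    """Return (x, y) centers, in meters, of an even grid over a rectangle.

    (x0, y0) is one corner of the table; width_m and depth_m are its sides.
    """
    # Number of speakers implied by the area rule, but never fewer than one.
    count = max(1, round(width_m * depth_m / area_per_speaker_m2))
    # Pick a column count that keeps the grid cells roughly square.
    cols = max(1, round(math.sqrt(count * width_m / depth_m)))
    # Round the row count up, so the grid may hold slightly more than `count`.
    rows = max(1, math.ceil(count / cols))
    cell_w, cell_d = width_m / cols, depth_m / rows
    # Place one speaker at the center of every cell.
    return [(x0 + (c + 0.5) * cell_w, y0 + (r + 0.5) * cell_d)
            for r in range(rows) for c in range(cols)]

# A 3 m x 1 m table gets three speakers in a single row.
print(speaker_grid(0, 0, 3, 1))
```

With this split, the model only needs to find the table; the arithmetic that decides where each icon goes stays deterministic.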

We don’t have any recommendation for the UI design. There may be lots of open source projects.

Good morning,

You wrote:

We don’t have any recommendation for the UI design. There may be lots of open source projects.

Sorry, I see that my question was a bit unclear.
It’s not about the UI design, but about identifying which LLM I can use to annotate the images.
I’ve tried this prompt using ChatGPT 4o mini:
Here is a floor plan. Can you calculate the size of the meeting room table? I need 1 speaker for every square meter in the area covering the table. Can you please annotate where the speakers should be positioned, in the floor plan image I gave you?

It can annotate the image, but it does not understand precisely where to position the speakers.
Here is the image it returned:

There are lots of VLMs available under Try NVIDIA NIM APIs; you could give those a try.
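
One possible way around the imprecise placement described above is to ask the vision model only for the table's bounding box and then draw the icons in ordinary code. The sketch below is an assumption, not an NVIDIA API: `svg_overlay`, the bounding-box format `(left, top, right, bottom)` in pixels, and the `px_per_m` scale are all hypothetical inputs that you would have to obtain from the model and from the floor plan's scale. It builds an SVG that shows the floor plan with one marker per square meter of table.

```python
from xml.sax.saxutils import quoteattr

def svg_overlay(image_href, img_w, img_h, table_box_px, px_per_m, radius_px=8):
    """Return SVG markup: the floor plan image plus evenly spaced markers."""
    left, top, right, bottom = table_box_px
    # One grid cell per meter of table along each axis, at least one cell.
    cols = max(1, round((right - left) / px_per_m))
    rows = max(1, round((bottom - top) / px_per_m))
    cell_w = (right - left) / cols
    cell_h = (bottom - top) / rows
    circles = []
    for r in range(rows):
        for c in range(cols):
            # Center of the current cell, in image pixels.
            cx = left + (c + 0.5) * cell_w
            cy = top + (r + 0.5) * cell_h
            circles.append(
                f'<circle cx="{cx:.1f}" cy="{cy:.1f}" r="{radius_px}" fill="red"/>'
            )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{img_w}" height="{img_h}">'
        f'<image href={quoteattr(image_href)} width="{img_w}" height="{img_h}"/>'
        + "".join(circles)
        + "</svg>"
    )

# Hypothetical values: an 800x600 plan, a 3 m x 1 m table, 100 pixels per meter.
svg = svg_overlay("plan.png", 800, 600, (100, 200, 400, 300), 100)
print(svg.count("<circle"))
```

The circles stand in for speaker icons; saving the returned string as an `.svg` file next to the image should let a browser show the annotated plan.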