Best approach for text to 3D scene assembly?


I was trying out ChatGPT to generate scene assembly based on the NVIDIA demo app, but it would often get the height wrong. For example, when putting a TV on a TV stand: if I understand the code, it finds a TV stand from the assets and then adds a TV, but the "add TV" step does not use the height of the TV stand, so everything ends up at height zero.

Are there best practices or a better way to generate a scene description so that the geometry of objects placed in the scene is taken into account?

There are a few ways to achieve this, including prompt engineering, result validation, and per-object placement checks. In the AI Room Generator demo we just take the XYZ position of objects and don't give specific instructions on how the Y axis should behave. You can be stricter about this in your prompt engineering; for example, you could add explicit behaviour requirements like "If you are placing a tabletop object, or an object that usually sits on a wall or on top of furniture, make sure the Y axis value is not 0". You can also provide a few examples of the desired result, such as one where a TV is placed on top of a TV stand. Finally, once you receive the results, you can run your own check to make sure objects are placed with the right vertical offsets before adding them to the scene.
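A minimal sketch of that last step, the post-generation placement check. This is not part of the demo code: the object schema and the `on_top_of` field are assumptions about how you might structure the LLM's output so that a support relationship can be resolved against actual geometry.

```python
# Hypothetical post-generation check: snap any object that declares a
# supporting object (e.g. a TV on a TV stand) to the top of that
# support's bounding box, instead of trusting the Y value the LLM gave.

def fix_vertical_placement(objects):
    """objects: list of dicts with 'name', 'position' ([x, y, z]),
    'height', and an optional 'on_top_of' naming the supporting object."""
    by_name = {obj["name"]: obj for obj in objects}
    for obj in objects:
        support_name = obj.get("on_top_of")
        if support_name is None:
            continue  # free-standing object, keep the generated Y
        support = by_name[support_name]
        # Top surface of the support = its base Y plus its height.
        obj["position"][1] = support["position"][1] + support["height"]
    return objects

scene = [
    {"name": "tv_stand", "position": [2.0, 0.0, 1.0], "height": 0.5},
    {"name": "tv", "position": [2.0, 0.0, 1.0], "height": 0.7,
     "on_top_of": "tv_stand"},
]
fix_vertical_placement(scene)
print(scene[1]["position"])  # the TV now rests at y = 0.5
```

In practice you would read the support's height from the asset's bounding box rather than from the LLM output, and you may want to resolve supports in dependency order if objects can be stacked more than one level deep.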
