Would it be possible to create entire environments based on GPT prompts within Replicator? The idea is to fine-tune computer vision models on synthetic datasets (models plus environment).
I would like to create an AI agent operating from visual feedback.
Can we use other Python libraries within Omniverse (such as YOLO or Segment Anything)? And is it possible to feed camera coordinates to a GPT prompt and then update the position based on the reply from GPT, similar to the 3D object placement pipeline?
You could certainly do that; you can even have GPT create USD directly rather than JSON files. The problem I started to run into was token limits. Your GPT response can only be so long, and if you're creating meshes and textures from scratch, that uses up a lot of tokens very quickly.
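For reference, a minimal sketch of the direct-to-USD approach could look like this (assuming the current openai Python client with an `OPENAI_API_KEY` set, and placeholder prompt text and file path):

```python
from openai import OpenAI
from pxr import Usd

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Ask the model for a small USDA layer; anything larger quickly hits token limits.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Reply only with valid USDA text, no prose."},
        {"role": "user", "content": "Create a simple warehouse scene with a few pallets and shelves."},
    ],
)

usda_text = response.choices[0].message.content

# Write the reply to disk and try opening it to verify it parses as USD.
layer_path = "gpt_scene.usda"  # placeholder path
with open(layer_path, "w") as f:
    f.write(usda_text)

stage = Usd.Stage.Open(layer_path)  # errors if the generated USDA is malformed
```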
A different approach I might take would be to compose your scene, keeping in mind those things you would like to randomize, and then ask GPT to give you random values for only those key items. That way you would be able to work with a much larger scene. That said, this is essentially what Replicator already does, and I might use Replicator over GPT for this specific task, as in the sketch below.
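A minimal Replicator sketch of that idea, randomizing only the key items and writing out annotated frames (assuming the omni.replicator.core extension is loaded; the prim paths and output directory are placeholders):

```python
import omni.replicator.core as rep

with rep.new_layer():
    # Grab the existing props you want to randomize (placeholder path pattern).
    props = rep.get.prims(path_pattern="/World/Props/*")

    # Camera and render product for the synthetic dataset.
    camera = rep.create.camera(position=(0, 0, 500))
    render_product = rep.create.render_product(camera, (1024, 1024))

    with rep.trigger.on_frame(num_frames=100):
        with props:
            # Randomize only the key values, analogous to asking GPT for them.
            rep.modify.pose(
                position=rep.distribution.uniform((-300, 0, -300), (300, 0, 300)),
                rotation=rep.distribution.uniform((0, -180, 0), (0, 180, 0)),
            )

    # Attach a writer to dump RGB plus annotations for training.
    writer = rep.WriterRegistry.get("BasicWriter")
    writer.initialize(output_dir="_out", rgb=True, bounding_box_2d_tight=True)
    writer.attach([render_product])

rep.orchestrator.run()
```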
That sounds like such a cool project! Can't wait to see what you put together! You can incorporate pretty much any Python package you want into your extension; the Omniverse platform is extremely flexible. You could definitely tap into one of the scene's update callbacks, get camera coordinates, and feed those into GPT, just be ready for a really, really slow frame rate because it takes a few seconds to get that reply back from GPT :D
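For the camera-coordinates part, a rough sketch of subscribing to Kit's per-frame update stream and reading the camera's world position (the camera path and the `send_to_gpt` helper are placeholders; in practice you would throttle or run the GPT call asynchronously so it doesn't stall every frame):

```python
import omni.kit.app
import omni.usd
from pxr import Usd, UsdGeom


def send_to_gpt(position):
    """Placeholder: format the position into a prompt and call your GPT endpoint."""
    print(f"Camera at {position}")


def on_update(_event):
    stage = omni.usd.get_context().get_stage()
    camera = stage.GetPrimAtPath("/World/Camera")  # placeholder camera path
    if not camera.IsValid():
        return
    # World-space transform of the camera; its translation is the camera position.
    xform = UsdGeom.Xformable(camera).ComputeLocalToWorldTransform(Usd.TimeCode.Default())
    send_to_gpt(xform.ExtractTranslation())


# Subscribe to the update event stream; keep the subscription alive or it is garbage collected.
subscription = (
    omni.kit.app.get_app()
    .get_update_event_stream()
    .create_subscription_to_pop(on_update, name="camera_to_gpt")
)
```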