It would be very useful for “pose” to be an Output data type - along w/ rgb, instance, semantic_seg, etc.
A question: is there a way to get “pose” out of Replicator Composer?
This would be used to feed an object pose algorithm (I’m currently using EfficientPose)
Hello @peter.gaston! I’ve reached out to the team about your questions. I will report back here when I have more information!
Hi @peter.gaston, you can use the transform data output by the bounding_box_3d annotator as the pose.
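As a rough illustration of turning such a transform into a pose, here is a minimal sketch. It assumes the transform is a 4x4 matrix in USD's row-vector convention (translation in the last row, rotation times scale in the upper-left 3x3, no shear); that layout is an assumption worth checking against an object whose pose you already know. The helper names `decompose_transform` and `yaw_degrees` are hypothetical, not part of Replicator.

```python
import math


def decompose_transform(m):
    """Split a 4x4 row-vector transform into translation, scale and rotation.

    Assumption: translation sits in the last row (m[3][:3]) and the
    upper-left 3x3 block is rotation times per-axis scale, with no shear.
    """
    translation = tuple(m[3][:3])
    # Each of the first three rows is a rotated basis axis scaled by the object scale.
    scale = tuple(math.sqrt(sum(v * v for v in m[i][:3])) for i in range(3))
    # Dividing each row by its length leaves a pure rotation matrix.
    rotation = [[m[i][j] / scale[i] for j in range(3)] for i in range(3)]
    return translation, scale, rotation


def yaw_degrees(rotation):
    """Rotation about +Z in degrees, read from the image of the X axis (row 0)."""
    return math.degrees(math.atan2(rotation[0][1], rotation[0][0]))


# Made-up example: object scaled by 2, yawed 90 degrees, placed at (1, 2, 0).
theta = math.radians(90.0)
c, s = math.cos(theta), math.sin(theta)
example = [
    [2 * c, 2 * s, 0.0, 0.0],
    [-2 * s, 2 * c, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [1.0, 2.0, 0.0, 1.0],
]
translation, scale, rotation = decompose_transform(example)
print(translation, scale, yaw_degrees(rotation))
```

If the matrix turns out to use the column-vector convention instead, transpose it first and the same functions apply.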
thx! lots of hidden data lurking around, eh? Excellent!
I hope you are doing well. I am also interested in working with 6D pose estimation deep learning methods. Were you able to train any model with your dataset? All the work I see is based on the YCB video dataset only.
Not sure exactly what your question is - I’ll throw out some ideas - feel free to be more specific…
I have an ML model doing pose estimation using synthetic data (and real data). The synthetic data is composed primarily using Replicator. Per the topic here, Replicator does not expose the camera pose. However, if one sets the camera at, say, 0,0,0.5 pointing 0,0,90 (or whatever), then one already knows the camera pose, i.e., don’t move the camera - move everything else. I created 65,000 images for my initial domain randomization training, and several thousand more so far to test in more reasonable conditions (see below).
For my case, it’s not really 6D pose. Given the exact environment (pallets in a warehouse), it’s really only X, Y and a yaw. The floor constrains the rest.
We’ve played with various models. We’ve used EfficientPose; a two-stage approach with Mask R-CNN followed by either a direct-to-pose ML model or a geometry-based algorithm; and, currently, a keypoint-based approach followed by a geometry algorithm. Your mileage will vary. We like the keypoint approach because its failure modes are human-explainable, and it seems easier to work out how to further train the model to fix those failures.
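To make the X, Y, yaw idea concrete, here is a small sketch of expressing an object's planar pose relative to a fixed camera. The frame conventions (poses given as x, y in scene units and yaw in degrees about the vertical axis, all in the same world frame) and the function names are assumptions for illustration only.

```python
import math


def wrap_degrees(angle):
    """Wrap an angle to the half-open interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def planar_pose_in_camera_frame(camera_pose, object_pose):
    """Express an object's (x, y, yaw_deg) world pose in the camera's planar frame.

    Both poses are (x, y, yaw_deg) tuples in the same world frame.
    """
    cam_x, cam_y, cam_yaw = camera_pose
    obj_x, obj_y, obj_yaw = object_pose
    dx, dy = obj_x - cam_x, obj_y - cam_y
    c = math.cos(math.radians(cam_yaw))
    s = math.sin(math.radians(cam_yaw))
    # Rotate the world-frame offset by the inverse of the camera heading.
    x_rel = c * dx + s * dy
    y_rel = -s * dx + c * dy
    return x_rel, y_rel, wrap_degrees(obj_yaw - cam_yaw)


# Made-up example: camera at the origin heading 90 degrees,
# pallet 2 units away along world +Y, yawed 120 degrees.
relative = planar_pose_in_camera_frame((0.0, 0.0, 90.0), (0.0, 2.0, 120.0))
print(relative)
```

With the camera never moving, `camera_pose` is a constant, so the ground-truth label for each image only depends on where the object was placed.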
So I would recommend using replicator to create a boat-ton of synthetic images to train on and work from there.
Thanks a lot for such a quick and detailed reply. I have a few more questions; please bear with me, as I am new to this.
- For pose-related models is a camera pose required? (As you mentioned to keep the camera fixed)
- When I calculated the camera intrinsic matrix (using this guide) to convert the depth image to a point cloud, I noticed that it is not accurate. How are you calculating the camera intrinsic matrix?
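For reference on the second question, here is a minimal sketch of the usual pinhole calculation, not necessarily what the linked guide does. It assumes square pixels, a principal point at the image center, and focal length and aperture given in the same units. It also shows one common source of inaccurate point clouds: treating a ray-length depth (distance to the camera) as if it were a distance to the image plane. All function names are hypothetical.

```python
import math


def pinhole_intrinsics(focal_length, horizontal_aperture, width_px, height_px):
    """Build a 3x3 intrinsic matrix; focal_length and aperture share units (e.g. mm)."""
    fx = focal_length / horizontal_aperture * width_px
    fy = fx  # assumption: square pixels
    cx, cy = width_px / 2.0, height_px / 2.0  # assumption: centered principal point
    return [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]


def backproject_pixel(u, v, plane_depth, K):
    """Pixel (u, v) plus distance to the image plane -> (x, y, z) in the camera frame."""
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    return ((u - cx) * plane_depth / fx, (v - cy) * plane_depth / fy, plane_depth)


def range_to_plane_depth(u, v, ray_length, K):
    """Convert a distance measured along the pixel ray into a distance to the image plane."""
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    xn, yn = (u - cx) / fx, (v - cy) / fy
    return ray_length / math.sqrt(xn * xn + yn * yn + 1.0)


# Made-up camera: 24 mm focal length, 20.955 mm horizontal aperture, 1280x720 image.
K = pinhole_intrinsics(24.0, 20.955, 1280, 720)
z = range_to_plane_depth(100, 50, 3.0, K)
point = backproject_pixel(100, 50, z, K)
print(K, point)
```

If the depth image already stores distance to the image plane, skip `range_to_plane_depth`; applying it in that case would introduce the same kind of error it is meant to remove.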
Is a camera pose required? Well, yes. You want the ground truth position of whatever you’re looking at in relation to the camera.
To get the camera matrix, I cheated. I used another method that works and outputs the camera pose (incl. the intrinsic matrix). See section 2.6 on page https://docs.omniverse.nvidia.com/app_isaacsim/app_isaacsim/tutorial_replicator_recorder.html - except my code is:
import omni.replicator.core as rep
# Create the camera with a fixed rotation (degrees)
camera = rep.create.camera(
    rotation=(0, 0, 180),
)