3D bounding box label format

Hello,

I am trying to train a 3D pose detector using simulation data. After generating the 3D bounding box, I am not able to understand the format of the bbox values provided in the numpy file.

Please help me understand it.

I am attaching a reference image showing the cardboard box for which the bbox was generated.

The output numpy array:

[(0, -25.000013  , -25.  , 1.5258789e-05, 24.999992  , 25.  , 50.00001  , [[ 0.00709001,  0.00705207,  0.        ,  0.        ], [-0.00705207,  0.00709001,  0.        ,  0.        ], [ 0.        ,  0.        ,  0.01      ,  0.        ], [-1.1415757 ,  1.2989804 ,  0.19760007,  1.        ]])
 (0,  -0.25000003,  -0.25, 1.4901161e-07,  0.25000003,  0.25,  0.5000001, [[ 0.70900124,  0.7052072 ,  0.        ,  0.        ], [-0.7052072 ,  0.70900124,  0.        ,  0.        ], [ 0.        ,  0.        ,  1.        ,  0.        ], [-1.1415757 ,  1.2989804 ,  0.19760007,  1.        ]])]

Let me know if any additional information is needed. Thanks in advance.
Mayank

Hello @mayank.ukani! I am assuming that you are using Omniverse Replicator? I’ve reached out to the developers to give you some assistance!

Hi @mayank.ukani

We recently updated our docs page with detailed information about our annotators. Please take a look at the “Bounding Box 3D” section in this page:

Annotators Information — Omniverse Extensions documentation (nvidia.com)

Take a look specifically at the “dtype” info in the example, where it shows which numbers are the semanticId, the extents of the bbox and the transform. Hopefully this helps!
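As a rough sketch of how such a record could be read in NumPy: the field names and order below are an assumption inferred from the tuple printed earlier in this thread (semantic ID, then min x/y/z, then max x/y/z, then the 4x4 transform), so check them against the dtype shown on the docs page before relying on them.

```python
import numpy as np

# Assumed field names and order, inferred from the printed tuple in this thread.
# Verify against the dtype listed in the annotator docs.
bbox_dtype = np.dtype([
    ("semanticId", "<u4"),
    ("x_min", "<f4"), ("y_min", "<f4"), ("z_min", "<f4"),
    ("x_max", "<f4"), ("y_max", "<f4"), ("z_max", "<f4"),
    ("transform", "<f4", (4, 4)),
])

# First record from the output posted above.
data = np.array([
    (0, -25.000013, -25.0, 1.5258789e-05, 24.999992, 25.0, 50.00001,
     [[0.00709001, 0.00705207, 0.0, 0.0],
      [-0.00705207, 0.00709001, 0.0, 0.0],
      [0.0, 0.0, 0.01, 0.0],
      [-1.1415757, 1.2989804, 0.19760007, 1.0]]),
], dtype=bbox_dtype)

box = data[0]
local_min = np.array([box["x_min"], box["y_min"], box["z_min"]])
local_max = np.array([box["x_max"], box["y_max"], box["z_max"]])
local_size = local_max - local_min   # box size in the object's local frame
transform = box["transform"]         # 4x4 local-to-world matrix

print(local_size)
print(transform)
```

The extents are in the object's local frame; the transform is what places them in the world.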

That page only explains the x/y/z min and max values. Can you please explain what the values inside the four nested lists indicate:
[[ 0.00709001, 0.00705207, 0. , 0. ], [-0.00705207, 0.00709001, 0. , 0. ], [ 0. , 0. , 0.01 , 0. ], [-1.1415757 , 1.2989804 , 0.19760007, 1. ]]

Thanks

Hi @mayank.ukani, that's the transform. There are other online sources that can describe a 4x4 matrix much better than I can, but doing a quick search, this one might help:
Geometry (Row Major vs Column Major Vector) (scratchapixel.com)

That page shows the parts of the transform and what each part does.

[AXx, AXy, AXz, 0]
[AYx, AYy, AYz, 0] 
[AZx, AZy, AZz, 0]
[Tx,  Ty,  Tz, 1]
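To make the layout above concrete, here is a minimal sketch that splits the first transform from the question into translation, scale, and rotation. It assumes the layout shown above (axes in the first three rows, translation in the last row) and no shear; the yaw calculation is only meaningful here because this particular matrix rotates about Z alone.

```python
import numpy as np

# First transform from the question.
T = np.array([
    [0.00709001, 0.00705207, 0.0, 0.0],
    [-0.00705207, 0.00709001, 0.0, 0.0],
    [0.0, 0.0, 0.01, 0.0],
    [-1.1415757, 1.2989804, 0.19760007, 1.0],
])

translation = T[3, :3]                 # [Tx, Ty, Tz]: object position in the world
axes = T[:3, :3]                       # rows are the local X, Y, Z axes in world space
scale = np.linalg.norm(axes, axis=1)   # length of each axis row = per-axis scale
rotation = axes / scale[:, None]       # unit-length axes = pure rotation (assuming no shear)

# Rotation about Z, read from the local X axis (valid for this Z-only rotation).
yaw_deg = np.degrees(np.arctan2(rotation[0, 1], rotation[0, 0]))

print("translation:", translation)
print("scale:", scale)
print("yaw (deg):", yaw_deg)
```

For this matrix the scale comes out at about 0.01 per axis, which is why the first record's local extents of roughly 50 units and the second record's extents of roughly 0.5 units (with scale 1) describe the same physical box.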

Note that the transform here is row-major. Below is the printed transform from the 3D bbox example, where the X positions of the cone and sphere appear in the bottom-left element.

#Cone
cone = rep.create.cone(semantics=[("prim", "cone")], position=(100, 0, 0))

[   1.,    0.,    0.,    0.]
[   0.,    1.,    0.,    0.]
[   0.,    0.,    1.,    0.]
[ 100.,    0.,    0.,    1.]

#Sphere
sphere = rep.create.sphere(semantics=[("prim", "sphere")], position=(-100, 0, 0))
[   1.,    0.,    0.,    0.]
[   0.,    1.,    0.,    0.]
[   0.,    0.,    1.,    0.]
[-100.,    0.,    0.,    1.]
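Putting the extents and the transform together, the eight world-space corners of a box can be sketched as follows. The helper `bbox_corners_world` is a hypothetical name, not part of Replicator; it assumes the row-vector convention implied by the layout above, i.e. `[x, y, z, 1] @ transform`.

```python
import itertools
import numpy as np

def bbox_corners_world(mins, maxs, transform):
    """Return the 8 box corners in world space (hypothetical helper)."""
    # All min/max combinations per axis -> 8 local-space corners.
    corners = np.array(list(itertools.product(*zip(mins, maxs))))
    # Append w = 1 and multiply as row vectors, so the last matrix row adds translation.
    homogeneous = np.hstack([corners, np.ones((8, 1))])
    return (homogeneous @ transform)[:, :3]

# First record from the question.
mins = (-25.000013, -25.0, 1.5258789e-05)
maxs = (24.999992, 25.0, 50.00001)
transform = np.array([
    [0.00709001, 0.00705207, 0.0, 0.0],
    [-0.00705207, 0.00709001, 0.0, 0.0],
    [0.0, 0.0, 0.01, 0.0],
    [-1.1415757, 1.2989804, 0.19760007, 1.0],
])

corners = bbox_corners_world(mins, maxs, transform)
print(corners)
```

If the convention were column-vector instead, the matrix would need to be transposed before multiplying, so it is worth checking the result against a known object position such as the cone at X = 100.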

Hope this helps!


This information is really helpful. Thanks a lot for such a quick resolution!
