Inverse Depth Projection - Tutorial or code sample

Hi,

With the standard pinhole depth camera, projecting from 3D to 2D is straightforward:

    view_proj_matrix = cam_info['view_projection_matrix']  # from the SD helper
    homo = np.pad(points, ((0, 0), (0, 1)), constant_values=1.0)
    tf_points = np.dot(homo, view_proj_matrix)
    tf_points = tf_points / (tf_points[..., -1:])
    tf_points[..., :2] = 0.5 * (tf_points[..., :2] + 1)
    return tf_points[..., :3]

However, the inverse of that is not as straightforward. In conventional terms we usually have P_2d = K [R|t] P_3d and can simply invert it: inv(K [R|t]) P_2d = P_3d. In Isaac Sim the convention seems to be different, in that there is a division step; I am not sure what `tf_points = tf_points / (tf_points[..., -1:])` is supposed to be.
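As far as I can tell, that division is the perspective divide by the homogeneous w-coordinate, which is why a plain matrix inverse is not enough: w is thrown away after the divide and has to be restored before un-projecting. A minimal sketch of what I believe is happening, using a made-up row-vector projection matrix (not Isaac Sim's actual matrix):

```python
import numpy as np

# Sketch of the perspective divide, using a made-up row-vector
# projection matrix (p_clip = p_cam @ proj); near/far values are arbitrary.
near, far = 0.1, 100.0
proj = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, far / (far - near), 1.0],            # w picks up camera z
    [0.0, 0.0, -near * far / (far - near), 0.0],
])

p_cam = np.array([0.5, -0.25, 2.0, 1.0])  # homogeneous camera-space point
clip = p_cam @ proj
ndc = clip / clip[-1]                     # the division in question: divide by w

# To invert, restore w (here the camera depth) before undoing the matrix.
p_back = (ndc * clip[-1]) @ np.linalg.inv(proj)
assert np.allclose(p_back, p_cam)
```

In practice w is not stored alongside the 2D point, so inverting means reconstructing it from the depth buffer first, which is exactly where this gets tricky.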

I get P_(3d,cam) by the standard method:

        # standard pinhole back-projection with intrinsics (fx, fy, u0, v0)
        x = (u - u0) * depth[v, u] / fx
        y = (v - v0) * depth[v, u] / fy
        z = depth[v, u]
        cam_points[i] = (x, y, z)

But then applying T_(view to world) to P_(3d,cam) produces wrong results, likely because of something else I am missing when computing P_(3d,cam). I suspect Isaac swaps axes somewhere, or maybe distorts the image. Could you point me to a tutorial, or show me how to invert the projection and recover 3D points from 2D?
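For context, this is the vectorized form of the per-pixel back-projection I am using; the intrinsics and the camera-to-world transform below are placeholder values for illustration, not taken from Isaac Sim:

```python
import numpy as np

# Standard pinhole back-projection for a whole depth map, then a
# camera-to-world transform. fx, fy, u0, v0 and cam_to_world are
# hypothetical placeholder values.
fx, fy = 500.0, 500.0
u0, v0 = 320.0, 240.0
depth = np.full((480, 640), 2.0)       # fake depth map: 2 m everywhere

v, u = np.indices(depth.shape)         # per-pixel (row, column) grids
x = (u - u0) * depth / fx
y = (v - v0) * depth / fy
z = depth
cam_points = np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Homogeneous camera->world transform: identity rotation, 1 m translation in z.
cam_to_world = np.eye(4)
cam_to_world[:3, 3] = [0.0, 0.0, 1.0]
homo = np.pad(cam_points, ((0, 0), (0, 1)), constant_values=1.0)
world_points = (homo @ cam_to_world.T)[:, :3]
```

With these placeholders the principal-point pixel (u0, v0) lands at (0, 0, 3) in world coordinates: (0, 0, 2) in the camera frame plus the 1 m translation.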

Here’s an example. The inverse is wrong. The main question is, how do I fix it?

# points = np.asarray([[0,0,0]])
points = np.asarray([[100,0,0]])
view_proj_matrix = cam_info['view_projection_matrix']
homo = np.pad(points, ((0, 0), (0, 1)), constant_values=1.0)
print ("Homo pt",homo)
tf_points = np.dot(homo, view_proj_matrix)
print("Homo.VP",tf_points)
tf_points = tf_points / (tf_points[..., -1:])
print("Homo.VP/K",tf_points)
tf_points[..., :2] = 0.5 * (tf_points[..., :2] + 1)
print("Homo.VP/K - image frame",tf_points)
H,W = rdepth2.shape[:2]
U = (tf_points[...,0] * W).astype(int)  # np.int is removed in recent NumPy
V = (tf_points[...,1] * H).astype(int)
plt.imshow(rdepth2)
plt.title("Projection 3D to 2D")
plt.scatter(U,V)
plt.show()
print("U,V,D",U[0],V[0], rdepth2[V[0], U[0]])

# UVD in image frame. 
# UV are normalized, D is depth at UV
uv = np.asarray([U[0],V[0]]) / np.asarray([W,H])
d = rdepth2[V[0], U[0]]
print ("UV",uv)
# UVD in camera frame (range of -1 to 1)
uv_ = uv*2 - 1
print ("UVcam",uv_)
# UVD -> XYZ 
d = -d

xyz = np.array([uv_[0], uv_[1], 0.9965842, 1]) * d
print("depth-scaled",xyz)
xyz = np.dot(xyz, np.linalg.inv(view_proj_matrix))
print("DS. inv(VP)", xyz)
print(d * xyz/np.linalg.norm(xyz))

points = xyz[None,:3] 
print("XYZ used", points)
view_proj_matrix = cam_info['view_projection_matrix']
homo = np.pad(points, ((0, 0), (0, 1)), constant_values=1.0)
tf_points = np.dot(homo, view_proj_matrix)
tf_points = tf_points / (tf_points[..., -1:])
tf_points[..., :2] = 0.5 * (tf_points[..., :2] + 1)
H,W = rdepth2.shape[:2]
U = (tf_points[...,0] * W).astype(int)
V = (tf_points[...,1] * H).astype(int)
plt.imshow(rdepth2)
plt.title("Inverse Projection 2D to 3D")
plt.scatter(U,V)
plt.show()

Doing it the standard way also doesn't work.

So I think I figured it out. I just reversed the equation used by Isaac Sim. My issues were stemming from the fact that Isaac uses inverse depth by default (even though it is still called depth!!), non-standard homogeneous transforms (e.g., with the translation in the bottom row instead of the right column), and a frustum camera model (with near/far planes) rather than the generic focal-point one that I assumed.

# UVD in image frame. 
# UV are normalized, D is depth at UV
uv = np.asarray([U[0],V[0]]) / np.asarray([W,H])
d = rdepth2[V[0], U[0]]
print ("UV",uv)
# UVD in camera frame (range of -1 to 1)
uv_ = uv*2 - 1
print ("UVcam",uv_)
# UVD -> XYZ 
N = cam_info['clipping_range'][0]
F = cam_info['clipping_range'][1]
w = -d  # homogeneous w-coordinate (distinct from the image width W above)
Z = (w*F/(F-N) + N*F/(F-N)) / w
xyz = np.array([uv_[0], uv_[1], Z, 1]) * w
print("depth-scaled",xyz)
xyz = np.dot(xyz, np.linalg.inv(view_proj_matrix))
