I am having a hard time extracting the skeleton of a body from images using jetson-inference.

import jetson.inference
import jetson.utils

import numpy as np
import cv2

import argparse
import sys

# parse the command line
parser = argparse.ArgumentParser(description="Run pose estimation DNN on a video/image stream.", 
                                 formatter_class=argparse.RawTextHelpFormatter, epilog=jetson.inference.poseNet.Usage() +
                                 jetson.utils.videoSource.Usage() + jetson.utils.videoOutput.Usage() + jetson.utils.logUsage())

parser.add_argument("input_URI", type=str, default="", nargs='?', help="URI of the input stream")
parser.add_argument("output_URI", type=str, default="", nargs='?', help="URI of the output stream")
parser.add_argument("--network", type=str, default="resnet18-body", help="pre-trained model to load (see below for options)")
parser.add_argument("--overlay", type=str, default="links,keypoints", help="pose overlay flags (e.g. --overlay=links,keypoints)\nvalid combinations are:  'links', 'keypoints', 'boxes', 'none'")
parser.add_argument("--threshold", type=float, default=0.15, help="minimum detection threshold to use") 

try:
    opt = parser.parse_known_args()[0]
except:
    print("")
    parser.print_help()
    sys.exit(0)

# load the pose estimation model
net = jetson.inference.poseNet(opt.network, sys.argv, opt.threshold)

# create video sources & outputs
input = jetson.utils.videoSource(opt.input_URI, argv=sys.argv)
output = jetson.utils.videoOutput(opt.output_URI, argv=sys.argv)

img_size = (1920, 1080)                    # (width, height)
skl = np.ones((img_size[1], img_size[0]))  # numpy shape is (rows, cols) = (height, width); all-ones shows as white

# process frames until the user exits
while True:
    # capture the next image
    img = input.Capture()

    # perform pose estimation (with overlay)
    poses = net.Process(img, overlay=opt.overlay)
    print(opt.overlay)

    # print the pose results
    print("detected {:d} objects in image".format(len(poses)))

    for pose in poses:
        print(pose)
        print(pose.Keypoints)
        print('Links', pose.Links)

        # find the keypoint index from the list of detected keypoints
        # you can find these keypoint names in the model's JSON file, 
        # or with net.GetKeypointName() / net.GetNumKeypoints()
        nose_idx = pose.FindKeypoint('nose')
        left_eye_idx = pose.FindKeypoint('left_eye')
        right_eye_idx = pose.FindKeypoint('right_eye')
        left_ear_idx = pose.FindKeypoint('left_ear')
        right_ear_idx = pose.FindKeypoint('right_ear')
        left_shoulder_idx = pose.FindKeypoint('left_shoulder')
        right_shoulder_idx = pose.FindKeypoint('right_shoulder')
        left_elbow_idx = pose.FindKeypoint('left_elbow')
        right_elbow_idx = pose.FindKeypoint('right_elbow')
        left_wrist_idx = pose.FindKeypoint('left_wrist')
        right_wrist_idx = pose.FindKeypoint('right_wrist')
        left_hip_idx = pose.FindKeypoint('left_hip')
        right_hip_idx = pose.FindKeypoint('right_hip')
        left_knee_idx = pose.FindKeypoint('left_knee')
        right_knee_idx = pose.FindKeypoint('right_knee')
        left_ankle_idx = pose.FindKeypoint('left_ankle')
        right_ankle_idx = pose.FindKeypoint('right_ankle')
        neck_idx = pose.FindKeypoint('neck') 


        # if a keypoint index is < 0, that keypoint wasn't found in the image;
        # this version skips the pose entirely unless every keypoint was detected
        keypoint_indices = [nose_idx, left_eye_idx, right_eye_idx, left_ear_idx, right_ear_idx,
                            left_shoulder_idx, right_shoulder_idx, left_elbow_idx, right_elbow_idx,
                            left_wrist_idx, right_wrist_idx, left_hip_idx, right_hip_idx,
                            left_knee_idx, right_knee_idx, left_ankle_idx, right_ankle_idx, neck_idx]

        if any(idx < 0 for idx in keypoint_indices):
            continue

        # keypoints are accessed by index into the pose.Keypoints list
        nose = pose.Keypoints[nose_idx]
        left_eye = pose.Keypoints[left_eye_idx]
        right_eye = pose.Keypoints[right_eye_idx]
        left_ear = pose.Keypoints[left_ear_idx]
        right_ear = pose.Keypoints[right_ear_idx]
        left_shoulder = pose.Keypoints[left_shoulder_idx]
        right_shoulder = pose.Keypoints[right_shoulder_idx]
        left_elbow = pose.Keypoints[left_elbow_idx]
        right_elbow = pose.Keypoints[right_elbow_idx]
        left_wrist = pose.Keypoints[left_wrist_idx]
        right_wrist = pose.Keypoints[right_wrist_idx]
        left_hip = pose.Keypoints[left_hip_idx]
        right_hip = pose.Keypoints[right_hip_idx]
        left_knee = pose.Keypoints[left_knee_idx]
        right_knee = pose.Keypoints[right_knee_idx]
        left_ankle = pose.Keypoints[left_ankle_idx]
        right_ankle = pose.Keypoints[right_ankle_idx]
        neck = pose.Keypoints[neck_idx]
        
        # shoulders
        cv2.line(skl, (round(left_shoulder.x), round(left_shoulder.y)), (round(right_shoulder.x), round(right_shoulder.y)), (0, 0, 0), 10)
        # hips
        cv2.line(skl, (round(left_hip.x), round(left_hip.y)), (round(right_hip.x), round(right_hip.y)), (0, 0, 0), 10)

        # left upper body
        cv2.line(skl, (round(left_shoulder.x), round(left_shoulder.y)), (round(left_hip.x), round(left_hip.y)), (0, 0, 0), 10)
        cv2.line(skl, (round(left_shoulder.x), round(left_shoulder.y)), (round(left_elbow.x), round(left_elbow.y)), (0, 0, 0), 10)
        cv2.line(skl, (round(left_elbow.x), round(left_elbow.y)), (round(left_wrist.x), round(left_wrist.y)), (0, 0, 0), 10)
        # right upper body
        cv2.line(skl, (round(right_shoulder.x), round(right_shoulder.y)), (round(right_hip.x), round(right_hip.y)), (0, 0, 0), 10)
        cv2.line(skl, (round(right_shoulder.x), round(right_shoulder.y)), (round(right_elbow.x), round(right_elbow.y)), (0, 0, 0), 10)
        cv2.line(skl, (round(right_elbow.x), round(right_elbow.y)), (round(right_wrist.x), round(right_wrist.y)), (0, 0, 0), 10)

        # left leg
        cv2.line(skl, (round(left_hip.x), round(left_hip.y)), (round(left_knee.x), round(left_knee.y)), (0, 0, 0), 10)
        cv2.line(skl, (round(left_knee.x), round(left_knee.y)), (round(left_ankle.x), round(left_ankle.y)), (0, 0, 0), 10)
        # right leg
        cv2.line(skl, (round(right_hip.x), round(right_hip.y)), (round(right_knee.x), round(right_knee.y)), (0, 0, 0), 10)
        cv2.line(skl, (round(right_knee.x), round(right_knee.y)), (round(right_ankle.x), round(right_ankle.y)), (0, 0, 0), 10)

        # head
        # neck
        # eyes
        # nose
        # mouth

       
    cv2.imshow("foo",skl)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

    # render the image
    output.Render(img)

    # update the title bar
    output.SetStatus("{:s} | Network {:.0f} FPS".format(opt.network, net.GetNetworkFPS()))

    # print out performance info
    net.PrintProfilerTimes()

    # exit on input/output EOS
    if not input.IsStreaming() or not output.IsStreaming():
        break

If I try to draw a simple line by hand, it works fine. I am not sure why, when I run this, it just returns a flat white image with no drawing.

But honestly I think this is done wrongly, because what happens if a point isn't detected? The lines that depend on it can't be drawn.
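
For example, something like this untested sketch would only drop the individual lines whose endpoints are missing, instead of skipping the whole pose (draw_limb is just a made-up helper name):

def draw_limb(canvas, pose, name_a, name_b):
    # draw a line between two named keypoints, but only if both were detected
    idx_a = pose.FindKeypoint(name_a)
    idx_b = pose.FindKeypoint(name_b)
    if idx_a < 0 or idx_b < 0:
        return  # an endpoint is missing, so skip just this line
    a = pose.Keypoints[idx_a]
    b = pose.Keypoints[idx_b]
    cv2.line(canvas, (round(a.x), round(a.y)), (round(b.x), round(b.y)), (0, 0, 0), 10)

# e.g. draw_limb(skl, pose, 'left_shoulder', 'left_elbow')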

Hi @romeofilippo95, have you tried printing out the coordinates of the keypoints you are looking for before drawing them, to make sure they are valid?

Hi, yes, it prints to the console, and this is the solution I came up with: I first check that the keypoint pair exists, and then draw a line using cv2. Now I am thinking of looping through the points in pose.Keypoints and drawing a cv2.circle for any that are not in pose.Links (there is a rough sketch of that after the code below).

for pose in poses:
    print(pose)
    print(pose.Keypoints)
    print('Links', pose.Links)

    keypoints = pose.Keypoints

    # each link is a pair of indices into pose.Keypoints, and only
    # detected keypoints appear there, so every link can be drawn
    for link in pose.Links:
        print(link)
        a = keypoints[link[0]]
        b = keypoints[link[1]]
        cv2.line(skl, (round(a.x), round(a.y)), (round(b.x), round(b.y)), (0, 0, 0), 10)
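
And a rough, untested sketch of the circle idea from above, to go inside the same loop, drawing a dot for every detected keypoint that no link touches:

    # collect every keypoint index that appears in some link
    linked = set()
    for link in pose.Links:
        linked.add(link[0])
        linked.add(link[1])

    # draw a dot for each detected keypoint that no link connects
    for i, kp in enumerate(pose.Keypoints):
        if i not in linked:
            cv2.circle(skl, (round(kp.x), round(kp.y)), radius=1, color=(0, 0, 0), thickness=5)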

OK great, glad you found a solution. BTW, if you just want the skeleton drawn, you can use the poseNet.Overlay() function to draw the poses onto an image of your choosing (or, like the original posenet.py sample does, you can use poseNet.Process() and specify the overlay flag to draw the skeleton on top of the input image).

Then you can get the CUDA image into OpenCV with the jetson.utils.cudaToNumpy() function:
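
Roughly like this, as an untested sketch (the exact poseNet.Overlay() arguments may differ in the Python bindings, and the RGB-to-BGR conversion assumes an RGB stream):

# draw the skeleton overlay onto the CUDA image in-place
net.Overlay(img, poses, opt.overlay)

# wait for the GPU to finish before the CPU touches the memory
jetson.utils.cudaDeviceSynchronize()

# map the CUDA image into a numpy array (newer jetson-utils versions
# accept the cudaImage directly)
frame = jetson.utils.cudaToNumpy(img)

# jetson-inference images are RGB, while OpenCV expects BGR
frame_bgr = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
cv2.imshow("pose overlay", frame_bgr)
cv2.waitKey(1)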

I know this might be off-topic, but I am having a hard time finding a solution for my project.

I am working on a task where I want to use the machine linked below to draw skeletons extracted from the code above on canvases of different sizes.

I am thinking of adding a camera on top of the machine and making a 2D scan of the canvas, but there is a problem: the distance between the robot and the canvas isn't large, so it's hard to use object recognition on the whole canvas unless I specifically train the model on close-ups. Alternatively, I could do edge detection and some sort of 2D scan, but I don't know how that would work.

Then, after the machine finds the canvas, it would scale and resize the skeleton image, map it inside the canvas, and print the result. I did some research online but could not find anything. Any advice would be appreciated, thanks!

This is great, thanks for sharing these two tips!

It sounds like you want to stitch together a panorama, which OpenCV has some functions to help with, but typically these use feature extraction + matching, and against a blank canvas I don't think there would be many distinct features to do this with. In lieu of that, typically a checkerboard pattern is used for calibration.
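
For reference, OpenCV's high-level stitching API looks roughly like this (an untested sketch; the file names are placeholders):

import cv2

# placeholder frames captured while sweeping the camera across the canvas
images = [cv2.imread(p) for p in ("scan_0.jpg", "scan_1.jpg", "scan_2.jpg")]

stitcher = cv2.Stitcher_create()
status, panorama = stitcher.stitch(images)

if status == cv2.Stitcher_OK:
    cv2.imwrite("canvas_panorama.jpg", panorama)
else:
    # against a blank, featureless canvas the matcher often fails,
    # which is why a checkerboard/calibration pattern helps
    print("stitching failed with status", status)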

If you just mean an image recognition/classification model (as opposed to a more complex object detection or pose estimation model), then that could work - classification models are relatively easy to collect data for and train.
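
As a rough, untested sketch, loading a classification model trained with the jetson-inference tutorial usually looks something like this (the model and label paths are placeholders):

import jetson.inference
import jetson.utils

# placeholder paths; a model trained with the jetson-inference classification
# tutorial exports to ONNX with the input_0/output_0 blob names
net = jetson.inference.imageNet("resnet18", [
    "--model=models/canvas/resnet18.onnx",
    "--labels=models/canvas/labels.txt",
    "--input_blob=input_0",
    "--output_blob=output_0"])

img = jetson.utils.loadImage("test.jpg")  # placeholder test image
class_id, confidence = net.Classify(img)
print(net.GetClassDesc(class_id), confidence)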

I could go simpler by doing color detection, applying real-time panorama/image stitching with OpenCV, defining the corner points of the canvas, and then producing the image; or go a bit more complicated with 3D mapping: extract the image, overlay the points, and then render it in real life. There are some projects such as the XSeries Robot Turret and 3D semantic mapping; this might be overkill but it does the job.
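
A minimal, untested sketch of the color-detection idea (the HSV range is a placeholder and needs tuning for the real canvas and lighting):

import cv2
import numpy as np

frame = cv2.imread("camera_view.jpg")  # placeholder camera frame
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# placeholder range for a white-ish canvas
lower = np.array([0, 0, 180])
upper = np.array([180, 40, 255])
mask = cv2.inRange(hsv, lower, upper)

# take the largest blob as the canvas and get its bounding box
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    canvas = max(contours, key=cv2.contourArea)
    x, y, w, h = cv2.boundingRect(canvas)
    # the skeleton can now be scaled to (w, h) and offset by (x, y)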
