Is there a way to extract the features of the last layers of a network?

I trained a YOLOv4 object detector with a ResNet18 backbone to recognize people, cars, and bikes. I’m now trying to train an SVM outside of TLT to classify the poses of the people YOLO found in the pictures. The feature extraction my YOLO model already performs would come in handy for preprocessing the images for the SVM; is there any way to extract those features?

TLT (TAO) does not provide an API for feature extraction. Your existing yolo_v4 .tlt model can be set as the primary engine and then connected to further (2nd, 3rd, …) engines to classify poses.
BTW, TLT(TAO) provides bodyposenet. Body Pose Estimation — TAO Toolkit 3.22.05 documentation

Yeah, I tried the deployable version of BodyPoseNet, but I don’t know how to run inference with TensorRT using the converted .engine model. Is there a sample showing how to do this? I do not wish to run inference in DeepStream. Thank you!

For deploying the bodypose etlt model, please refer to the pipeline mentioned in TLT CV Inference Pipeline Quick Start Scripts — Transfer Learning Toolkit 3.0 documentation
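
If you do want to call the converted .engine file directly, a rough sketch with the TensorRT Python API could look like the below (the engine path, the random input data, and the simple single-stream buffer handling are assumptions for illustration; the bodypose-specific pre/post-processing is exactly what the pipeline above provides):

import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine produced by tao-converter / tlt-converter.
with open("bodypose.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate host/device buffers for every binding (inputs and outputs).
host_bufs, dev_bufs, bindings = [], [], []
for i in range(engine.num_bindings):
    dtype = trt.nptype(engine.get_binding_dtype(i))
    size = trt.volume(engine.get_binding_shape(i))
    host = cuda.pagelocked_empty(size, dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

# Copy a preprocessed image into the input buffer (layout/normalization must
# match what the model was trained with); random data here is only a placeholder.
host_bufs[0][:] = np.random.random(host_bufs[0].size).astype(host_bufs[0].dtype)

stream = cuda.Stream()
cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
for i in range(engine.num_bindings):
    if not engine.binding_is_input(i):
        cuda.memcpy_dtoh_async(host_bufs[i], dev_bufs[i], stream)
stream.synchronize()
# The host buffers of the output bindings now hold the raw network outputs.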

Hi,
Bodypose2d outputs the key points of humans; it does not output a ‘label’ (for example dance, walk, hug, …).

Is my understanding of what you want to do correct?
train a detection model:
image -> backbone -> yolov4
get features from the trained model:
image -> backbone -> 'encoded_detections' (final output of the yolov4 backbone) -> features
train a classifier for body pose:
features + your body-pose labels (for example 'run/hug/walk', not key-point labels of humans) -> SVM or DNN

If my understanding is correct, would you consider this?
step 1:
train a detection model
step 2:
train a bodypose classifier model with TAO/TLT
step 3:
run the pipeline with DeepStream:
image → yolov4 (as primary GIE) → crops of people → classifier (as secondary GIE) → labels.

And we have a quick dev solution to extract features from yolov4, which looks like the below. Please check whether you still want this function.

cmd:

tao yolo_v4 inference -i images -o output -e spec_resnet18_21.08.txt -m yolov4_resnet18_epoch_080.tlt -k nvidia_tlt -l labels

Features and labels:

yolov4_tlt_feature_output# ls labels/

000005.features.txt 000005.txt

Features (loadable by numpy):

yolov4_tlt_feature_output# head labels/000005.features.txt

0.000000000000000000e+00,0.000000000000000000e+00,1.579947918653488159e-01,9.209936112165451050e-02,8.333333581686019897e-02,2.564102597534656525e-02,5.402590632438659668e-01,4.933373630046844482e-01,7.932619452476501465e-01,7.614458799362182617e-01,-8.097229957580566406e+00,6.445009261369705200e-03,-4.830681979656219482e-01,-2.866200208663940430e-01

0.000000000000000000e+00,0.000000000000000000e+00,2.984114587306976318e-01,1.274519264698028564e-01,8.333333581686019897e-02,2.564102597534656525e-02,5.192358493804931641e-01,4.967140257358551025e-01,7.455043792724609375e-01,8.102795481681823730e-01,-1.004259204864501953e+01,-1.901154518127441406e-01,-2.252185791730880737e-01,-2.791628539562225342e-01
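
As a hedged sketch of consuming these files (the glob pattern matches the layout shown above; the pose-label lookup is a placeholder you would replace with your own annotations), the features can be loaded with numpy and fed to an SVM, e.g. scikit-learn’s SVC:

import glob
import numpy as np
from sklearn.svm import SVC

def pose_label_for(path):
    # Placeholder: return the pose label ('run', 'walk', ...) that you
    # annotated for the detections of this image.
    return "unknown"

X, y = [], []
for path in sorted(glob.glob("labels/*.features.txt")):
    feats = np.loadtxt(path, delimiter=",", ndmin=2)  # one row per detection
    X.append(feats)
    y.extend([pose_label_for(path)] * feats.shape[0])

if len(set(y)) > 1:  # SVC needs at least two distinct pose classes
    clf = SVC(kernel="rbf")
    clf.fit(np.vstack(X), y)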

That’s almost it, you got the gist of it; we just don’t want to run the pipeline in DeepStream. Instead we wish to run it in TensorRT. We wanted the features to train an SVM outside of TLT, but we found that simply feeding a classifier images of people standing/sitting/kneeling was easier, since my understanding is that the classifier in TAO already does feature extraction. Is this right? Thank you for your assistance.

Yes, as mentioned in step 2 above, you can train a body-pose classifier model with the TAO (TLT) classification network.
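
For reference, training the classifier could look roughly like this (a sketch only: the spec file name and output directory are placeholders, the key is reused from your yolo_v4 command, and the flag set follows the TAO 3.x classification documentation):

tao classification train -e classification_spec.cfg -r results/ -k nvidia_tlt

The spec file points to train/val dataset folders in which each pose class (e.g. standing, sitting, kneeling) is a sub-directory of cropped images.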

More comments here.

  • A classifier does perform feature extraction internally, but the classifier in TAO does not output features; it only outputs labels. The same is true for all TAO models.
  • So, do you want features from yolov4 as input to your SVM, or features from the classification network as input to your SVM?
  • Or do you want features from yolov4 and/or the classification model as input to another classification model?
    yolov4 (or classification model) → features of a layer → classification model → loss/labels
  • You can simply use the standard classification model for the classification task, where the input is an image.

Anyway, what we propose without DeepStream (a rough sketch of the cropping step follows below):

  • image → TAO yolov4 → people crops → TAO classification model → people pose.
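
As a rough illustration of the "people crops" step (not an official TAO utility; it assumes tao yolo_v4 inference wrote KITTI-format label files for each image, which is its standard output, and that the person class is named "person" in your labels file):

import glob
import os
from PIL import Image

IMAGE_DIR = "images"
LABEL_DIR = "labels"        # KITTI-format .txt files written by yolo_v4 inference (-l)
CROP_DIR = "people_crops"
os.makedirs(CROP_DIR, exist_ok=True)

for label_path in glob.glob(os.path.join(LABEL_DIR, "*.txt")):
    stem = os.path.splitext(os.path.basename(label_path))[0]
    image_path = os.path.join(IMAGE_DIR, stem + ".jpg")  # adjust the extension if needed
    if not os.path.exists(image_path):
        continue
    image = Image.open(image_path)
    with open(label_path) as f:
        for idx, line in enumerate(f):
            parts = line.split()
            if len(parts) < 8 or parts[0].lower() != "person":
                continue
            # KITTI columns 4-7 hold xmin, ymin, xmax, ymax in pixels.
            xmin, ymin, xmax, ymax = map(float, parts[4:8])
            image.crop((xmin, ymin, xmax, ymax)).save(
                os.path.join(CROP_DIR, f"{stem}_{idx}.jpg"))

The resulting crops can then be fed to the TAO classification model from step 2 (or to any other classifier).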