I am using AWS IoT Greengrass to deploy a total of five models to a Jetson NX for predicting facial attributes (age, gender, etc.). The general idea is to extract faces from each image and then run four models on every face crop.
All models are compiled with AWS SageMaker Neo (four PyTorch models, one TFLite model). I am using Python and loading images with OpenCV.
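For context, image loading and preprocessing look roughly like this; a minimal sketch, where the input size and the float32 NCHW layout are assumptions about my attribute models:

```python
import cv2
import numpy as np

def preprocess(path: str, size: int = 224) -> np.ndarray:
    """Load an image with OpenCV and convert it to a float32 NCHW tensor."""
    img = cv2.imread(path)                      # BGR, HWC, uint8
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (size, size))
    img = img.astype(np.float32) / 255.0
    return np.expand_dims(img.transpose(2, 0, 1), axis=0)  # 1x3xHxW
```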
Inference works by first loading all models through the SageMaker Edge Manager agent's gRPC interface (using the generated pb2 stubs) and then sending predict requests, following this example: Build machine learning at the edge applications using Amazon SageMaker Edge Manager and AWS IoT Greengrass V2 | AWS Machine Learning Blog.
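The load/predict calls look roughly like this; a minimal sketch where the socket path, model name, artifact path, and input tensor name are placeholders, and the message and enum names follow the agent.proto from the Edge Manager examples (verify them against your generated stubs):

```python
import grpc
import numpy as np

# Stubs generated from the Edge Manager agent.proto, as in the AWS example.
import agent_pb2
import agent_pb2_grpc

# Unix socket exposed by the aws.greengrass.SageMakerEdgeManager component
# (adjust to your deployment).
channel = grpc.insecure_channel("unix:///tmp/aws.greengrass.SageMakerEdgeManager.sock")
agent = agent_pb2_grpc.AgentStub(channel)

def load_model(name: str, url: str) -> None:
    """Load one Neo-compiled model from its local artifact directory."""
    agent.LoadModel(agent_pb2.LoadModelRequest(name=name, url=url))

def predict(model_name: str, batch: np.ndarray):
    """Send one float32 NCHW tensor to the agent and return the raw PredictResponse."""
    tensor = agent_pb2.Tensor(
        tensor_metadata=agent_pb2.TensorMetadata(
            name=b"input",                    # must match the compiled model's input name
            data_type=agent_pb2.FLOAT32,      # DataType enum from agent.proto
            shape=list(batch.shape),
        ),
        byte_data=batch.astype(np.float32).tobytes(),
    )
    return agent.Predict(agent_pb2.PredictRequest(name=model_name, tensors=[tensor]))

# e.g. load_model("age_model", "/greengrass/v2/work/component/age_model")
```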
However, at runtime GPU utilization sits at around 30%, with long idle gaps between inference calls.
I tried to overcome this with multithreading using a concurrent.futures thread pool: in one attempt I assigned a thread to every model, and in another I assigned a thread to the inference workflow for each detected face (both sketched below). Neither improved GPU utilization.
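Both attempts looked roughly like this, reusing the predict helper above (the model names are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor

ATTRIBUTE_MODELS = ["age_model", "gender_model", "emotion_model", "glasses_model"]  # placeholder names

# Attempt 1: one thread per model, so the four Predict calls for a single
# face crop are issued concurrently.
def predict_per_model(face_crop):
    with ThreadPoolExecutor(max_workers=len(ATTRIBUTE_MODELS)) as pool:
        futures = {m: pool.submit(predict, m, face_crop) for m in ATTRIBUTE_MODELS}
        return {m: f.result() for m, f in futures.items()}

# Attempt 2: one thread per detected face, each thread running the full
# four-model workflow sequentially on its crop.
def predict_per_face(face_crops):
    def run_workflow(crop):
        return {m: predict(m, crop) for m in ATTRIBUTE_MODELS}
    with ThreadPoolExecutor(max_workers=max(len(face_crops), 1)) as pool:
        return list(pool.map(run_workflow, face_crops))
```

In both variants the Predict calls do overlap on the client side (the blocking gRPC calls release the GIL while waiting), but GPU utilization stays at roughly the same level.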