Inference Time Hugging Face DETR


I have been testing the inference speed of the Hugging Face DETR model (`DetrForObjectDetection`) on my Jetson Orin Nano (8GB, 15W), and the results have been surprising. Any help would be appreciated.

The model is run with the same configuration on all devices; the most relevant change I have made is setting `config.num_queries = 500`.
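As a sketch, that override looks like the following (the post does not say which DETR checkpoint is used, so this only builds the config object; loading pretrained weights with a non-default query count would also need `ignore_mismatched_sizes=True`):

```python
from transformers import DetrConfig

# Sketch only: raise the number of object queries from the
# default (100) to 500, as described in the post above.
config = DetrConfig(num_queries=500)
print(config.num_queries)  # 500
```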

The image I am running inference on has size 800x800 pixels.

On my iMac (3.3 GHz 6-core Intel Core i5, no discrete GPU), inference takes about 1 second.

On the Jetson Orin Nano it takes about 4 seconds. Surprisingly, there is not much difference between using `.to('cuda')` and `.to('cpu')`.

Time is measured only immediately before and after `model(inputs)`.
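One caveat with timing this way: CUDA kernel launches are asynchronous, so a wall-clock timer around the forward call can stop before the GPU has actually finished. A minimal timing sketch with explicit synchronization (`model` and `inputs` are placeholders for whatever is being benchmarked):

```python
import time
import torch

def timed_forward(model, inputs):
    """Time one forward pass; synchronize so GPU work is fully counted."""
    # CUDA launches return control to Python immediately; without
    # synchronize() the measured time can exclude most of the GPU work.
    use_cuda = torch.cuda.is_available() and next(model.parameters()).is_cuda
    if use_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        outputs = model(**inputs)
    if use_cuda:
        torch.cuda.synchronize()
    return outputs, time.perf_counter() - start
```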

I would have expected inference to run at least as fast on the Jetson Orin Nano as on my desktop CPU, or at the very least to be accelerated by CUDA. I monitored GPU usage: it is 0% with `.to('cpu')` and only slightly higher with `.to('cuda')`.

Just for reference, I am running inference in the Docker container `dustynv/transformers:git-r35.3.1`.


If the GPU utilization is low, the bottleneck might come from data access.

It sounds like you are using PyTorch.
Could you loop the real inference call to see if the GPU utilization increases?
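A looped benchmark along these lines (names are placeholders) would show whether the first call is just paying one-time costs such as CUDA context creation, memory allocation, and kernel autotuning:

```python
import time
import torch

def benchmark(model, inputs, warmup=5, iters=20):
    """Average steady-state latency over several iterations."""
    # Warm-up iterations absorb one-time costs (CUDA context init,
    # allocator growth, cuDNN autotuning) so only steady-state
    # latency is measured.
    use_cuda = any(p.is_cuda for p in model.parameters())
    with torch.no_grad():
        for _ in range(warmup):
            model(**inputs)
        if use_cuda:
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(**inputs)
        if use_cuda:
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters
```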



I implemented the suggestion: inference time went down to about 0.25 seconds and GPU utilization went up.

Thanks. I appreciate the help.
