Hello,
I have been testing the inference speed of the Hugging Face DETR model (DetrForObjectDetection) on my Jetson Orin Nano (8 GB, 15 W), and the results have been surprising. Any help would be appreciated.
The model runs with the same configuration on all devices; the most relevant change I have made is setting config.num_queries = 500.
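For completeness, here is roughly how that configuration change can be reproduced (the checkpoint name `facebook/detr-resnet-50` is my assumption; since changing `num_queries` reshapes the query embeddings, `ignore_mismatched_sizes=True` is needed to load the remaining pretrained weights):

```python
from transformers import DetrConfig, DetrForObjectDetection

# Assumption: starting from the facebook/detr-resnet-50 checkpoint.
config = DetrConfig.from_pretrained("facebook/detr-resnet-50")
config.num_queries = 500  # default is 100

# num_queries changes the shape of the learned query embeddings, so that
# layer cannot be loaded from the pretrained checkpoint as-is:
model = DetrForObjectDetection.from_pretrained(
    "facebook/detr-resnet-50",
    config=config,
    ignore_mismatched_sizes=True,
)
```

Note that 500 queries is 5x the default, which increases the decoder's per-image cost on every device.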
The image I am running inference on is 800x800 pixels.
On my iMac (3.3 GHz 6-core Intel Core i5, no discrete GPU), inference takes about 1 second.
On the Jetson Orin Nano it takes about 4 seconds. Surprisingly, there is not much difference between using .to('cuda') and .to('cpu').
Time is measured only immediately before and after model(inputs).
I would have expected inference to run at least as fast on the Jetson Orin Nano as on my desktop CPU, or at the very least to be accelerated by CUDA. I monitored GPU usage: it is 0% with .to('cpu') and only slightly higher with .to('cuda').
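In case it helps reproduce the numbers, here is a minimal sketch of a more robust timing harness (assuming PyTorch, with `model` and `inputs` already moved to the target device). CUDA kernel launches are asynchronous, so without torch.cuda.synchronize() the measured wall time may not reflect actual GPU work; warm-up runs also matter, since the first iterations pay one-time costs:

```python
import time
import torch

def timed_inference(model, inputs, device="cpu", warmup=3, runs=10):
    """Average the latency of model(**inputs) over several runs.

    Warm-up iterations absorb one-time costs (allocator growth, cuDNN
    autotuning); torch.cuda.synchronize() ensures queued CUDA kernels
    have actually finished before the clock is read.
    """
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):
            model(**inputs)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            model(**inputs)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs
```

On the Jetson it may also be worth checking the power mode (nvpmodel) and running jetson_clocks before benchmarking, since the 15 W profile caps GPU clocks.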
Just for reference, I am running inference in the Docker container 'dustynv/transformers:git-r35.3.1'.