I am using the Coral devboard to do inference. When running multiple inferences, the first one is always slower, because the time to transfer the tensors to the embedded TPU is measured along with the classification/inference time.
I want to know if the same happens with the embedded GPUs?
Where can I find an example?
It depends on what kind of memory you use.
For Jetson, the physical memory is shared by CPU and GPU.
As a result, you can use unified or page-locked memory and avoid copying tensors between processors.
A basic inference sample can be found in our TensorRT folder:
When you say physical memory, you mean the DRAM (off-chip), right? The 8GB one.
But why does the first inference take longer than the other ones, then? I am only measuring the inference time.
It’s known that the GPU takes longer on the first launch because of initialization overhead:
As a result, we usually reserve some warm-up time before benchmarking.
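A minimal sketch of that warm-up pattern, assuming a placeholder `infer` callable standing in for whatever inference call you benchmark (e.g. a TensorRT execution): run a few untimed passes first so one-time initialization cost is excluded, then average only the timed runs.

```python
import time

def benchmark(infer, warmup_runs=5, timed_runs=20):
    """Average latency of `infer`, excluding warm-up runs.

    warmup_runs: untimed calls that absorb one-time initialization cost.
    timed_runs:  calls actually included in the measurement.
    """
    for _ in range(warmup_runs):
        infer()  # not measured; GPU/runtime initializes here
    start = time.perf_counter()
    for _ in range(timed_runs):
        infer()
    return (time.perf_counter() - start) / timed_runs
```

Without the warm-up loop, the first call's initialization time would be folded into the average and inflate the reported latency.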