Pardon me for not knowing the exact difference as I’m new to this domain, but I was wondering how a Xavier SoC is different from, let’s say, a discrete Turing GPU? I haven’t been able to find good explanations on this topic.
My main reason for asking is this: I can run a really complex object detection model on a PC with Titan RTX at about 20-25FPS, whereas the same model when run on the DRIVE AGX Xavier results in a meager 10FPS.
I wanted to do some study on why this slowdown occurs and what I can do to make it run faster on the DRIVE module, but I am not sure where to start. What are the contributing factors in this speed difference?
Please let me know if I need to move this discussion elsewhere. Thanks!
DRIVE AGX Xavier has 2 Xavier SoCs. Each Xavier SoC has a CPU and an iGPU that share the same physical DRAM (unlike a discrete GPU, which has its own dedicated VRAM behind a PCIe link). Please check https://docs.nvidia.com/cuda/cuda-for-tegra-appnote/index.html#memory-management for more details on the memory layout in the Xavier SoC.
You need to compare the computational power of your discrete GPU with the Xavier iGPU. You can run the CUDA deviceQuery sample to see the GPU specs, and check the FP32/FP16/INT8 throughput of your Titan dGPU. Each Xavier SoC's iGPU delivers 1.3 FP32 TFLOPS and 20 INT8 TOPS (see the tech specs at https://developer.nvidia.com/drive/drive-agx).
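To put rough numbers on the gap, here is a quick back-of-the-envelope comparison of theoretical peak FP32 throughput. The Titan RTX figure (~16.3 TFLOPS) is NVIDIA's published spec, not something measured here; both numbers are peak rates, so verify your actual device with deviceQuery:

```python
# Back-of-the-envelope peak-throughput comparison (assumed public specs,
# not measured values -- verify on your own hardware with deviceQuery).
TITAN_RTX_FP32_TFLOPS = 16.3   # NVIDIA published spec for Titan RTX
XAVIER_IGPU_FP32_TFLOPS = 1.3  # from the DRIVE AGX Xavier tech specs

compute_ratio = TITAN_RTX_FP32_TFLOPS / XAVIER_IGPU_FP32_TFLOPS
print(f"Peak FP32 ratio: {compute_ratio:.1f}x")  # ~12.5x

# Observed FPS gap from the question: 20-25 FPS vs ~10 FPS
observed_ratio = 22.5 / 10
print(f"Observed FPS ratio: {observed_ratio:.2f}x")

# The peak-compute gap (~12.5x) far exceeds the observed slowdown
# (~2.25x), which suggests the pipeline is not purely FP32
# compute-bound -- and that Xavier's INT8 path (20 INT8 TOPS)
# leaves real headroom to exploit.
```

The takeaway is that raw FP32 TFLOPS alone does not predict the FPS you see; memory bandwidth, precision, and batch size all factor in.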
You can also optimize your model further using NVIDIA TensorRT (https://developer.nvidia.com/tensorrt) and perform inference with the optimized engine on Xavier.
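For example, if your model can be exported to ONNX, a common first step is to build a TensorRT engine with trtexec and benchmark it directly on the Xavier. This is a sketch: `model.onnx` is a placeholder for your own exported model, and the trtexec install path varies by TensorRT release:

```shell
# Build an FP16 TensorRT engine from an ONNX model and report timing.
# trtexec ships with TensorRT (commonly under /usr/src/tensorrt/bin on Tegra).
trtexec --onnx=model.onnx \
        --fp16 \
        --saveEngine=model_fp16.engine

# INT8 needs a calibration dataset for accurate results, but trtexec can
# still give you a rough sense of the potential INT8 speedup:
trtexec --onnx=model.onnx --int8 --saveEngine=model_int8.engine
```

FP16 and especially INT8 matter much more on Xavier than on a Titan RTX, since the iGPU's INT8 throughput (20 TOPS) is where most of its inference performance lives.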