Role of the iGPU and dGPU in Drive PX2


  1. How can both the discrete and integrated GPUs be used effectively at runtime on Tegra?
    For example, I have an image application where part of the pre-processing and the actual processing can happen in parallel.
    Is it possible to run those parts separately on the iGPU and dGPU respectively?

According to my understanding, only one context is given to a device at a time. So if we want to utilize the second device, we need to create another context on the new device.
If so, will switching between the contexts degrade performance?

Also, in case the application wants to transfer data from the iGPU to the dGPU, is that feasible? I don't see any such APIs. Or does it have to be iGPU->Host and then Host->dGPU?

  2. Another question on the same note: is it possible to use both Tegra A and Tegra B in parallel at a given time?
    If so, kindly provide a link or sample path, if any is available in the DriveWorks SDK.

Kindly clarify the above points and correct me if my understanding is wrong.

Dear navaznazar,
You can create two threads, one holding a context on the iGPU and the other holding a context on the dGPU. Then you can use both GPUs at the same time.
There are no APIs to transfer data directly from the iGPU to the dGPU. However, you can use EGLStream to transfer frames from the iGPU to the dGPU without an additional memcpy through host memory. Please check whether EGLStream fits your use case.
Tegra A and Tegra B are like separate systems connected via Ethernet. You can run different applications on Tegra A and Tegra B at the same time.

Dear SivaRamaKrishna,

I am wondering if Tegra A and Tegra B can be used at the same time for one single neural network. I mean, we want to utilize the capability of all the iGPUs and dGPUs on the device; ideally, four GPUs could be used to run inference for our NN model. Is this possible? If so, would you please point us to the right documentation that illustrates how this is achieved?

We understand that NCCL 2.0 supports multiple machines and multiple GPUs, but the software we've found is only for amd64, not for the ARM architecture. Is there a version of NCCL specifically for the PX2 so that we can make full use of all the GPUs for inference tasks on board? If not, is this feature on your roadmap?

Thanks in advance and looking forward to your reply.


Dear jinlingge,
NCCL is not supported on the Drive PX2 platform. If you want to use 4 GPUs for a single network, the network needs to be distributed across the 4 GPUs, which results in more data transfers across PCIe. Instead, you can run one network per GPU and execute multiple networks in parallel. We also provide the TensorRT library to optimize network models on the Drive PX2; you can check that as well.
Please let us know if you have any use case to use NCCL for inference.