We’re using the Xavier for running inference on images. The network is built with the PyTorch framework, and the inference part uses the PyTorch C++ API.
Does anyone know if there are any performance gains from creating the tensor on the GPU instead of the CPU? AFAIK, on regular desktop computers improvements can be seen by doing this, but since the CPU and GPU share memory on the Xavier, maybe this is not worth pursuing.
Hi @kce, while it is true that the Jetson’s CPU/GPU share the same physical memory, you need to allocate the memory with cudaHostAlloc() and the cudaHostAllocMapped flag in order for the memory to be accessible from both the CPU and GPU address spaces.
For example, if you do just an ordinary malloc() call, that memory will only be accessible from the CPU. I don’t believe PyTorch supports the mapped memory allocation mentioned above, but it does support GPU memory (i.e. cudaMalloc()).
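For reference, here is a minimal sketch of what that mapped allocation looks like in plain CUDA (outside of PyTorch); the buffer size is just an example:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const size_t bytes = 1920 * 1080 * 3;  // example: one 1080p RGB image

    // Pinned, mapped allocation: the same physical memory becomes visible
    // from both the CPU and GPU address spaces on Jetson
    void* host_ptr = nullptr;
    cudaError_t err = cudaHostAlloc(&host_ptr, bytes, cudaHostAllocMapped);
    if (err != cudaSuccess) {
        std::printf("cudaHostAlloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Get the device-side pointer that aliases the same memory
    void* dev_ptr = nullptr;
    cudaHostGetDevicePointer(&dev_ptr, host_ptr, 0);

    // ... fill host_ptr from the CPU, pass dev_ptr to kernels ...

    cudaFreeHost(host_ptr);
    return 0;
}
```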
Further, in order for PyTorch to utilize the GPU for processing the network during inference, the tensor needs to be on the GPU. So yes, the PyTorch tensor should be created on the GPU. Running inference on the CPU only would be much slower than using the GPU.
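For example, a rough sketch with the LibTorch C++ API (the model path and input shape below are placeholders):

```cpp
#include <torch/script.h>
#include <torch/torch.h>

int main() {
    // Load the TorchScript model and move its weights to the GPU
    torch::jit::script::Module module = torch::jit::load("model.pt");  // placeholder path
    module.to(torch::kCUDA);
    module.eval();

    // Create the input tensor directly on the GPU
    auto options = torch::TensorOptions().dtype(torch::kFloat32).device(torch::kCUDA);
    torch::Tensor input = torch::zeros({1, 3, 224, 224}, options);  // placeholder shape

    torch::NoGradGuard no_grad;
    torch::Tensor output = module.forward({input}).toTensor();
    return 0;
}
```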
I see, so it might not be as automated in PyTorch as I assumed. I will then try to create the tensor on the GPU first.
Also, to clarify: we do use the GPU for inference. It’s just that we create the tensor on the CPU and move it to the GPU, since we use OpenCV to load the image to be classified, and the version initially supplied with JetPack is not built with CUDA support as far as I know.
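For what it’s worth, that CPU-load-then-move pattern looks roughly like this with LibTorch and a non-CUDA OpenCV build (the scaling and layout details depend on the network, so treat them as placeholders):

```cpp
#include <opencv2/opencv.hpp>
#include <torch/torch.h>

torch::Tensor load_image_to_gpu(const std::string& path) {
    // Decode and preprocess on the CPU with a regular (non-CUDA) OpenCV build
    cv::Mat img = cv::imread(path, cv::IMREAD_COLOR);
    cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
    img.convertTo(img, CV_32FC3, 1.0 / 255.0);  // scale to [0,1]; adjust to your network

    // Wrap the CPU buffer without copying, reorder NHWC -> NCHW,
    // then .to(torch::kCUDA) copies the data over to the GPU
    torch::Tensor t = torch::from_blob(img.data, {1, img.rows, img.cols, 3}, torch::kFloat32);
    return t.permute({0, 3, 1, 2}).contiguous().to(torch::kCUDA);
}
```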
I will try to build OpenCV from source with CUDA enabled and load images directly to the GPU.
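In case it helps, here is a sketch of what wrapping a cv::cuda::GpuMat as a GPU tensor might look like once OpenCV is built with CUDA. Note that cv::imread still decodes on the CPU, so only the later processing runs on the GPU, and the helper name and preprocessing steps are just illustrative:

```cpp
#include <opencv2/opencv.hpp>
#include <opencv2/core/cuda.hpp>
#include <torch/torch.h>

// Illustrative helper: expose a GpuMat's device memory as a torch::Tensor
torch::Tensor gpumat_to_tensor(const cv::cuda::GpuMat& img) {
    // GpuMat rows are normally pitched (padded), but from_blob expects
    // densely packed memory, so copy into a continuous device buffer first
    cv::cuda::GpuMat cont;
    cv::cuda::createContinuous(img.rows, img.cols, img.type(), cont);
    img.copyTo(cont);

    auto options = torch::TensorOptions().dtype(torch::kUInt8).device(torch::kCUDA);
    // Wrap the device pointer (no copy); clone() so the tensor owns its memory
    return torch::from_blob(cont.data, {img.rows, img.cols, img.channels()}, options).clone();
}

int main() {
    cv::cuda::GpuMat gpu_img;
    gpu_img.upload(cv::imread("frame.jpg", cv::IMREAD_COLOR));  // placeholder image

    // Convert to the NCHW float layout the network expects (adjust as needed)
    torch::Tensor input = gpumat_to_tensor(gpu_img)
                              .permute({2, 0, 1})
                              .unsqueeze(0)
                              .to(torch::kFloat32)
                              .div(255.0);
    return 0;
}
```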