GPU Virtualization with QEMU/KVM on Jetson AGX Xavier Developer Kit

Hey, before I get into my questions, I want to state that I’m an absolute beginner in this area.

I have a Jetson AGX Xavier running JetPack 5.x that I’m using for my Computer Vision tasks, so I’m mainly relying on GPU-level operations.

I was wondering whether dividing/virtualizing an extra kernel with GPU passthrough would allow me to run separate applications within each kernel that all use the same physical GPU resources. If this is possible, how? If not, are there other concepts that would satisfy this need for GPU virtualization?

The architecture I have in mind is something like this:
Kernel<->Physical GPU
Kernel0<->GPU0
Kernel1<->GPU1
where GPU0 and GPU1 are fully isolated virtual GPUs backed by the single physical GPU.

I came across this forum post, but the instructions given there are not clear enough for me to understand and implement. Also, after inspecting the post I believe JetPack 5.x already supports KVM features, but I’m not sure; it would be great if you could clarify that as well.

Thanks a lot in advance for any advice or any instructions you provide.

Hi,

Would you mind sharing a bit more about the goal you want to achieve?

All the tasks are sent to the same physical GPU, so they will wait in the same queue.
Since the GPU has its own scheduler, the priority and resource allocation cannot be controlled.

Based on this, you can directly link all the kernels to the physical GPU.

Thanks.

Hey again @AastaLLL,

Using my Jetson AGX Xavier’s GPU, I’m running numerous model-based Computer Vision tasks such as Object Tracking, Segmentation, Detection, and so on.
Running these methods on their own yields acceptable results in terms of GPU load and FPS measurements, but I wonder whether running them at the same time would drastically decrease their individual performance.
So, what I actually want is a way of making sure that these methods can run in parallel within the constraints of the GPU without affecting their individual performance. One might think that simply running them all at once and seeing whether the setup fails is one way of checking this, but that isn’t feasible in my case. So I thought that dividing the GPU into multiple virtual GPUs with predetermined specs controlled by me could be another (questionable) way of doing it, which is why I asked this GPU virtualization question in the first place.

Thanks a lot.

Hi,

Please try CUDA streams.
Tasks launched to different streams can run in parallel, while tasks on the same stream execute in order.
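For example, a minimal sketch of overlapping two models on separate streams with PyTorch could look like this (the models and inputs below are just placeholders, not from your setup):

```python
import torch

# Two independent models (placeholders for e.g. a detector and a segmenter).
model_a = torch.nn.Conv2d(3, 16, 3).cuda().eval()
model_b = torch.nn.Conv2d(3, 16, 3).cuda().eval()

x_a = torch.randn(1, 3, 224, 224, device="cuda")
x_b = torch.randn(1, 3, 224, 224, device="cuda")

stream_a = torch.cuda.Stream()
stream_b = torch.cuda.Stream()

with torch.no_grad():
    # Work queued on different streams is allowed to overlap on the GPU.
    with torch.cuda.stream(stream_a):
        out_a = model_a(x_a)
    with torch.cuda.stream(stream_b):
        out_b = model_b(x_b)

# Wait for both streams to finish before using the results on the CPU.
torch.cuda.synchronize()
print(out_a.shape, out_b.shape)
```

Whether the two workloads actually overlap depends on how much of the GPU each one already occupies.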

Thanks.

Hey @AastaLLL
I’m aware of CUDA streams, but as far as I know, CUDA streams themselves do not provide a direct mechanism for memory management or for preventing out-of-memory (OOM) errors. So, unlike the approach I described, this would require working out the maximum GPU load of each task beforehand, right?
What I’m describing is something like the following, but with awareness of the limits.
For example, the PyTorch model in script1 should behave as if it had its own GPU with 30% of the capacity of our single GPU, whereas the PyTorch model in script2 should behave as if it had its own GPU with 70% of the capacity. This way OOM shouldn’t occur, and script1 or script2 would simply keep running more slowly instead of throwing an OOM error. A rough sketch of what I mean is below.
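The closest thing I could find in PyTorch itself is torch.cuda.set_per_process_memory_fraction, which only caps PyTorch’s caching allocator rather than creating a real virtual GPU, so it is not exactly what I described; still, as a sketch of the idea (the 0.3 fraction and the tiny model are just placeholders):

```python
import torch

# script1: cap this process at roughly 30% of the GPU-visible memory.
# This only limits PyTorch's caching allocator; it is not real GPU
# partitioning, and on Jetson that memory is the shared system RAM.
torch.cuda.set_per_process_memory_fraction(0.3, device=0)

model = torch.nn.Conv2d(3, 16, 3).cuda().eval()  # placeholder for the real model
x = torch.randn(1, 3, 224, 224, device="cuda")

with torch.no_grad():
    y = model(x)
print(y.shape)
```

script2 would do the same with 0.7. The catch is that exceeding the cap still raises an OOM error in that process instead of just slowing it down, so it only approximates the behaviour I want.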
If CUDA streams can handle this and provide an architecture where multiple PyTorch scripts can share a single GPU without throwing OOM errors (they can run slower or faster depending on the current load, it doesn’t matter), I can of course implement it.

Thanks a lot.

Hi,

Unfortunately, this feature is not available on Jetson.
Please note that Jetson is a shared-memory system.

The physical memory is shared by all the GPU tasks as well as the CPU.
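You can see this on the device itself, for example with a rough check like the following (assuming PyTorch is installed):

```python
import torch

# On Jetson the GPU has no dedicated VRAM: the memory PyTorch reports for
# the GPU comes from the same physical DRAM that the CPU uses.
props = torch.cuda.get_device_properties(0)
print(f"GPU-visible memory: {props.total_memory / 1024**3:.1f} GiB")

# Compare with the system RAM reported by the kernel (first line of
# /proc/meminfo is "MemTotal: <kB>").
with open("/proc/meminfo") as f:
    mem_total_kb = int(f.readline().split()[1])
print(f"System RAM:         {mem_total_kb / 1024**2:.1f} GiB")
```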

Thanks.
