Can a GPU workload be spread between multiple GPUs in native Linux? Can NVLink, the drivers, or Linux itself do the job automatically?

Hi! The specific hardware at the moment is 2x 3090, no SLI, Ubuntu 22.04. I think I know how to split the work manually with DL libraries like PyTorch, by assigning device 0, 1, etc. to a specific task.
However, can this be done automatically by the drivers or the OS (I understand it would be only to some extent), or by connecting the GPUs with NVLink?
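For reference, this is roughly the manual split I mean in PyTorch: each task's tensors are pinned to a specific GPU by index. The workloads and sizes here are just illustrative, and the sketch falls back to CPU when a given index is not available.

```python
import torch

def pick_device(index: int) -> torch.device:
    # Use cuda:<index> if that GPU exists, otherwise fall back to CPU.
    if torch.cuda.is_available() and index < torch.cuda.device_count():
        return torch.device(f"cuda:{index}")
    return torch.device("cpu")

dev0 = pick_device(0)
dev1 = pick_device(1)

# Two independent workloads, each pinned to its own card:
a = torch.randn(1024, 1024, device=dev0)
b = torch.randn(1024, 1024, device=dev1)
out0 = a @ a  # runs on device 0
out1 = b @ b  # runs on device 1
```

This only splits work that I explicitly assign per device; nothing here balances load between the cards on its own, which is the core of my question.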

The question is in particular about Gazebo and ROS/ROS 2 simulations for now.

Can the processing be spread across the two cards, even when they are only under light or average load, so that they are loaded evenly?

A sample from our current simulations, which do not push the GPUs much: the second card is barely loaded, idling at 405 MHz.

With four instances of the same simulation:

The top card draws 133 W, the second about 22 W, still at 405 MHz.
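One workaround I could do by hand is pinning each instance to a different GPU via `CUDA_VISIBLE_DEVICES`, which scopes the visible GPUs per process (this only affects the CUDA side, not OpenGL rendering, and the `gz sim` invocation below is just a placeholder for however the simulation is actually launched):

```shell
#!/usr/bin/env bash
# Each process only sees the GPU(s) listed in its own CUDA_VISIBLE_DEVICES,
# so launching instances this way splits them across the cards manually.
launch_on_gpu() {
  # $1 = GPU index, remaining args = command to run on that GPU
  CUDA_VISIBLE_DEVICES="$1" "${@:2}"
}

# Hypothetical usage, placeholder command and world file:
#   launch_on_gpu 0 gz sim my_world.sdf &
#   launch_on_gpu 1 gz sim my_world.sdf &

# Demonstrate that the variable is set per process:
launch_on_gpu 0 printenv CUDA_VISIBLE_DEVICES
launch_on_gpu 1 printenv CUDA_VISIBLE_DEVICES
```

But that is still static, per-process pinning, not the automatic, even load spreading I am asking about.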

(It seems that Gazebo, in this use case, is not utilizing the GPUs well, at least in our simulations. We set the physics engine to Bullet, which supposedly can use OpenCL, and that made the simulation reach a real-time factor of 1, but in our particular use case the bottleneck still seems to be the CPU. The GPU load-spreading question still stands, though.)

P.S. I found out about the possibility of virtual GPUs, but it seems that's meant for VMs.

Thanks for any suggestions!