Hi all,
I’m connecting via RDP, tunneled over SSH (ssh -L 3389:127.0.0.1:3389 -C -N -l davide SERVER_IP), to a machine we recently installed at my university. Once the SSH tunnel is established, I use Remmina to log in to the remote machine; I enter my credentials in GDM and then have a sudoer account available.
The machine has two NVIDIA RTX 3090 GPUs, and we want to use them to train our ML model for an autonomous drone challenge.
I installed the latest NVIDIA driver via the .run file and blacklisted the nouveau driver.
The challenge script uses ROS; it launches a GUI to initialize the commands and a Unity standalone simulator (Flightmare). However, Unity crashes when I try to run it with NVIDIA render offload, for example:
__NV_PRIME_RENDER_OFFLOAD_PROVIDER=NVIDIA-G0 __GLX_VENDOR_LIBRARY_NAME=nvidia roslaunch envsim visionenv_sim.launch render:=True
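In case it helps with reproducing, the same prefix can be wrapped in a small shell function so it can be applied to other commands as well. This is only a sketch: the name nv_offload is made up for illustration (it is not part of ROS or the NVIDIA driver), and it sets exactly the two variables from the command above, nothing more.

```shell
# Hypothetical helper: run any command with the two offload variables
# used in the roslaunch line above. The function name is an assumption.
nv_offload() {
    env __NV_PRIME_RENDER_OFFLOAD_PROVIDER=NVIDIA-G0 \
        __GLX_VENDOR_LIBRARY_NAME=nvidia \
        "$@"
}
```

With this, the failing call becomes nv_offload roslaunch envsim visionenv_sim.launch render:=True, and a simpler GLX client such as glxinfo (assuming it is installed) could be run the same way to see whether the BadAlloc is specific to Unity.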
Flightmare log with X error:
process[master]: started with pid [10008]
ROS_MASTER_URI=http://localhost:11311
setting /run_id to ff203d02-ba8e-11ec-9b7a-47e000c76acd
process[rosout-1]: started with pid [10049]
started core service [/rosout]
process[kingfisher/dodgeros_pilot-2]: started with pid [10056]
process[kingfisher/viz_face-3]: started with pid [10057]
process[dodgeros_gui-4]: started with pid [10058]
process[flight_render-5]: started with pid [10059]
X Error of failed request: BadAlloc (insufficient resources for operation)
Major opcode of failed request: 150 (GLX)
Minor opcode of failed request: 5 (X_GLXMakeCurrent)
Serial number of failed request: 0
Current serial number in output stream: 94
[flight_render-5] process has died [pid 10059, exit code 1, cmd /home/davide/icra22_competition_ws/src/agile_flight/flightmare/flightrender/RPG_Flightmare.x86_64 __name:=flight_render __log:=/home/davide/.ros/log/ff203d02-ba8e-11ec-9b7a-47e000c76acd/flight_render-5.log].
log file: /home/davide/.ros/log/ff203d02-ba8e-11ec-9b7a-47e000c76acd/flight_render-5*.log
Running roslaunch without the NVIDIA prefix works (Unity renders), but no new process appears in nvidia-smi.
Also, nvidia-settings only shows the thermal settings for the two GPUs along with ‘Graphic card Information’. Is that OK? Shouldn’t it show more settings entries?
Thank you for the support!
nvidia-bug-report.log (1.2 MB)