Hi,
I tried to run my code on a server with 8 NVIDIA A5000 GPUs. (The code works fine on my local computer, which has only 1 GPU.)
On the server it fails with the following error:
Setting seed: 1
Not connected to PVD
+++ Using GPU PhysX
Physics Engine: PhysX
Physics Device: cuda:0
GPU Pipeline: enabled
/home/chenda/anaconda3/envs/openrobot/lib/python3.7/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on depthImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on segmentationImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on optical flow buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on depthImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on segmentationImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on optical flow buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on depthImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on segmentationImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on optical flow buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on depthImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on segmentationImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on optical flow buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on depthImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on segmentationImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on optical flow buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on depthImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on segmentationImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on optical flow buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on depthImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on segmentationImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on optical flow buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on depthImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on segmentationImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on optical flow buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on depthImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on segmentationImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on optical flow buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on depthImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on segmentationImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on optical flow buffer with error 101
[Error] [carb.gym.plugin] Gym cuda error: invalid device ordinal: ../../../source/plugins/carb/gym/impl/Gym/GymPhysXCuda.cu: 991
*** Can't create empty tensor
*** Can't create empty tensor
*** Can't create empty tensor
*** Can't create empty tensor
*** Can't create empty tensor
*** Can't create empty tensor
*** Can't create empty tensor
*** Can't create empty tensor
*** Can't create empty tensor
*** Can't create empty tensor
/home/chenda/anaconda3/envs/openrobot/lib/python3.7/site-packages/gym/spaces/box.py:74: UserWarning: WARN: Box bound precision lowered by casting to float32
"Box bound precision lowered by casting to {}".format(self.dtype)
{'policy': <class 'stable_baselines3.common.policies.ActorCriticPolicy'>, 'policy_kwargs': {'net_arch': [], 'features_extractor_class': <class 'locotransformer.policy.locotransformer_extractor.LocoTransformerExtractor'>, 'features_extractor_kwargs': {'encoder_param': {'hidden_shapes': [256, 256], 'visual_dim': 256}, 'net_param': {'transformer_params': [[1, 256], [1, 256]], 'append_hidden_shapes': [256, 256]}, 'state_input_shape': (48,), 'visual_input_shape': (4, 64, 64)}}, 'env': <NormObsWithImg<ObservationDictionaryToArrayWrapper instance>>, 'learning_rate': 5e-05, 'gamma': 0.99, 'gae_lambda': 0.95, 'target_kl': 0.05, 'max_grad_norm': 1, 'n_steps': 64, 'n_epochs': 5, 'batch_size': 8192, 'clip_range': 0.2, 'vf_coef': 0.8, 'clip_range_vf': 0.2, 'ent_coef': 0.01, 'tensorboard_log': '/home/chenda/open_robot-main/open_robot-main/runs/legged_static_seed_1_2022-12-26_0845', 'create_eval_env': False, 'verbose': 2, 'seed': 1, 'device': 'cuda:0'}
Using cuda:0 device
/home/chenda/open_robot-main/open_robot-main/stable_baselines3/ppo/ppo.py:148: UserWarning: You have specified a mini-batch size of 8192, but because the `RolloutBuffer` is of size `n_steps * n_envs = 640`, after every 0 untruncated mini-batches, there will be a truncated mini-batch of size 640
We recommend using a `batch_size` that is a factor of `n_steps * n_envs`.
Info: (n_steps=64 and n_envs=10)
f"You have specified a mini-batch size of {batch_size},"
Traceback (most recent call last):
  File "train_in_static_env_sb3.py", line 352, in <module>
    log_interval=1,
  File "/home/chenda/open_robot-main/open_robot-main/stable_baselines3/ppo/ppo.py", line 321, in learn
    reset_num_timesteps=reset_num_timesteps,
  File "/home/chenda/open_robot-main/open_robot-main/stable_baselines3/common/on_policy_algorithm.py", line 247, in learn
    tb_log_name
  File "/home/chenda/open_robot-main/open_robot-main/stable_baselines3/common/base_class.py", line 453, in _setup_learn
    self._last_obs = self.env.reset() # pytype: disable=annotation-type-mismatch
  File "/home/chenda/anaconda3/envs/openrobot/lib/python3.7/site-packages/gym/core.py", line 319, in reset
    observation = self.env.reset(**kwargs)
  File "/home/chenda/open_robot-main/open_robot-main/locotransformer/env_wrapper/dict_to_array_wrapper.py", line 109, in reset
    observation = self._gym_env.reset()
  File "/home/chenda/open_robot-main/open_robot-main/legged_complex_env/env/legged_visual_input.py", line 208, in reset
    obs, _, _, _ = self.step(torch.zeros(self.num_envs, self.num_actions, device=self.device, requires_grad=False))
  File "/home/chenda/open_robot-main/open_robot-main/legged_complex_env/env/legged_visual_input.py", line 166, in step
    self.post_process_camera_tensor()
  File "/home/chenda/open_robot-main/open_robot-main/legged_complex_env/env/legged_visual_input.py", line 258, in post_process_camera_tensor
    new_images = torch.stack(self.cam_tensors)
TypeError: expected Tensor as element 0 in argument 0, but got NoneType
I tried the suggestions from this forum (setting CUDA_VISIBLE_DEVICES, and also setting the devices explicitly, e.g. "--sim_device=cuda:0 --rl_device=cuda:0 --graphics_device=0"), but it either gives the same error or crashes with a segmentation fault that I cannot trace.
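For reference, here is a minimal sketch of the CUDA_VISIBLE_DEVICES restriction mentioned above, applied from inside Python. The helper name restrict_visible_gpus and the GPU index 0 are illustrative assumptions, not taken from my training script. The sketch assumes the variable is set before isaacgym or torch initializes CUDA, since changing it afterwards has no effect on the running process.

```python
import os


def restrict_visible_gpus(gpu_ids):
    """Hypothetical helper: expose only the listed physical GPUs to CUDA.

    Returns the string written to CUDA_VISIBLE_DEVICES.
    """
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Call this before importing isaacgym or torch (assumption: nothing has
# initialized CUDA yet in this process).
restrict_visible_gpus([0])

# Visible GPUs are renumbered from 0 inside the process, so the single
# exposed GPU is addressed as cuda:0 regardless of its physical index.
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

The same effect can be had by exporting the variable in the shell before launching the script.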
Any help will be really appreciated!