"cudaExternamMemoryGetMappedBuffer" error when running with multiple GPUs

Hi,
I tried to run my code on a server with 8 NVIDIA A5000 GPUs. (The code works fine on my local computer, which has only 1 GPU.)

But it gives the following error:

Setting seed: 1
Not connected to PVD
+++ Using GPU PhysX
Physics Engine: PhysX
Physics Device: cuda:0
GPU Pipeline: enabled
/home/chenda/anaconda3/envs/openrobot/lib/python3.7/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on depthImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on segmentationImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on optical flow buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on depthImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on segmentationImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on optical flow buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on depthImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on segmentationImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on optical flow buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on depthImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on segmentationImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on optical flow buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on depthImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on segmentationImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on optical flow buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on depthImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on segmentationImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on optical flow buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on depthImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on segmentationImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on optical flow buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on depthImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on segmentationImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on optical flow buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on depthImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on segmentationImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on optical flow buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on depthImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on segmentationImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on optical flow buffer with error 101
[Error] [carb.gym.plugin] Gym cuda error: invalid device ordinal: ../../../source/plugins/carb/gym/impl/Gym/GymPhysXCuda.cu: 991
*** Can't create empty tensor
*** Can't create empty tensor
*** Can't create empty tensor
*** Can't create empty tensor
*** Can't create empty tensor
*** Can't create empty tensor
*** Can't create empty tensor
*** Can't create empty tensor
*** Can't create empty tensor
*** Can't create empty tensor
/home/chenda/anaconda3/envs/openrobot/lib/python3.7/site-packages/gym/spaces/box.py:74: UserWarning: WARN: Box bound precision lowered by casting to float32
  "Box bound precision lowered by casting to {}".format(self.dtype)
{'policy': <class 'stable_baselines3.common.policies.ActorCriticPolicy'>, 'policy_kwargs': {'net_arch': [], 'features_extractor_class': <class 'locotransformer.policy.locotransformer_extractor.LocoTransformerExtractor'>, 'features_extractor_kwargs': {'encoder_param': {'hidden_shapes': [256, 256], 'visual_dim': 256}, 'net_param': {'transformer_params': [[1, 256], [1, 256]], 'append_hidden_shapes': [256, 256]}, 'state_input_shape': (48,), 'visual_input_shape': (4, 64, 64)}}, 'env': <NormObsWithImg<ObservationDictionaryToArrayWrapper instance>>, 'learning_rate': 5e-05, 'gamma': 0.99, 'gae_lambda': 0.95, 'target_kl': 0.05, 'max_grad_norm': 1, 'n_steps': 64, 'n_epochs': 5, 'batch_size': 8192, 'clip_range': 0.2, 'vf_coef': 0.8, 'clip_range_vf': 0.2, 'ent_coef': 0.01, 'tensorboard_log': '/home/chenda/open_robot-main/open_robot-main/runs/legged_static_seed_1_2022-12-26_0845', 'create_eval_env': False, 'verbose': 2, 'seed': 1, 'device': 'cuda:0'}
Using cuda:0 device
/home/chenda/open_robot-main/open_robot-main/stable_baselines3/ppo/ppo.py:148: UserWarning: You have specified a mini-batch size of 8192, but because the `RolloutBuffer` is of size `n_steps * n_envs = 640`, after every 0 untruncated mini-batches, there will be a truncated mini-batch of size 640
We recommend using a `batch_size` that is a factor of `n_steps * n_envs`.
Info: (n_steps=64 and n_envs=10)
  f"You have specified a mini-batch size of {batch_size},"
Traceback (most recent call last):
  File "train_in_static_env_sb3.py", line 352, in <module>
    log_interval=1,
  File "/home/chenda/open_robot-main/open_robot-main/stable_baselines3/ppo/ppo.py", line 321, in learn
    reset_num_timesteps=reset_num_timesteps,
  File "/home/chenda/open_robot-main/open_robot-main/stable_baselines3/common/on_policy_algorithm.py", line 247, in learn
    tb_log_name
  File "/home/chenda/open_robot-main/open_robot-main/stable_baselines3/common/base_class.py", line 453, in _setup_learn
    self._last_obs = self.env.reset()  # pytype: disable=annotation-type-mismatch
  File "/home/chenda/anaconda3/envs/openrobot/lib/python3.7/site-packages/gym/core.py", line 319, in reset
    observation = self.env.reset(**kwargs)
  File "/home/chenda/open_robot-main/open_robot-main/locotransformer/env_wrapper/dict_to_array_wrapper.py", line 109, in reset
    observation = self._gym_env.reset()
  File "/home/chenda/open_robot-main/open_robot-main/legged_complex_env/env/legged_visual_input.py", line 208, in reset
    obs, _, _, _ = self.step(torch.zeros(self.num_envs, self.num_actions, device=self.device, requires_grad=False))
  File "/home/chenda/open_robot-main/open_robot-main/legged_complex_env/env/legged_visual_input.py", line 166, in step
    self.post_process_camera_tensor()
  File "/home/chenda/open_robot-main/open_robot-main/legged_complex_env/env/legged_visual_input.py", line 258, in post_process_camera_tensor
    new_images = torch.stack(self.cam_tensors)
TypeError: expected Tensor as element 0 in argument 0, but got NoneType
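The final TypeError looks like a downstream symptom rather than the root cause: the camera buffers failed to map earlier in the log, so at least the first entry of self.cam_tensors is None by the time torch.stack is called. Below is a minimal sketch of a guard that reports this earlier and more clearly. It assumes cam_tensors is a plain Python list, as the traceback suggests; the helper names are hypothetical and are not part of Isaac Gym or of the code in this post.

```python
def find_missing_camera_tensors(cam_tensors):
    """Return the indices of camera tensors that were never created (None)."""
    return [i for i, tensor in enumerate(cam_tensors) if tensor is None]


def check_camera_tensors(cam_tensors):
    """Raise a descriptive error before the list is passed to torch.stack."""
    missing = find_missing_camera_tensors(cam_tensors)
    if missing:
        raise RuntimeError(
            "Camera tensors missing for indices {}: the GPU image buffers "
            "were probably not mapped (see the earlier carb.gym.plugin "
            "errors).".format(missing)
        )
```

Calling check_camera_tensors(self.cam_tensors) just before the torch.stack line would not fix the problem, but it would point at the buffer mapping failure instead of a NoneType error.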

I tried the suggestions in this forum (setting CUDA_VISIBLE_DEVICES, and also setting the devices explicitly, such as "--sim_device=cuda:0 --rl_device=cuda:0 --graphics_device=0"), but it either gives the same error or an untraceable segmentation fault.
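For reference, CUDA_VISIBLE_DEVICES only takes effect if it is set before CUDA is initialized in the process, so it has to be set before the first import of any library that touches the GPU. A minimal sketch of that ordering follows; the helper name pin_single_gpu is hypothetical. Note that this variable only affects CUDA device numbering, so the graphics device index may still need to be chosen separately.

```python
import os


def pin_single_gpu(index=0):
    """Expose only one physical GPU to CUDA in this process.

    Must run before importing isaacgym, torch, or anything else that
    initializes CUDA; afterwards the chosen GPU is visible as cuda:0.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


pin_single_gpu(0)

# GPU libraries are imported only after the variable is set, for example:
# import isaacgym
# import torch
```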

Any help will be really appreciated!

I have the same problem. Have you solved it?

Me too. How can we solve it?

Hello, any news on this error?
I followed the thread "Create camera sensor fail on buffer" and added these lines to the beginning of my script:

import os

os.environ['MESA_VK_DEVICE_SELECT'] = '10de:24b0' # When these are set it works but I am getting a segmentation fault
os.environ["CUDA_VISIBLE_DEVICES"] = '0'

The buffer error mentioned in this thread is gone.
However, I now get a segmentation fault instead, and the images that I am receiving are not just black.
I have been stuck on this for a while. Has anyone solved this problem?

Thank you very much!
Best,
Irmak.

Hi Irmak, have you solved this issue? Thanks.

Using CUDA 11.8 can solve it.
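To see which CUDA version the Python environment is using, something like the sketch below may help. It is only an illustration: the function names are hypothetical, and torch.version.cuda reports the CUDA version PyTorch was built against, which may differ from the system toolkit or driver. The reply above does not say which of these has to be 11.8.

```python
def parse_cuda_version(text):
    """Turn a version string such as '11.8' into a tuple like (11, 8)."""
    return tuple(int(part) for part in text.split("."))


def installed_torch_cuda_version():
    """Return the CUDA version PyTorch was built with, or None.

    None is returned when PyTorch is not installed or is a CPU-only build.
    """
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


version = installed_torch_cuda_version()
if version is None:
    print("No CUDA-enabled PyTorch build found")
else:
    print("PyTorch CUDA version:", version)
    print("Is 11.8 or newer:", parse_cuda_version(version) >= (11, 8))
```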