Training a model with a CNN block

I’m working on a project to train an agent that takes a 4-channel image tensor (RGB + depth sensor) as its observation, and I want to add the rl_games CNN block in front of the MLP layers. But when I add the CNN configuration, I get exceptions. I see in the vec_task file that obs_buf is a flat tensor of N elements, so I tried changing the obs buffer’s shape to (width, height, channels, n_envs), but nothing works. Any ideas?

Here is the network block from my TaskPPO.yaml file:

  network:
    name: actor_critic
    separate: False
    space:
      continuous:
        mu_activation: None
        sigma_activation: None
        mu_init:
          name: default
        sigma_init:
          name: const_initializer
          val: 0
        fixed_sigma: True
    cnn:
      type: conv2d
      activation: relu
      initializer:
        name: default
      regularizer:
        name: None
      convs:
        - filters: 32
          kernel_size: 8
          strides: 4
          padding: 0
        - filters: 64
          kernel_size: 4
          strides: 2
          padding: 0
        - filters: 128
          kernel_size: 3
          strides: 1
          padding: 0
    mlp:
      units: [256, 128, 64]
      activation: relu
      d2rl: False
      initializer:
        name: default
      regularizer:
        name: None
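
For reference, this is roughly the conv stack that config describes, written out in plain PyTorch (my own sketch, not rl_games’ actual builder; the 4 input channels and the 64×64 frame size are assumptions based on my RGBD setup). The point is that every Conv2d expects a 4D NCHW batch:

import torch
import torch.nn as nn

# Sketch of the `convs` list above; not rl_games' builder code.
cnn = nn.Sequential(
    nn.Conv2d(4, 32, kernel_size=8, stride=4),   # in_channels=4 assumed (RGBD)
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2),
    nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3, stride=1),
    nn.ReLU(),
)

x = torch.zeros(64, 4, 64, 64)  # (n_envs, channels, height, width); 64x64 assumed
print(cnn(x).shape)             # torch.Size([64, 128, 4, 4])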

This is the change I tried in vec_task.py:

        # In VecTask.__init__ (vec_task.py already imports `from gym import spaces`
        # and `import numpy as np` at module level)
        if enable_camera_sensors:
            img_width = config["env"]["imgWidth"]
            img_height = config["env"]["imgHeight"]
            img_grayscale = config["env"]["grayScale"]
            img_chs = 2 if img_grayscale else 4  # grayscale+depth or RGB+depth
            self.obs_space = spaces.Box(low=0.0, high=1.0, shape=(img_width, img_height, img_chs), dtype=np.float32)
        else:
            self.obs_space = spaces.Box(np.ones(self.num_obs) * -np.Inf, np.ones(self.num_obs) * np.Inf)

And this is the error I’m getting when I try to run the training (for n_envs=64):

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 4, 8, 8], but got 2-dimensional input of size [64, 16384] instead
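
If I read that right, 16384 is exactly my flattened image: 64 × 64 × 4 = 16384 values per env, and the weight [32, 4, 8, 8] is the first conv layer (32 filters, 4 input channels, 8×8 kernel). So the network is still being handed a flat batch, because the stock allocate_buffers creates obs_buf flat no matter what obs_space says; roughly (paraphrased from vec_task.py):

# Stock VecTask.allocate_buffers (paraphrased): obs_buf is always flat,
# so the CNN sees [num_envs, 16384] instead of a 4D image batch
self.obs_buf = torch.zeros((self.num_envs, self.num_obs), device=self.device, dtype=torch.float)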

I finally got my training to work! I had forgotten to change the size of obs_buf:

# Env.__init__
self.obs_img = False
if enable_camera_sensors:
    self.obs_img = True
    self.img_width = config["env"]["imgWidth"]
    self.img_height = config["env"]["imgHeight"]
    self.img_chs = config["env"]["imgChannels"]
    self.obs_space = spaces.Box(low=0.0, high=1.0, shape=(self.img_width, self.img_height, self.img_chs), dtype=np.float32)
else:
    self.obs_space = spaces.Box(np.ones(self.num_obs) * -np.Inf, np.ones(self.num_obs) * np.Inf)

# VecTask.allocate_buffers
obs_size = (self.num_envs, self.img_width, self.img_height, self.img_chs) \
    if self.obs_img else (self.num_envs, self.num_obs)
self.obs_buf = torch.zeros(obs_size, device=self.device, dtype=torch.float)
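
One last detail: keep obs_buf channels-last, i.e. (num_envs, width, height, channels) as allocated above. As far as I can tell, rl_games’ conv2d builder permutes the observation to NCHW itself before the first conv layer, which is what finally matches the [32, 4, 8, 8] weight. A quick shape sanity check (my own sketch, 64×64 frames assumed):

import torch

obs = torch.zeros(64, 64, 64, 4)          # one obs_buf batch: (num_envs, W, H, C)
obs_nchw = obs.permute(0, 3, 1, 2)        # what rl_games hands to the CNN
assert obs_nchw.shape == (64, 4, 64, 64)  # lines up with weight [32, 4, 8, 8]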