RL training doesn't work when the observation buffer is created on the GPU

I’m using SKRL to train an RL agent in Isaac Sim.

My task inherits from `RLTaskInterface` (`from omni.isaac.gym.tasks.rl_task import RLTaskInterface`), and this is my cleanup method, called in `__init__`:

    def cleanup(self) -> None:
        # Note: obs_buf and states_buf are created WITHOUT device=self._device,
        # so they default to the CPU; the remaining buffers go to self._device.
        self.obs_buf = torch.zeros((self._num_envs, self._num_observations), dtype=torch.float)
        self.states_buf = torch.zeros((self._num_envs, self._num_states), dtype=torch.float)
        self.rew_buf = torch.zeros(self._num_envs, device=self._device, dtype=torch.float)
        self.reset_buf = torch.ones(self._num_envs, device=self._device, dtype=torch.long)
        self.progress_buf = torch.zeros(self._num_envs, device=self._device, dtype=torch.long)
        self.extras = {}

I found that when I create `self.obs_buf` on the CPU (without `device=self._device`), training works just fine, but when I create it on the GPU, training runs and yet this is the result:
(screenshot of the failed training result, 2023-12-26 21-25-49)

Here is my code, including the task script, the SKRL wrapper script, the training script, and so on:
cartpole_skrl.zip (3.4 MB)

So why can't the observation buffer (`obs_buf`) be created on the GPU in this case? I checked the `RLTask` of OmniIsaacGymEnvs, and there seemingly everything is created on the GPU. Does keeping the buffer on the CPU have any negative impact on training performance?
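To illustrate what I mean by keeping everything on one device, here is a minimal sketch (the `make_buffers` helper is just my own illustration, not part of `RLTaskInterface` or SKRL) of allocating all task buffers on the same device, falling back to the CPU when CUDA is unavailable:

```python
import torch


def make_buffers(num_envs: int, num_obs: int, device: str):
    """Illustrative helper: allocate all task buffers on a single device."""
    obs_buf = torch.zeros((num_envs, num_obs), device=device, dtype=torch.float)
    rew_buf = torch.zeros(num_envs, device=device, dtype=torch.float)
    reset_buf = torch.ones(num_envs, device=device, dtype=torch.long)
    return obs_buf, rew_buf, reset_buf


# Pick the GPU if available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
obs_buf, rew_buf, reset_buf = make_buffers(num_envs=4, num_obs=8, device=device)

# All buffers now live on the same device, so no implicit CPU/GPU copies
# happen when they are combined in reward or observation computations.
assert obs_buf.device == rew_buf.device == reset_buf.device
```

My expectation was that this is essentially what OmniIsaacGymEnvs does, with `self._device` set to `"cuda:0"` for the whole pipeline.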

P.S.: I know that a better way to train an agent with multiple envs is to use OmniIsaacGymEnvs or Isaac Orbit to create my environment, but first I would like to build it with Isaac Sim alone.