Hi.

I’ve read the documentation and the sample code and tried to create my own reinforcement learning tasks, but I don’t yet understand the principles of the framework as a whole.

I’ve taken some tensors and used them to build actions and rewards, but so far I haven’t been able to run anything successfully because I keep hitting errors about tensor shapes and dimensions, such as the following:

RuntimeError: mat1 dim 1 must match mat2 dim 0

    @torch.jit.script
    def quat_mul(a, b):
        assert a.shape == b.shape
        ~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        shape = a.shape
        a = a.reshape(-1, 4)
    RuntimeError: AssertionError:

RuntimeError: The size of tensor a (9) must match the size of tensor b (2) at non-singleton dimension 1

RuntimeError: shape mismatch: value tensor of shape [9] cannot be broadcast to indexing result of shape [9, 3]
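
As far as I can tell, the last two messages are ordinary PyTorch broadcasting failures rather than anything specific to the framework. Here is a minimal sketch that reproduces them outside my task. The shapes and names (`try_op`, `a`, `b`, `buf`, `ids`, `vals`) are made up to match the numbers in the messages and are not taken from my actual code:

```python
import torch


def try_op(fn):
    """Run fn and return the RuntimeError message, or None if it succeeds."""
    try:
        fn()
        return None
    except RuntimeError as err:
        return str(err)


# Hypothetical shapes, chosen only to match the numbers in the messages above.
a = torch.zeros(4, 9)
b = torch.zeros(4, 2)
# Elementwise add: dim 1 is 9 vs 2, and neither is 1, so broadcasting fails.
add_error = try_op(lambda: a + b)

buf = torch.zeros(16, 3)   # e.g. one 3-vector per environment (assumed)
ids = torch.arange(9)      # indices of the rows to overwrite
vals = torch.ones(9)       # one scalar per selected row


def bad_assign():
    # buf[ids] has shape [9, 3]; a value of shape [9] cannot broadcast to it.
    buf[ids] = vals


assign_error = try_op(bad_assign)

# Adding a trailing dimension gives [9, 1], which does broadcast to [9, 3].
buf[ids] = vals.unsqueeze(-1)

print(add_error)
print(assign_error)
print(a.shape, b.shape, buf[ids].shape)
```

What I can’t work out is which tensors in my own task end up with these mismatched shapes.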

I don’t even know where to start troubleshooting these errors. I’m still using the PPO algorithm from the example, and I haven’t changed the network structure.

Are there any points or tips I should be aware of? Can I get some advice on how to build new reinforcement learning tasks?

Thank you very much.