I’ve read the documentation and the sample code and tried to create my own reinforcement learning task, but I don’t yet understand the principles of the overall framework.
I’ve taken some tensors and used them to build actions and rewards, but so far I haven’t been able to run the task successfully because I keep hitting tensor shape and dimension errors such as the following:
RuntimeError: mat1 dim 1 must match mat2 dim 0
def quat_mul(a, b):
    assert a.shape == b.shape
    ~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    shape = a.shape
    a = a.reshape(-1, 4)
RuntimeError: The size of tensor a (9) must match the size of tensor b (2) at non-singleton dimension 1
RuntimeError: shape mismatch: value tensor of shape  cannot be broadcast to indexing result of shape [9, 3]
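To narrow these down, one option is to check each tensor's shape before it reaches the reward or action code, so the failure names the offending tensor instead of surfacing deep inside a library call. Below is a minimal sketch of that idea, not something taken from the framework: `check_shape`, `num_envs`, and the tensor names are hypothetical, and the shapes are plain tuples so the snippet runs on its own. In a real task I would pass `some_tensor.shape` instead.

```python
def check_shape(name, tensor_shape, expected):
    """Raise a readable error if tensor_shape does not match expected.

    expected is a tuple of ints; use None for a dimension that may be any size.
    """
    actual = tuple(tensor_shape)
    if len(actual) != len(expected) or any(
        e is not None and a != e for a, e in zip(actual, expected)
    ):
        raise ValueError(f"{name}: expected shape {expected}, got {actual}")
    return actual


# Hypothetical sizes for illustration: 9 environments, 3-D positions, quaternions of length 4.
num_envs = 9
check_shape("target_pos", (num_envs, 3), (num_envs, 3))  # matches, returns the shape
check_shape("root_quat", (num_envs, 4), (None, 4))       # None accepts any batch size

try:
    # A (9, 2) tensor where (9, 9) is expected, similar to the 9-vs-2 mismatch above.
    check_shape("actions", (num_envs, 2), (num_envs, 9))
except ValueError as err:
    print(err)
```

Calling this on both inputs of a function like `quat_mul` right before the call should at least show which tensor has the unexpected shape.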
I don’t even know where to start troubleshooting these errors. I’m still using the PPO algorithm from the example, and I haven’t changed the network structure.
Are there any points or tips I should be aware of? Any advice on how to build new reinforcement learning tasks would be appreciated.
Thank you very much.