During repeated tests, the first, second, and third "reward" values are different from the others

Hi

The first through third "reward" values are different from the others.
Is this a bug?

RunningMeanStd: (18,)
=> loading checkpoint ‘/home/sat001/wsp/isaac_3/IsaacGymEnvs-main/isaacgymenvs/runs/hor_1024_bs_65536_gpu_0/nn/hor_1024_bs_65536_gpu_0.pth’
reward: 252166624.0 steps: 510.0 ←➀
reward: 313811808.0 steps: 511.0 ←➁
reward: 313775776.0 steps: 511.0 ←➂
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
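
One thing worth noting about these numbers (an observation, not necessarily the cause): the rewards are large enough that float32 precision limits become visible. Around 3.1e8, adjacent representable float32 values are 32 apart, so genuinely different episode returns can print as identical totals. A quick NumPy check, independent of Isaac Gym:

```python
import numpy as np

r = np.float32(313775456.0)                  # the repeated reward value from the log
print(np.spacing(r))                         # 32.0: gap between adjacent float32 values here
print(np.float32(313775456.0 + 10.0) == r)   # True: differences under ~16 are rounded away
```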

Hi, is this also related to your other post regarding reward exceeding the max value?

Yes, that is exactly what I am asking about.

The length of the first episode is also different from the others.
I thought each test run was completely independent, but this behavior suggests otherwise.

“FrankaCabinet” shows the same behavior.

The seed is fixed to 42.
torch_deterministic: True
Command:
python train.py task=FrankaCabinet num_envs=1 test=True checkpoint=runs/FrankaCabinet/nn/FrankaCabinet.pth seed=42

PyTorch: 1.8.2
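
For reference, here is roughly what these two settings control in IsaacGymEnvs. This is a sketch of the usual set_seed helper, not the verbatim source, and details may differ between versions:

```python
import os
import random

import numpy as np
import torch

def set_seed(seed, torch_deterministic=False):
    # Seed every RNG the training stack touches (sketch, not verbatim source).
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    if torch_deterministic:
        # Restrict PyTorch to deterministic kernels where available.
        os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True
        torch.use_deterministic_algorithms(True)
    return seed
```

Note that this pins the Python/NumPy/PyTorch RNGs and kernel selection; whether the GPU physics simulation itself replays bit-identically is a separate question.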

Please check.

RunningMeanStd: (23,)
=> loading checkpoint ‘/home/sa/wsp/isaac_3/IsaacGymEnvs-main/isaacgymenvs/runs/FrankaCabinet/nn/FrankaCabinet.pth’
reward: 1389.047607421875 steps: 498.0 ←499.0?
reward: 1384.9691162109375 steps: 499.0
reward: 1390.197998046875 steps: 499.0
reward: 1388.5855712890625 steps: 499.0
reward: 1384.7918701171875 steps: 499.0
reward: 1384.7052001953125 steps: 499.0
reward: 1380.9183349609375 steps: 499.0
reward: 1385.69287109375 steps: 499.0
reward: 1396.2381591796875 steps: 499.0
reward: 1382.2991943359375 steps: 499.0
reward: 1383.98681640625 steps: 499.0
reward: 1393.5643310546875 steps: 499.0
reward: 1387.8167724609375 steps: 499.0
reward: 1390.0850830078125 steps: 499.0
reward: 1395.5955810546875 steps: 499.0
reward: 1379.6478271484375 steps: 499.0
reward: 1393.7684326171875 steps: 499.0
reward: 1393.6485595703125 steps: 499.0
reward: 1390.3907470703125 steps: 499.0
reward: 1391.0640869140625 steps: 499.0
reward: 1381.163818359375 steps: 499.0
reward: 1384.6175537109375 steps: 499.0
reward: 1389.2071533203125 steps: 499.0
reward: 1383.3948974609375 steps: 499.0
reward: 1395.4481201171875 steps: 499.0
reward: 1381.5181884765625 steps: 499.0
reward: 1392.1209716796875 steps: 499.0
reward: 1376.385009765625 steps: 499.0
reward: 1376.314453125 steps: 499.0
reward: 1377.595703125 steps: 499.0
reward: 1379.409912109375 steps: 499.0
reward: 1390.7479248046875 steps: 499.0
reward: 1379.56982421875 steps: 499.0
reward: 1381.3255615234375 steps: 499.0
reward: 1381.1376953125 steps: 499.0
reward: 1376.1226806640625 steps: 499.0

Where are you tracking the steps count?

By “steps count”, do you mean the “steps” value printed above?
Is that wrong?
I don’t quite understand the intent of your question, so I’m not sure whether this “steps count” is what you are referring to.

Yes, sorry. I didn’t see the steps being printed when I ran training, but I realized they appear during inference. I believe this is coming from the rl_games side, though. We don’t specifically control the number of steps per iteration other than defining it in the configs.
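
For context, the loop printing those lines looks roughly like this. This is a simplified sketch of an rl_games-style evaluation loop; env, model, and get_action are illustrative stand-ins, not the exact rl_games API:

```python
import torch

def play_episodes(env, model, n_games):
    # Simplified sketch of an rl_games-style inference loop (names illustrative).
    for _ in range(n_games):
        obs = env.reset()
        total_reward, steps, done = 0.0, 0, False
        while not done:
            with torch.no_grad():
                action = model.get_action(obs)  # policy action from the loaded checkpoint
            obs, reward, done, info = env.step(action)
            total_reward += float(reward)
            steps += 1
        # One line per finished episode, matching the logs above.
        print(f'reward: {total_reward} steps: {steps}')
```

In other words, “steps” is just the number of env.step() calls counted per episode at inference time; the constant 511/499 values are consistent with each task resetting after its configured maximum episode length.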

I see.