The GAN loss function used in your AMP algorithm example is not the same as the one the AMP paper uses. Your code uses something like “-E[log(D(s,s’))] - E[log(1 - D(s,s’))]”, which is formula (5) in the AMP paper,
but the paper itself uses formula (6), the least-squares loss from LSGAN.
The reward function and gradient penalty function are also different from those mentioned in the AMP paper.
Why does your implementation take a different approach from the AMP paper? Or am I misreading something?
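
To make the difference concrete, here is a minimal NumPy sketch of the two discriminator objectives as I understand them. The function names and the `eps` term are my own, and the +1/-1 targets in the least-squares version are my reading of formula (6), so this is not code taken from either implementation.

```python
import numpy as np


def log_gan_loss(d_real, d_fake, eps=1e-8):
    """Formula (5) style loss: -E[log D(real)] - E[log(1 - D(fake))].

    Assumes the discriminator outputs probabilities in (0, 1).
    `eps` is a hypothetical guard against log(0).
    """
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))


def least_squares_gan_loss(d_real, d_fake):
    """Formula (6) style (LSGAN) loss: E[(D(real) - 1)^2] + E[(D(fake) + 1)^2].

    Assumes the discriminator outputs unbounded scores, with target +1 for
    reference-motion transitions and -1 for policy transitions.
    """
    return np.mean((d_real - 1.0) ** 2) + np.mean((d_fake + 1.0) ** 2)


# Toy discriminator outputs on (s, s') pairs, made up for illustration only.
probs_real = np.array([0.9, 0.8])
probs_fake = np.array([0.2, 0.1])
scores_real = np.array([0.9, 1.1])
scores_fake = np.array([-0.8, -1.2])

print("log loss:", log_gan_loss(probs_real, probs_fake))
print("least-squares loss:", least_squares_gan_loss(scores_real, scores_fake))
```

The two losses expect different discriminator outputs (probabilities versus raw scores), which is presumably why the reward function has to change along with the loss.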