Pose estimation: autoencoder vs pose_cnn_decoder

When using pose_cnn_decoder, my decoder_output is always all white, even after a few thousand steps. Adding noise to the image via training_config does not change this.

Using autoencoder, my decoder_output is fine after ten steps (without needing to add noise).

Any ideas why this might be happening?

Can you confirm that the ground truth color input into the encoder and the ground truth decoder output both look correct?
Also, are you using a custom object with the pose_cnn_encoder training scene, or the default one?
A few things you can try are reducing the learning rate and changing the bootstrap ratio in training_config.
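
If it helps with the first check, here is a minimal sketch for inspecting the images numerically instead of by eye. It assumes you have already dumped the encoder input, the ground truth decoder target, and the decoder_output to NumPy arrays. The helper names `white_fraction` and `summarize` are made up for this example and are not part of any SDK.

```python
import numpy as np


def white_fraction(image, threshold=0.98):
    """Return the fraction of pixels whose channels are all >= threshold.

    Accepts a float image in [0, 1] or a uint8 image in [0, 255],
    shaped (H, W) or (H, W, C). Hypothetical helper for illustration.
    """
    img = np.asarray(image)
    if img.dtype == np.uint8:
        # Bring 8-bit images onto the same [0, 1] scale as float images.
        img = img.astype(np.float32) / 255.0
    if img.ndim == 2:
        # Treat a grayscale image as having a single channel.
        img = img[..., None]
    # A pixel counts as white only if every channel is near the maximum.
    white = np.all(img >= threshold, axis=-1)
    return float(white.mean())


def summarize(name, image):
    """Print shape, dtype, value range, and white fraction for one image."""
    img = np.asarray(image)
    print(
        f"{name}: shape={img.shape} dtype={img.dtype} "
        f"min={img.min()} max={img.max()} "
        f"white={white_fraction(img):.3f}"
    )
```

Calling `summarize` on each of the three images shows where the problem starts. If the ground truth decoder target is itself close to 100% white, the issue is in the training data or scene setup and not in the network. If only decoder_output is white, the learning rate and bootstrap ratio are the more likely places to look.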