According to the log, training runs out of GPU memory (OOM) on two 1080 cards.
Suggestions:
- reduce the batch size: try train_batch_size: 1
- or train a smaller network. Set width and height to multiples of 64, for example image_size: "(640, 1024)"
- if possible, try other cards, for example a V100
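If you pick a custom size, a small helper like the one below can snap each dimension to a multiple of 64. This is only a sketch: the function names are hypothetical and not part of the training code, and it does not assume which of the two numbers in image_size is width and which is height. It simply rounds each value down to the nearest multiple of 64, never going below 64.

```python
from typing import Tuple


def round_down_to_multiple(value: int, multiple: int = 64) -> int:
    """Round value down to the nearest multiple, but never below one multiple."""
    if value <= 0 or multiple <= 0:
        raise ValueError("value and multiple must be positive")
    return max(multiple, (value // multiple) * multiple)


def snap_image_size(a: int, b: int, multiple: int = 64) -> Tuple[int, int]:
    """Snap both image dimensions (in whatever order the config uses)."""
    return (round_down_to_multiple(a, multiple), round_down_to_multiple(b, multiple))


if __name__ == "__main__":
    print(snap_image_size(650, 1030))  # (640, 1024)
```

Rounding down rather than up keeps memory use at or below what you asked for, which is the point when fighting OOM.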