Hi, I am running your Transformer model code in TensorFlow OpenSeq2Seq. The way throughput is calculated in https://github.com/NVIDIA/OpenSeq2Seq/blob/master/open_seq2seq/utils/utils.py seems to be misleading. Correct me if I am wrong…
Line 97 says benchmarking starts at the 10th iteration, but when the average objects per second is calculated in lines 226-227, the total object count (iterations 1 to the end) is divided by the time elapsed only after the 10th iteration.
This makes the numerator too large and the denominator too small, and the inflation gets worse at large batch sizes.
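To illustrate the mismatch, here is a minimal sketch of what I would expect a consistent calculation to look like. This is a hypothetical helper, not the actual utils.py code; `bench_start` stands in for the warm-up cutoff at line 97, and I assume per-iteration object counts and times are recorded:

```python
def avg_objects_per_sec(objects_per_iter, time_per_iter, bench_start=10):
    """Average throughput over iterations >= bench_start only.

    Both the object count (numerator) and the elapsed time (denominator)
    must be restricted to the same window; mixing a full-run numerator
    with a post-warm-up denominator inflates the result.
    """
    objs = sum(objects_per_iter[bench_start:])   # objects after warm-up
    secs = sum(time_per_iter[bench_start:])      # time after warm-up
    return objs / secs
```

For example, with 20 iterations of batch 256 taking 1 second each, this returns 256 objects/sec, whereas dividing all 20 x 256 objects by only the last 10 seconds (the pattern I believe lines 226-227 implement) would report 512 objects/sec, double the true throughput.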
Do you agree with this? I am getting ~10000 objects/sec for batch size 256, which seems unrealistic given the issue explained above.