Here is the reply. Please note that I will likely not be able to respond to follow-up questions about this, so if you ask a question and I do not respond, that is the reason. I’m not permitted to share any more details than what is presented here:
We’ve confirmed that the performance drop is due to the size of the DAG exceeding the total on-chip TLB capacity on the Pascal GPU. As a result, the number of TLB misses increases, which reduces performance. Because the TLB is a fixed-capacity hardware resource, and the ETH algorithm accesses the DAG randomly, we don’t believe there are any software optimizations that could reduce the TLB miss rate.
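To illustrate why random access over a working set larger than the TLB hurts, here is a minimal toy simulation in Python. It is only a sketch: the LRU replacement policy, the entry counts, and the function name `simulate_tlb_miss_rate` are assumptions made for illustration and do not describe the actual Pascal TLB design, which is not given in the reply.

```python
import random
from collections import OrderedDict


def simulate_tlb_miss_rate(num_pages, tlb_entries, accesses=100_000, seed=0):
    """Return the miss rate of a toy LRU TLB under uniformly random page accesses.

    num_pages   -- pages in the working set (stands in for the DAG); hypothetical
    tlb_entries -- translations the toy TLB can hold; hypothetical
    """
    rng = random.Random(seed)
    tlb = OrderedDict()  # page -> True, ordered from least to most recently used
    misses = 0
    for _ in range(accesses):
        page = rng.randrange(num_pages)  # random access, like DAG lookups
        if page in tlb:
            tlb.move_to_end(page)        # hit: mark as most recently used
        else:
            misses += 1                  # miss: translation must be fetched
            tlb[page] = True
            if len(tlb) > tlb_entries:
                tlb.popitem(last=False)  # evict the least recently used entry
    return misses / accesses


if __name__ == "__main__":
    for pages in (512, 1024, 2048, 4096):
        rate = simulate_tlb_miss_rate(pages, tlb_entries=1024)
        print(f"{pages:5d} pages, 1024 entries: miss rate {rate:.3f}")
```

While the working set fits in the toy TLB, only the initial cold misses occur. Once it exceeds the capacity, the steady-state miss rate under uniform random access approaches 1 - entries/pages, and no reordering of accesses helps because every page is equally likely to be touched next.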
In the Volta generation, TLB coverage was increased by 4x, and large DAG sizes (up to ~8GB, which won’t be reached for many years) will still fit in the on-chip TLB. So these newer GPUs (Volta and beyond) will show much less performance sensitivity to DAG size.
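As a rough back-of-the-envelope check, the coverage figures can be plugged into a simple model. The ~8GB Volta figure comes from the reply; the ~2GB Pascal figure is only my inference from dividing that by the stated 4x, and the DAG sizes below are hypothetical. The model also assumes uniformly random accesses and ignores everything else that affects hashrate, so treat the numbers as illustrative only.

```python
def estimated_miss_fraction(dag_gb, coverage_gb):
    """Fraction of uniformly random accesses that fall outside TLB coverage.

    Simplified model: if the DAG fits in the covered range there are no
    steady-state misses; otherwise the miss fraction is 1 - coverage/dag.
    """
    if dag_gb <= coverage_gb:
        return 0.0
    return 1.0 - coverage_gb / dag_gb


VOLTA_COVERAGE_GB = 8.0                       # "~8GB" from the reply
PASCAL_COVERAGE_GB = VOLTA_COVERAGE_GB / 4.0  # ASSUMPTION: inferred from "4x"

if __name__ == "__main__":
    for dag_gb in (1.5, 2.0, 2.5, 3.0, 4.0):  # hypothetical DAG sizes
        pascal = estimated_miss_fraction(dag_gb, PASCAL_COVERAGE_GB)
        volta = estimated_miss_fraction(dag_gb, VOLTA_COVERAGE_GB)
        print(f"DAG {dag_gb:.1f} GB: Pascal ~{pascal:.0%} misses, Volta ~{volta:.0%}")
```

Under these assumptions the miss fraction on Pascal starts climbing as soon as the DAG passes the covered range, while on Volta it stays at zero for any DAG up to the ~8GB mentioned in the reply.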