Historically, NVIDIA has been secretive with regard to details of their GPUs’ microarchitecture. I see nothing that would incentivize them to be more transparent at this time.
In practical terms, it would be best to file a performance bug, as it is possible that the software simply has not yet been sufficiently optimized for the new architecture. Experience indicates that NVIDIA runs its compute business driven by customer demand, so the more bug reports are filed for a particular performance issue, the more likely it is that a fix will materialize.
NVIDIA’s business is selling hardware; providing lots of high-performance software is just a means to that end. If expensive new parts deliver poor application-level performance, it is in NVIDIA’s best interest to address the underlying issues so that hardware sales remain brisk.
For what it’s worth, at least one review has made similar observations:
At reference specifications, peak theoretical tensor throughput is around 107.6 TFLOPS for the RTX 2080 Ti, 80.5 TFLOPS for the RTX 2080, and 59.7 TFLOPS for the RTX 2070. Unlike the 89% efficiency with the Titan V’s 97.5 TFLOPS, the RTX cards are essentially at half that level, with around 47%, 48%, and 45% efficiency for the RTX 2080 Ti, 2080, and 2070 respectively. A Turing-optimized binary should bring that up, though it is possible that the GeForce RTX cards may not be designed for efficient tensor FP16 operations as opposed to the INT dot-product acceleration. After all, the GeForce RTX cards are for consumers and ostensibly intended for inferencing rather than training, which is the reasoning for the new INT support in Turing tensor cores.
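To make the quoted percentages concrete: "efficiency" here is simply measured throughput divided by peak theoretical throughput. The sketch below uses the peak figures from the review; the measured value in the example is back-derived from the quoted ~47% figure and is purely illustrative, not a benchmark result.

```python
# Peak theoretical FP16 tensor throughput in TFLOPS, as quoted in the review.
PEAK_TFLOPS = {
    "Titan V":     97.5,
    "RTX 2080 Ti": 107.6,
    "RTX 2080":    80.5,
    "RTX 2070":    59.7,
}

def efficiency(measured_tflops: float, peak_tflops: float) -> float:
    """Fraction of peak theoretical throughput actually achieved."""
    return measured_tflops / peak_tflops

# A hypothetical benchmark measuring ~50.6 TFLOPS on the RTX 2080 Ti
# corresponds to the ~47% efficiency cited above.
print(f"{efficiency(50.6, PEAK_TFLOPS['RTX 2080 Ti']):.0%}")  # → 47%
```

The same calculation on the Titan V (86.8 TFLOPS measured against 97.5 peak) yields the ~89% the review cites, which is what makes the gap on the RTX cards stand out.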