I just measured the main memory bandwidth on a TX1 using the STREAM benchmark and got the following results:
Function    MBytes/sec   Min (sec)   Max (sec)   Average (sec)
Copy        18267.198    0.04592     0.04726     0.04643
Mul         18620.018    0.04505     0.04663     0.04563
Add         19140.081    0.06574     0.06829     0.06652
Triad       18935.432    0.06645     0.06797     0.06696
I have two questions:
The PRELIMINARY data sheet lists the following memory configuration: 4ch x 16-bit LPDDR4 @ 1.6 GHz. Is this wrong? Is it 8 channels, or is each channel 32 bits wide? I ask because the stated specs result in a peak bandwidth of 12.8 GB/s, which is below every rate I measured.
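For reference, here is the arithmetic behind the 12.8 GB/s figure as a small Python sketch. The function name and the `transfers_per_clock` parameter are my own, not from the data sheet. The second call is only an assumption: it shows what the number would be if the 1.6 GHz were the I/O clock and data moved on both clock edges.

```python
def peak_bandwidth_gb_s(channels, bits_per_channel, rate_ghz, transfers_per_clock=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    # Total bus width in bytes across all channels.
    bytes_per_transfer = channels * bits_per_channel / 8
    # GHz * bytes per transfer gives GB/s directly.
    return bytes_per_transfer * rate_ghz * transfers_per_clock


# Reading the data sheet literally: one transfer per 1.6 GHz cycle.
single_rate = peak_bandwidth_gb_s(4, 16, 1.6)

# Assumption: 1.6 GHz is the clock and data is transferred on both edges.
double_rate = peak_bandwidth_gb_s(4, 16, 1.6, transfers_per_clock=2)

print(f"one transfer per clock:  {single_rate:.1f} GB/s")
print(f"two transfers per clock: {double_rate:.1f} GB/s")
```

The measured 18.3 to 19.1 GB/s is above the first value and below the second, so the double data rate reading would at least be consistent with my numbers.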
Looking at the PTX, I see “st.global.f64 [%r14], %fd3;” instructions for the stores. Do they incur a write-allocate? The measurements suggest that they don’t. Is the decision whether a write-allocate is needed handled by hardware, i.e. does it detect that a whole cache line is being written and skip the write-allocate in that case?
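To make the write-allocate reasoning concrete, here is a rough Python sketch. It assumes the usual STREAM byte counting: Copy and Mul count 2 arrays (1 load, 1 store), Add and Triad count 3 arrays (2 loads, 1 store). If every store also caused a read of the target cache line, the real memory traffic would be one array larger per kernel. The names `measured` and `implied_traffic_mb_s` are just for illustration.

```python
# kernel: (reported MB/s, arrays counted by STREAM, arrays stored)
measured = {
    "Copy":  (18267.198, 2, 1),
    "Mul":   (18620.018, 2, 1),
    "Add":   (19140.081, 3, 1),
    "Triad": (18935.432, 3, 1),
}


def implied_traffic_mb_s(reported_mb_s, counted_arrays, stored_arrays):
    """Actual memory traffic if each stored array were also read once
    (write-allocate), given the rate STREAM reports."""
    return reported_mb_s * (counted_arrays + stored_arrays) / counted_arrays


implied = {name: implied_traffic_mb_s(*vals) for name, vals in measured.items()}

for name, rate in implied.items():
    print(f"{name:6s} reported {measured[name][0]:10.3f} MB/s, "
          f"with write-allocate {rate:10.3f} MB/s")
```

With write-allocate, Copy and Mul would imply more than 27 GB/s of real traffic, which is more than twice the 12.8 GB/s derived from the data sheet. That is why I suspect the stores do not trigger a write-allocate.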