Hi all,
We are working on improving deep learning performance on the TX2. We are currently profiling our code with nvvp, and we noticed that the most time-consuming function is fermiPlusSgemmLDS64_batch, whose memory efficiency is low. But what is fermiPlusSgemmLDS64_batch? We didn't write this function.
BR,
Tiandong