Dear Community,
this is Max from Italy.
As part of my university research activities, I am working on software for RF propagation simulation. The code, developed by another research group at my university, makes extensive use of GPU computing (the cards run in TCC driver mode) and CUDA for parallelization.
For this work I have two workstations with nearly identical specs in terms of RAM and Xeon CPU. The only difference is that one has a Tesla P100 installed and the other a Tesla K40. Generally speaking, I would expect the P100 to outperform the K40 in computational speed. In my experiments this holds as long as the simulation environment is demanding, e.g. a challenging urban scenario with many propagation rays to be calculated and managed. On the contrary, if the scenario is simple, e.g. few buildings and relatively few rays, the P100 is not as efficient as the K40, which runs the simulation faster.

I am not an expert in CUDA programming or in the hardware specs of these two Tesla cards, but can any of you provide a reasonable explanation of this behaviour? How can a P100 be slower than a K40 for small, simple scenarios? Is there some kind of overhead to be aware of?
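In case it helps the discussion: to check whether a fixed per-launch cost (rather than raw compute throughput) dominates in the small scenarios, I imagine one could time many launches of an empty kernel with CUDA events on both cards. This is only a generic sketch I put together, not part of the actual simulation code; the kernel name and launch configuration are placeholders.

```cuda
// Sketch: measure average per-launch overhead with an empty kernel.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummyKernel() {}  // does no work: timing isolates launch cost

int main() {
    const int N = 1000;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    dummyKernel<<<1, 1>>>();      // warm-up launch (context init, module load)
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    for (int i = 0; i < N; ++i)
        dummyKernel<<<1, 1>>>();  // repeated launches of the empty kernel
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("average launch time: %f us\n", 1000.0f * ms / N);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}
```

If the per-launch times differ noticeably between the two machines, that would suggest the small scenarios are launch-overhead-bound rather than compute-bound.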