I have a question about the picture below, which shows 3 GPUs computing an answer over time. All of the data resides on GPU 0, is copied to GPUs 1 and 2, computed on, and then transferred back to GPU 0.
–Using Unified Virtual Addressing.
P2P/TCC was not used for the data movement; turning TCC on caused even more timing problems and incorrect results. (I'm saving that to figure out later, unless there is a known bug?)
What would cause the variance that the picture shows such as:
–Memory copies taking longer randomly
–Pauses/gaps between actions
For time reference, the pause in the blue section is ~3 ms, and the purple one is ~3.6 ms.
I’m looking for suggestions on what to possibly look at to understand the behavior.
Or is this normal?
After much more testing it just seems that the runtime is not perfectly consistent, but it still seems strange to have random 3 ms pauses, or to have memory transfers sometimes take 2-5x their normal time (this occurred, though it is not in the image).
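One way to narrow this down is to time the copies on the device with CUDA events, so host-side launch jitter and profiler overhead are excluded from the measurement. Below is a minimal sketch of that idea; the device IDs, buffer size, and iteration count are placeholders, and it assumes the same GPU 0 → GPU 1 copy direction described above (without peer access enabled, `cudaMemcpyPeerAsync` falls back to staging through host memory, which is itself a common source of variance):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Check whether GPU 1 could access GPU 0's memory directly over P2P.
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 1, 0);
    printf("P2P GPU0->GPU1 possible: %s\n", canAccess ? "yes" : "no");

    const size_t bytes = 64 << 20;  // 64 MiB, placeholder size
    float *src = nullptr, *dst = nullptr;
    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    cudaSetDevice(0);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Repeat the copy; a wide spread between iterations would reproduce
    // the "transfers sometimes take 2-5x their normal time" behavior.
    for (int i = 0; i < 10; ++i) {
        cudaEventRecord(start);
        cudaMemcpyPeerAsync(dst, 1, src, 0, bytes);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("copy %d: %.3f ms\n", i, ms);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(src);
    cudaSetDevice(1);
    cudaFree(dst);
    return 0;
}
```

If the event-measured times are stable but the gaps still appear in the profiler timeline, that would point at host-side scheduling (launch overhead, WDDM batching, paging) rather than the transfers themselves.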
Any thoughts? Has anyone else seen this?