Impact of cudaMalloc() on CPU LLC

  • You seem to have posted the same code snippet twice.
  • I’m not intimately familiar with MSR monitoring of caches, but if the LLC misses fall to zero, does that mean the accesses are hitting in the LLC?
  • cudaMalloc is an opaque library call whose detailed behavior is not documented anywhere that I know of. I would assume that any library call, in any library, could cause the LLC contents to be modified, depending on what it is doing. If for some reason cudaMalloc needs to load a lot of data, it might happen. I imagine there could be other reasons.
  • I’m not suggesting that I know that cudaMalloc does this, just that it seems “theoretically possible”.
  • General advice for use of cudaMalloc is to get it out of performance-sensitive areas of code. For example, if you have a work-processing loop in your code, it’s not advisable to perform a cudaMalloc at each iteration. Instead, seek to do your cudaMalloc operations prior to entering the loop, perhaps by allocating everything that is needed up front, and/or reusing allocations.
  • If cudaMalloc is doing this, and you don’t like that behavior, you’re welcome to file a bug requesting a change in behavior. Be advised that you’re likely to be asked for a complete code sample that demonstrates the issue, along with your measurements, etc.
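
As a rough sketch of the "allocate up front, reuse in the loop" advice above: the kernel `scale`, the helper `run_pipeline`, and the sizes used here are hypothetical placeholders, not anything from your code. The only point is that cudaMalloc and cudaFree sit outside the loop, so whatever cudaMalloc does internally happens once rather than on every iteration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the real per-iteration work.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: returns 0 on success, 1 on any CUDA error.
int run_pipeline(int iterations, int n) {
    float *d_buf = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Allocate once, before entering the performance-sensitive loop.
    if (cudaMalloc(&d_buf, bytes) != cudaSuccess) return 1;
    if (cudaMemset(d_buf, 0, bytes) != cudaSuccess) {
        cudaFree(d_buf);
        return 1;
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    int status = 0;

    for (int it = 0; it < iterations; ++it) {
        // Reuse the same allocation on every iteration; no cudaMalloc here.
        scale<<<blocks, threads>>>(d_buf, n, 2.0f);
        if (cudaGetLastError() != cudaSuccess) { status = 1; break; }
    }

    // Wait for all queued kernels, then release the buffer once.
    if (cudaDeviceSynchronize() != cudaSuccess) status = 1;
    cudaFree(d_buf);
    return status;
}

int main() {
    int status = run_pipeline(100, 1 << 20);
    printf("%s\n", status == 0 ? "ok" : "CUDA error");
    return status;
}
```

If you take your LLC measurements only around the loop body in a structure like this, any cache disturbance from cudaMalloc itself should fall outside the measured region. That is an assumption you would want to confirm with your own counters.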