Single cudaMemcpy across multiple allocations

You can always file a bug if you wish (I would call it a request for enhancement, RFE, because I’m fairly certain the behavior you describe is, for that API call, “working as designed”).

I don’t believe the detailed behavior here is documented, but what I stated is what I have observed. cudaMemcpy certainly validates any pointer you pass that is already registered with the CUDA runtime: (1) is the pointer in the right memory space for the transfer kind, and (2) does the pointer plus the transfer length define a region that lies entirely within a single allocation, i.e. one made by a single call to an allocation API such as cudaHostAlloc or cudaMalloc?
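To make those two checks concrete, here is a minimal host-only sketch of the kind of test I am describing. It is a model of the observed behavior, not the runtime's actual implementation. The `Allocation` struct, the `registry` vector and `fitsSingleAllocation` are hypothetical names I made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one allocation the runtime knows about.
struct Allocation {
    std::uintptr_t base;      // starting address of the allocation
    std::size_t    size;      // length in bytes
    bool           onDevice;  // true: cudaMalloc-style, false: cudaHostAlloc-style
};

// Returns true only if [ptr, ptr + len) lies entirely inside ONE registered
// allocation that is also in the memory space the transfer kind expects.
bool fitsSingleAllocation(const std::vector<Allocation>& registry,
                          std::uintptr_t ptr, std::size_t len,
                          bool expectDevice) {
    for (const Allocation& a : registry) {
        if (a.onDevice != expectDevice) continue;  // check 1: wrong memory space
        if (ptr < a.base) continue;                // starts before this allocation
        const std::size_t offset = ptr - a.base;
        // check 2: the whole range must end inside this same allocation
        if (offset <= a.size && len <= a.size - offset) return true;
    }
    return false;  // no single allocation contains the whole range
}
```

Under this model, a range that starts in one allocation and ends in an adjacent one is rejected even though every byte of it is validly allocated.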

I’m fairly sure that sort of error checking is considered useful in a wide variety of situations and by many end users. I must say that, in my own experience, relying on adjacency between independent allocations is not something I have come across. It strikes me as unusual.

So I doubt the CUDA designers would be ready to simply drop the error checking that I believe is in place and see what happens. However, I’m sure there is a way to inspect all this and follow some decision logic to determine whether every byte of a requested transfer can be found in some allocation somewhere. You can imagine pathological situations, I think.
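As a rough idea of what such decision logic might look like, here is a host-only sketch that walks a list of known allocations and reports whether every byte of a requested range is covered by at least one of them. `Region` and `everyByteCovered` are hypothetical names. The sketch assumes address arithmetic does not overflow, and it is not how the runtime actually behaves.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one known allocation.
struct Region {
    std::uintptr_t base;  // starting address
    std::size_t    size;  // length in bytes
};

// Returns true if every byte of [ptr, ptr + len) falls inside some region.
// The range may span several regions as long as there is no gap.
bool everyByteCovered(std::vector<Region> regions,
                      std::uintptr_t ptr, std::size_t len) {
    std::sort(regions.begin(), regions.end(),
              [](const Region& a, const Region& b) { return a.base < b.base; });
    std::uintptr_t cursor = ptr;             // first byte not yet accounted for
    const std::uintptr_t end = ptr + len;    // one past the last requested byte
    for (const Region& r : regions) {
        if (cursor >= end) break;            // whole range already covered
        if (r.base > cursor) return false;   // gap before the next region
        const std::uintptr_t rEnd = r.base + r.size;
        if (rEnd > cursor) cursor = rEnd;    // this region extends the coverage
    }
    return cursor >= end;
}
```

One pathological situation is easy to see here: two allocations that happen to be adjacent in one run may be separated by padding in the next, so the same transfer flips between covered and not covered.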

Another case is where the host-side allocation is created by a mechanism (such as new, malloc, std::vector, etc.) that the runtime has no “ordinary” visibility into. I don’t know what it does in this case and, in any event, as I stated already, I believe all of these details are undocumented.

In the current setting, the safe/expected thing is for each requested transfer region in its entirety to belong to a single allocation request that can be associated with that pointer.
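If your data really does span several allocations, one conservative way to stay within that rule is to break the transfer into pieces that each sit inside one allocation and issue a separate copy per piece. The sketch below only computes the pieces on the host. `Block`, `Piece` and `splitByAllocation` are hypothetical names, and the bookkeeping of your own allocations is assumed to be something you maintain yourself.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one allocation you made yourself.
struct Block {
    std::uintptr_t base;  // starting address
    std::size_t    size;  // length in bytes
};

// One sub-transfer that stays inside a single allocation.
struct Piece {
    std::uintptr_t start;
    std::size_t    length;
};

// Splits [ptr, ptr + len) into pieces that each lie within one block.
// Returns an empty vector if len is 0 or if some byte is in no block.
std::vector<Piece> splitByAllocation(std::vector<Block> blocks,
                                     std::uintptr_t ptr, std::size_t len) {
    std::sort(blocks.begin(), blocks.end(),
              [](const Block& a, const Block& b) { return a.base < b.base; });
    std::vector<Piece> pieces;
    std::uintptr_t cursor = ptr;
    const std::uintptr_t end = ptr + len;
    for (const Block& b : blocks) {
        if (cursor >= end) break;
        const std::uintptr_t bEnd = b.base + b.size;
        if (bEnd <= cursor) continue;     // block lies entirely before the range
        if (b.base > cursor) return {};   // gap: some bytes belong to no block
        const std::uintptr_t pieceEnd = std::min(bEnd, end);
        pieces.push_back({cursor, pieceEnd - cursor});
        cursor = pieceEnd;
    }
    if (cursor < end) return {};          // ran out of blocks before the end
    return pieces;
}

// Each piece would then get its own call, for example:
//   cudaMemcpy(dst + done, reinterpret_cast<const void*>(p.start),
//              p.length, cudaMemcpyDeviceToHost);
// where dst and done are your own destination pointer and running offset.
```

The simpler alternative, where you control the allocations, is to make one large allocation and carve your buffers out of it, so that a single transfer never crosses an allocation boundary in the first place.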