cudaMemcpy fails mysteriously: memory violation, non-existent physical address


I have a problem setting up a pretty basic code, and I’ve never seen this particular problem before. The code I have looks like this:

[codebox]void doCudaStuff(InputStruct* in)
{
    InputStruct* in_d; // device memory

    cutilSafeCall(cudaMalloc((void**) &in_d, sizeof(InputStruct)));
    // I have error checks and prints that show that in_d does contain an address (typically 0x01000600)
    // sizeof(InputStruct) == 32

    cutilSafeCall(cudaMemcpy(in_d, in, sizeof(InputStruct), cudaMemcpyHostToDevice)); // fails

    // ...
}[/codebox]


The error occurs on the cudaMemcpy line: ‘fatal error in “foo::doCudaStuff”: memory access violation at address: 0x01000600: non-existent physical address.’

I’m running OSX Snow Leopard with CUDA 2.3. The SDK examples and a number of homebrew codes run fine, so I’m pretty mystified that something this simple fails. What am I missing?

Note that this doesn’t happen with -deviceemu turned on. Actually, now that I think about it I should check and see what happens in emulation mode. I’ll do that and check back here later!

Thanks all!


I’m pretty new myself so this probably won’t help but… I had a similar error right after I accidentally messed up the arch settings. I’m using Visual Studio under Windows so I don’t know what the analog would be on the Mac side, but what I’d done was set one of the compile/code architecture settings the wrong way round. I forget whether I set them both to virtual or both to hardware. I only figured it out by going through the settings for “EmuDebug” (which worked) and “Release” (which didn’t) until I spotted what I’d done. Figure below shows correct settings for my machine:

Hope this helps


Raffle, thanks for trying, but I only have a compute capability of 1.1 on my card and that’s what I set in my Makefile, so I don’t think that’s it.

It turns out that I can cudaMalloc and cudaMemcpy an array or a single variable; it’s only my InputStruct that causes problems. The address that cudaMalloc returns for the struct is in a very different range from the addresses of the arrays that work. sizeof(InputStruct) == 32, so it’s not too big or anything. Any other ideas, forum friends?

As an added twist, I can cudaMemcpy another struct without a problem. It even has the exact same types in it. Freaky. The address for the working struct is similar to the non-working one. And if I switch the allocation order, the addresses swap as expected. The printf doesn’t actually print the physical address, of course, but why would the mapping be different for the structs?

I’m pretty lost, but it looks like I can just create a workaround and skirt the problem. Not something I like doing, because I’d rather find out what in the world is going on…

Could you post your code (a complete but small example that demonstrates the error)? Then we can check it.

I’ve been trying to replicate the problem, but there’s a wrinkle of complexity that appears essential. The rest of this week I’ll be at a conference, but then I’ll come back and post what I’ve found. Something to do with… something… uh, the way the struct is used in previously executed code? I’m still not sure…

I think the memory behind the incoming “in” host (CPU) pointer is not allocated properly.
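
To test whether the host pointer is the culprit, something like the following minimal sketch might help. It is not the original poster's code: the fields of InputStruct are a made-up stand-in (the thread only says sizeof(InputStruct) == 32), and checkCuda and copyToDevice are hypothetical helper names. It uses plain runtime calls (cudaMalloc, cudaMemcpy, cudaGetErrorString, cudaFree) instead of cutilSafeCall, so each failure is reported by name, and it passes a host struct that is known to be alive for the whole call.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical stand-in: the thread never shows InputStruct's fields,
// only that sizeof(InputStruct) == 32.
struct InputStruct {
    float values[8]; // 8 * 4 bytes = 32 bytes
};

// Hypothetical helper: print a message if a runtime call failed.
// Returns true on success.
static bool checkCuda(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical helper: copy one struct to the device.
// Returns the device pointer, or NULL on failure.
static InputStruct* copyToDevice(const InputStruct* in)
{
    if (in == NULL) {
        fprintf(stderr, "host pointer is NULL\n");
        return NULL;
    }

    InputStruct* in_d = NULL; // device memory
    if (!checkCuda(cudaMalloc((void**)&in_d, sizeof(InputStruct)), "cudaMalloc"))
        return NULL;

    if (!checkCuda(cudaMemcpy(in_d, in, sizeof(InputStruct), cudaMemcpyHostToDevice),
                   "cudaMemcpy (host to device)")) {
        cudaFree(in_d);
        return NULL;
    }
    return in_d;
}

int main()
{
    // Host storage on the stack, so it certainly exists during the copy.
    InputStruct in = {};
    for (int i = 0; i < 8; ++i)
        in.values[i] = (float)i;

    InputStruct* in_d = copyToDevice(&in);
    if (in_d == NULL)
        return EXIT_FAILURE;

    // Copy the struct back and compare it byte for byte with the original.
    InputStruct back;
    bool ok = checkCuda(cudaMemcpy(&back, in_d, sizeof(InputStruct), cudaMemcpyDeviceToHost),
                        "cudaMemcpy (device to host)");
    cudaFree(in_d);
    if (!ok)
        return EXIT_FAILURE;

    printf("round trip %s\n",
           memcmp(&in, &back, sizeof(InputStruct)) == 0 ? "ok" : "mismatch");
    return EXIT_SUCCESS;
}
```

If this runs cleanly but the real program still crashes, the difference is probably in how “in” is created or used before doCudaStuff is called. It may also be worth noting that the faulting address in the error message (0x01000600) is the same value that was printed for in_d, so it could be worth checking that nothing on the host side dereferences the device pointer.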