The problem is in the static GpuCreate, and everything works except the call to cudaMemcpy. I am watching the contents of the gpuCuArray device pointer with NVIDIA's Nsight Eclipse Edition debugger, and every property is 0 except the memory location of the pointer. The pointer's location is 0x7053e0200, which looks fine to me.
The values in cuArray are nonzero, as expected. So I think cudaMemcpy is not working (or rather: I am using it wrong somehow).
What am I doing wrong?
If you need any further information, please tell me and I will try to provide whatever you think is useful.
There are at least 2 problems with your SO posting.
1. It is unclear whether you are concerned about an actual functional issue with your code, or just about an observation you are making in the debugger.
2. You have not provided an MCVE (it’s defined on SO, look it up). An MCVE should be complete code that someone else could compile and run, without having to add anything, and see the issue. Most MCVEs written in C/C++ should include a main routine, for example. Furthermore, the expected behavior and the actual behavior must be defined. If you are concerned about the functional (input/output) behavior of your code, as opposed to just asking about a debugger observation, this should not really depend on use of a debugger.
If your code has a functional issue, you should provide an MCVE on SO (or here) that defines the code, the actual behavior of your code, and the expected behavior. This last part should not depend on the debugger (unless you are merely asking about a debugger observation).
Since you’ve not clarified these things (either here or on SO), I’m personally not surprised that you have gotten an unsatisfying answer.
It seems to me that CuArray is a descriptor, not the actual matrix (which is pointed to by the ArrayPointer component of the descriptor). Your code copies the descriptor with cudaMemcpy(), not the actual matrix. In other words, this is a C/C++ level problem, nothing specific to CUDA.
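The descriptor-versus-data distinction can be seen without a GPU at all. Below is a minimal host-only sketch, assuming a hypothetical Descriptor struct (the real CuArray class is not shown): a byte-wise copy, which is all cudaMemcpy performs, duplicates the pointer value but not the storage it points to. Because the question's ArrayPointer already holds an address returned by cudaMalloc, a byte-wise copy of the descriptor onto the device should give device code a usable pointer.

```cuda
#include <cstring>

// Hypothetical stand-in for the question's CuArray descriptor; the real
// class is not shown, so these member names are assumptions.
struct Descriptor {
    int rows;
    int columns;
    int elements;
    float* arrayPointer;  // points to storage that lives elsewhere
};

// Copies the descriptor byte by byte, the same way
// cudaMemcpy(dst, src, sizeof(Descriptor), ...) would:
// the pointer *value* is copied, the pointed-to data is not.
Descriptor shallowCopy(const Descriptor& src) {
    Descriptor dst;
    std::memcpy(&dst, &src, sizeof(Descriptor));
    return dst;
}
```

After the copy, both descriptors refer to the same buffer, so a change made through one is visible through the other.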
That's 100% right. I want to copy that descriptor. The array itself is allocated in the descriptor's constructor. The problem is that, according to the debugger, none of the descriptor is copied. At the moment I am struggling to read the “real values”, assuming the debugger is showing me wrong ones.
CuArray just wraps the row and column counts of the array behind ArrayPointer. It is like a Java or C# array, where you can get the length/size of the array. To make access easier on the GPU, I want to transfer this descriptor to the GPU so I can read Row and Column there without passing them as separate kernel parameters. I just have to pass CuArray as a parameter.
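One way to get Row and Column onto the GPU without a separate device-side descriptor allocation, an alternative to what the question's code does, is to pass a small plain-data descriptor to the kernel by value: kernel arguments are copied to the device at launch, and the embedded pointer already holds a device address. The sketch below assumes a hypothetical CuArrayView struct with rows, columns and arrayPointer members and hypothetical names fillIndices and fillOnDevice; the real CuArray class is not shown, and a class with virtual functions cannot be passed to a __global__ function this way.

```cuda
#include <cuda_runtime.h>

// Hypothetical descriptor: plain data members only, so it can be passed
// to a kernel by value. Member names are assumptions, not the question's class.
struct CuArrayView {
    int rows;
    int columns;
    float* arrayPointer;  // device pointer obtained from cudaMalloc
};

// Each thread writes row * 1000 + column into its element, reading the
// dimensions straight from the by-value descriptor.
__global__ void fillIndices(CuArrayView a) {
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (r < a.rows && c < a.columns) {
        a.arrayPointer[r * a.columns + c] = static_cast<float>(r * 1000 + c);
    }
}

// Allocates device storage, launches the kernel with the descriptor passed
// by value, and copies the result into hostOut. Returns false on any CUDA error.
bool fillOnDevice(int rows, int columns, float* hostOut) {
    CuArrayView view{rows, columns, nullptr};
    size_t bytes = static_cast<size_t>(rows) * columns * sizeof(float);
    if (cudaMalloc(&view.arrayPointer, bytes) != cudaSuccess) return false;

    dim3 block(16, 16);
    dim3 grid((columns + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    fillIndices<<<grid, block>>>(view);  // the struct is copied as a kernel argument

    // cudaMemcpy waits for the kernel and also reports execution errors.
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaMemcpy(hostOut, view.arrayPointer, bytes,
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(view.arrayPointer);
    return ok;
}
```

The trade-off is that the descriptor is re-sent with every launch, which is cheap for a few integers and a pointer and avoids managing a second device allocation.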
So I create the CuArray with CuArray* cuArray = new CuArray(rows, columns); . But this CuArray instance is in host memory. CuArray contains a pointer to an array in GPU memory (because I call cudaMalloc in the constructor of CuArray). Now I want to transfer the CuArray instance cuArray into GPU memory. So I allocate sizeof(CuArray) bytes with cudaMalloc and copy the data to GPU memory with cudaMemcpy, expecting all properties (Row, Column, Elements and ArrayPointer) to be the same in cuArray and cuGpuArray. But this is not the case. cudaMalloc and cudaMemcpy both return cudaSuccess.
So the contents of cuGpuArray appear to be 0, and my question is: what am I doing wrong? The allocation succeeds and the copy succeeds, but my data is not copied.
I have not yet tried to read the data back manually without the debugger. I will post or edit when I have concrete values.
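To check the copy without relying on the debugger, the descriptor can be copied to the device and straight back, then compared on the host. The sketch below assumes a hypothetical plain-data CuArrayDesc struct standing in for CuArray and a hypothetical helper name copyDescriptorRoundTrip. If the fields that come back match the originals, cudaMemcpy did its job and zeros shown in the debugger are a display issue rather than a failed copy.

```cuda
#include <cuda_runtime.h>

// Hypothetical descriptor mirroring the fields named in the question
// (Row, Column, Elements, ArrayPointer); the real CuArray class is not shown.
struct CuArrayDesc {
    int rows;
    int columns;
    int elements;
    float* arrayPointer;  // device pointer from cudaMalloc
};

// Copies a host-side descriptor to device memory, then copies it back into
// *roundTrip so the values can be checked on the host without a debugger.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t copyDescriptorRoundTrip(const CuArrayDesc& host, CuArrayDesc* roundTrip) {
    CuArrayDesc* gpuDesc = nullptr;
    cudaError_t err = cudaMalloc(&gpuDesc, sizeof(CuArrayDesc));
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(gpuDesc, &host, sizeof(CuArrayDesc), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        err = cudaMemcpy(roundTrip, gpuDesc, sizeof(CuArrayDesc), cudaMemcpyDeviceToHost);
    }
    cudaFree(gpuDesc);
    return err;
}
```

Compiled with nvcc, a caller would fill a CuArrayDesc on the host, call copyDescriptorRoundTrip(host, &back), and compare back.rows, back.columns, back.elements and back.arrayPointer against the originals.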
The Nsight Eclipse Edition debugger is built on top of cuda-gdb.
cuda-gdb has a limitation that it will not correctly show device data until you are stopped at a breakpoint in device code (which means you have to launch a kernel - which your posted code is not doing):