OS: Windows 7 64bit (host) and Windows Server 2008 (target)
CUDA toolkit version: 4.0 64bit
CUDA SDK version: 4.0
Compiler: Integrated into Visual Studio 2010
Parallel NSight: 2.0
I have declared a device pointer to an array of structs. In a kernel I am trying to set this pointer to a struct array that I already copied to the device earlier (so I don’t have to send the pointer as parameter every kernel call, I’m doing this for many arrays):
// Global.cu
__device__ InflowNode *D_InflowNodes;
// CpyPointers.cu
__global__ void DevCpyInflow(InflowNode *inflowNodes)
{
// This gives the error
D_InflowNodes = inflowNodes;
}
NSight gives the following memory access error:
CUDA Memory Checker detected 1 threads caused an access violation:
Launch Parameters
CUcontext = 002b94a0
CUstream = 00000000
CUmodule = 0346f3a8
CUfunction = 03466248
FunctionName = _Z12DevCpyInflowP10InflowNode
gridDim = {1,1,1}
blockDim = {1,1,1}
sharedSize = 64
Parameters:
Parameters (raw):
0x0197a000
GPU State:
Address Size Type Block Thread blockIdx threadIdx PC Source
------------------------------------------------------------------------------------------
0000c000 4 adr st 0 0 {0,0,0} {0,0,0} 000038 d:\documents\visual studio 2010\projects\cuda\price2d\c++\price2d_0_2\price2d\cpypointers.cu:20
Summary of access violations:
================================================================================
Parallel Nsight Debug
Memory Checker detected 1 access violations.
error = access violation on store
blockIdx = {0,0,0}
threadIdx = {0,0,0}
address = 0x0000c000
accessSize = 4
The error mentions 0x0000c000 as error address, while the actual address of the pointer is 0x0107c000, which is 17MB “later” in memory.
The memory of 0x0000c000 and further looks as follows, which indeed indicates inaccessible memory :
0x000000000000C000 ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
However address 0x0107c000 looks as follows, indicating that it contains NULL (which is correct, because I have not yet stored the pointer on that address):
0x000000000107C000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
The watch shows D_InflowNodes = 0x00000000 and &D_InflowNodes = 0x0107c000, which is valid.
Does anyone on the NVIDIA team have an idea why the addresses are different? I should note I had this error before with another variable (I think it was a pointer to a float array), but shuffling around some code made it disappear. However it keeps coming back when I alter code. This means it is not related specifically to this struct.
I have tested this on two devices, a C1060 and a simple laptop Quadro. I could not perform NSight debugging on the Quadro, but the kernel crashed. This indicates that the problem is not a corrupt card.
I have declared the struct array pointer in another .cu file than the file that accesses it, but through inclusion I made sure the pointer should be accessible, it works perfectly fine for other device pointers that are declared and defined in the same manner.
Edit: see attachment for test with CC1.3 on x64 machine.
MemoryTest.zip (213 KB)