NSight reports different memory address than actual address

OS: Windows 7 64bit (host) and Windows Server 2008 (target)

CUDA toolkit version: 4.0 64bit

CUDA SDK version: 4.0

Compiler: Integrated into Visual Studio 2010

Parallel NSight: 2.0

I have declared a device pointer to an array of structs. In a kernel I am trying to set this pointer to a struct array that I already copied to the device earlier (so I don’t have to send the pointer as parameter every kernel call, I’m doing this for many arrays):

// Global.cu

__device__ InflowNode *D_InflowNodes;

// CpyPointers.cu

__global__ void DevCpyInflow(InflowNode *inflowNodes)


	// This gives the error

	D_InflowNodes = inflowNodes;


NSight gives the following memory access error:

CUDA Memory Checker detected 1 threads caused an access violation:

Launch Parameters

    CUcontext    = 002b94a0

    CUstream     = 00000000

    CUmodule     = 0346f3a8

    CUfunction   = 03466248

    FunctionName = _Z12DevCpyInflowP10InflowNode

    gridDim      = {1,1,1}

    blockDim     = {1,1,1}

    sharedSize   = 64


    Parameters (raw):


GPU State:

   Address  Size      Type       Block  Thread         blockIdx  threadIdx      PC  Source


  0000c000     4    adr st           0       0          {0,0,0}    {0,0,0}  000038  d:\documents\visual studio 2010\projects\cuda\price2d\c++\price2d_0_2\price2d\cpypointers.cu:20

Summary of access violations:


Parallel Nsight Debug

Memory Checker detected 1 access violations.

error = access violation on store

blockIdx = {0,0,0}

threadIdx = {0,0,0}

address = 0x0000c000

accessSize = 4

The error mentions 0x0000c000 as error address, while the actual address of the pointer is 0x0107c000, which is 17MB “later” in memory.

The memory of 0x0000c000 and further looks as follows, which indeed indicates inaccessible memory :

0x000000000000C000  ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

However address 0x0107c000 looks as follows, indicating that it contains NULL (which is correct, because I have not yet stored the pointer on that address):

0x000000000107C000  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

The watch shows D_InflowNodes = 0x00000000 and &D_InflowNodes = 0x0107c000, which is valid.

Does anyone on the NVIDIA team have an idea why the addresses are different? I should note I had this error before with another variable (I think it was a pointer to a float array), but shuffling around some code made it disappear. However it keeps coming back when I alter code. This means it is not related specifically to this struct.

I have tested this on two devices, a C1060 and a simple laptop Quadro. I could not perform NSight debugging on the Quadro, but the kernel crashed. This indicates that the problem is not a corrupt card.

I have declared the struct array pointer in another .cu file than the file that accesses it, but through inclusion I made sure the pointer should be accessible, it works perfectly fine for other device pointers that are declared and defined in the same manner.

Edit: see attachment for test with CC1.3 on x64 machine.
MemoryTest.zip (213 KB)

I managed to narrow down the bug to a very small project. I declare 27 device structs, which should not be anything special. When I try to define the 27th struct, the access violation occurs.

When I run the code with debug information, the error occurs. When I run the code without debug information, the error does not occur. I wonder whether this is a coincidence, or a bug in the debugger (how ironic that would be).

I made a test that copies a value from host to device, then to a variable in the struct and then back to the host. The code is attached to the beginpost. The Executables folder contains executables with an without debug information, built for CC1.3 and x64 (error also occurs with Win32). If anyone has any idea what I am doing wrong, I’d love to hear your ideas! Basically this error leaves the NSight debugger useless for me…

For anyone who runs into this issue: NVIDIA confirmed that it is a CUDA bug.

It was fixed in Parallel Nsight (currently only available as release candidate).

I get this (or a similar issue) even with NSight (and CUDA Toolkit 4.1). I’m using the driver API.

All my tests pass, but when I run with memory checker it reports lots of “Access violation on load (global memory)”.

CUDA Memory Checker detected 608 threads caused an access violation:

Launch Parameters

CUcontext    = 00b2d380

CUstream     = 029b3cf0

CUmodule     = 07417980

CUfunction   = 0943e4a0

FunctionName = blackHoleCorrection

gridDim      = {8,8,1}

blockDim     = {16,16,1}

sharedSize   = 0


Parameters (raw):

     0x0bca0000 0x00000002 0x00000080 0x00000080

     0x00000080 0xcccccccc 0x0bc70000 0x00000002

     0x00000080 0x00000080 0x00000080 0xcccccccc


GPU State:

Address Size Type Mem Block Thread blockIdx threadIdx PC Source

9000000009 4 adr ld g 0 128 {0,0,0} {0,8,0} 0009c0 No source available

30bca0e00 4 adr ld g 0 129 {0,0,0} {1,8,0} 0009c0 No source available

30bca0e04 4 adr ld g 0 130 {0,0,0} {2,8,0} 0009c0 No source available


So for some reason the upper half of the pointer has been increased by 1 (not to mention the very strange 9000000009 address).