Memory allocated by cudaMalloc shows "???" as value

Hello,
I am allocating a few arrays with cudaMalloc. In device code, some of the arrays show only "???" instead of a pointer, and Nsight debugging fails with an access violation on load (it cannot read from that pointer).

__device__ unsigned int* D_Inputs;
__device__ unsigned int* D_Offsets;
__device__ bool* D_NodeTypes;
__device__ short* D_ItemsCount;
__device__ unsigned int* D_NodeIndices;
__device__ bool* D_ResultVector;
__device__ int* D_ItemOrders;

void Init()
{
    unsigned int freeMemBefore = GetFreeMemory(0);
    unsigned int mMaxBlocks = 2000;
    cudaMalloc((void**)&D_Offsets, sizeof(unsigned int) * mMaxBlocks);
    cudaMalloc((void**)&D_NodeTypes, sizeof(bool) * mMaxBlocks);
    cudaMalloc((void**)&D_ItemsCount, sizeof(short) * mMaxBlocks);
    cudaMalloc((void**)&D_NodeIndices, sizeof(unsigned int) * mMaxBlocks);
    cudaMalloc((void**)&D_ResultVector, sizeof(bool) * mMaxBlocks);
    cudaMalloc((void**)&D_ItemOrders, sizeof(unsigned int) * mMaxBlocks);
    cudaMalloc((void**)&D_Inputs, 2000 * mMaxBlocks);
    unsigned int freeMemAfter = GetFreeMemory(0);
}

Together it is not that much memory, and I have 800 MB of free memory. The interesting part: D_ItemsCount shows "???", but D_NodeIndices is correct even though it is allocated later.
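For reference, checking the return code of each cudaMalloc would show right away whether an allocation actually fails; a minimal sketch with an illustrative CHECK macro (a hypothetical helper, not part of the code above):

#include <cstdio>
#include <cuda_runtime.h>

// Illustrative helper: print a message whenever a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess)                                        \
            printf("%s failed: %s\n", #call, cudaGetErrorString(err_)); \
    } while (0)

// Usage inside Init(), e.g.:
// CHECK(cudaMalloc((void**)&D_Offsets, sizeof(unsigned int) * mMaxBlocks));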

[Screenshot: the arrays in the Nsight kernel debugger, several of them showing "???"]

Nsight output:
Memory Checker detected 192 access violations.
error = access violation on load (global memory)
gridid = 12
blockIdx = {0,0,0}
threadIdx = {0,0,0}
address = 0x05d20800
accessSize = 2

The accessSize of 2 would match a load of a short, i.e. a read through D_ItemsCount. The problem also somehow depends on the amount of memory allocated: if mMaxBlocks is set to 10, all arrays are allocated OK.
I tested this on a GTX 550, which is my primary display GPU, and also on both GPUs of a GTX 690, with a similar problem.
Can somebody please tell me what I am doing wrong?

I answer this myself. I managed to determine what causes this strange behaviour: the calls to GetFreeMemory() in Init(). I call it once before the cudaMalloc calls and again after the allocations. If I comment out the second call (i.e. do not call GetFreeMemory() again), everything works like a charm. I don't know why, but it obviously helped.
Below is the GetFreeMemory() code:

unsigned int GetFreeMemory(CUdevice device)
{
    // create cuda context
    CUcontext cudaContext;
    CUresult result = cuCtxCreate(&cudaContext, CU_CTX_SCHED_AUTO, device);
    if (result != CUDA_SUCCESS)
    {
        printf("\nError creating cuda context");
        return 1;
    }

    // get the amount of free memory on the graphics card
    size_t free;
    size_t total;
    result = cuMemGetInfo(&free, &total);
    return free;
}

My guess is it's because you're creating an extra context, which I believe is unnecessary. cuCtxCreate also makes the newly created context current on the calling thread, so after the second GetFreeMemory() call the kernel launches in a different context than the one the cudaMalloc calls allocated in, and device pointers from one context are not valid in another. I think you should just be able to do the cuMemGetInfo part of the function without the rest of the context creation.
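Something like this minimal sketch should work, using the runtime API's cudaMemGetInfo, which reports on whatever context is current (the CUdevice parameter goes away; to query a specific device you could call cudaSetDevice first):

#include <cstdio>
#include <cuda_runtime.h>

// Query free device memory without creating a context of our own;
// the CUDA runtime creates or reuses the current context as needed.
size_t GetFreeMemory()
{
    size_t freeBytes = 0;
    size_t totalBytes = 0;
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess)
    {
        printf("cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 0;
    }
    return freeBytes;
}

With this version, Init() can query free memory before and after the allocations without disturbing the context that the allocated pointers live in.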