cudaMallocManaged() fails after 65410 iterations

Hi,
I am using a Tesla K40 GPU card with 12GB device memory.

I am designing an algorithm that uses linked-list structures, and I have more than 1,000,000 such structures.
When I try to allocate them in unified memory, I cannot get beyond roughly 64,000 allocations.

I tested different data structure sizes, but the issue is the same.

I saw the same bug on this forum as well. http://www.ceus-now.com/allocate-unified-memory-in-my-program-aftering-running-it-throws-cuda-error-out-of-memory-but-still-has-free-memory/

But I couldn’t find a solution yet.

This is the simple code I tested:

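#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "helpers.cuh"   // author's helper header (presumably defines checkCudaError())

// Number of allocations to attempt; it must exceed 65410 to reach the failure.
#define NUM 100000

int* array[NUM];

int main(){
	int i=0;
	// One managed allocation per element, with an error check after each call.
	for(i=0;i<NUM;i++){
		cudaMallocManaged(&array[i],sizeof(int));
		checkCudaError();
	}
	return 0;
}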

The program reports a CUDA error from the checkCudaError() function after 65410 iterations.

Can somebody help me to find a solution for this?

This is a known issue with CUDA 7.5. Upgrade to CUDA 8.0RC or beyond, where it appears to be fixed.

See also the related Stack Overflow question: "allocate unified memory in my program. aftering running, it throws CUDA Error:out of memory,but still has free memory"
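
If upgrading is not an option, one common way to sidestep a limit on the number of managed allocations is to make a single large cudaMallocManaged() call and carve the list nodes out of that block. The following is only a minimal sketch of that idea, not a tested fix for this particular bug: the Node type and the buildManagedList() and listLength() helpers are hypothetical names invented for illustration, since the original post does not show its list structure.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical node type (assumption: the real structure is not shown in the post).
struct Node {
    int value;
    Node* next;
};

// Allocate `count` nodes with ONE cudaMallocManaged call and link them into a list.
// Returns the head of the list (which is also the base of the pool), or NULL on failure.
Node* buildManagedList(size_t count) {
    if (count == 0) return NULL;

    Node* pool = NULL;
    cudaError_t err = cudaMallocManaged(&pool, count * sizeof(Node));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        return NULL;
    }

    // Managed pointers are valid on both host and device, so the links
    // can be set up on the host before any kernel is launched.
    for (size_t i = 0; i < count; i++) {
        pool[i].value = (int)i;
        pool[i].next = (i + 1 < count) ? &pool[i + 1] : NULL;
    }
    return pool;
}

// Walk the list on the host and return the number of nodes.
size_t listLength(const Node* head) {
    size_t n = 0;
    for (const Node* p = head; p != NULL; p = p->next) n++;
    return n;
}
```

With this layout, a call such as buildManagedList(1000000) replaces a million separate allocations, and the whole list is released with a single cudaFree() on the returned head pointer. The trade-off is that individual nodes can no longer be freed one at a time.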