cudaMallocManaged() cannot exceed 65410 iterations

I am using a Tesla K40 GPU card with 12GB device memory.

I am designing an algorithm that uses linked-list structures. I have more than 1,000,000 such structures.
When I try to allocate them in unified memory, I cannot get past roughly 64,000 allocations.

I tested with different data structure sizes, but the issue is the same.

I saw the same bug reported on this forum as well.

But I couldn't find a solution yet.

This is the simple code I tested:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "helpers.cuh"

#define NUM 10000

int* array[NUM];

int main(){

    int i=0;

    return 0;
}


The output shows a CUDA error from the checkCudaError() function after 65410 iterations.
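For anyone trying to reproduce this, here is a minimal sketch of the pattern described: one cudaMallocManaged() call per element, stopping at the first error. It is not the poster's exact code. The function name count_managed_allocs and its parameters are hypothetical, and it reports errors with cudaGetErrorString() from the CUDA runtime instead of the checkCudaError() helper from helpers.cuh, whose signature is not shown here.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): performs up to `limit`
// separate managed allocations of `bytes_each` bytes and returns how many
// succeeded before the first error.
int count_managed_allocs(int limit, size_t bytes_each) {
    std::vector<void*> ptrs;
    ptrs.reserve(limit);

    for (int i = 0; i < limit; ++i) {
        void* p = nullptr;
        cudaError_t err = cudaMallocManaged(&p, bytes_each);
        if (err != cudaSuccess) {
            // Report the iteration at which allocation stopped working.
            fprintf(stderr, "allocation %d failed: %s\n",
                    i, cudaGetErrorString(err));
            break;
        }
        ptrs.push_back(p);
    }

    int succeeded = (int)ptrs.size();

    // Release everything that was allocated.
    for (size_t k = 0; k < ptrs.size(); ++k) {
        cudaFree(ptrs[k]);
    }
    return succeeded;
}
```

Calling count_managed_allocs(1000000, sizeof(int)) and comparing the return value with the requested count shows whether a given CUDA installation hits the limit.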

Can somebody help me find a solution for this?

This is a known issue with CUDA 7.5. Upgrade to CUDA 8.0 RC or later, where it appears to be fixed.
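If upgrading is not an option, one possible workaround is to avoid making one allocation per node. The sketch below assumes the node count is known up front and makes a single cudaMallocManaged() call for the whole list, then links the nodes inside that block. The Node layout and the function name build_managed_list are made up for illustration and are not taken from the original post.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical node type; the poster's real structure is not shown.
struct Node {
    int   value;
    Node* next;
};

// Allocates `count` nodes with ONE managed allocation and links them into a
// singly linked list. Returns the head (also the pointer to pass to cudaFree),
// or nullptr if count is 0 or the allocation fails.
Node* build_managed_list(size_t count) {
    if (count == 0) return nullptr;

    Node* pool = nullptr;
    cudaError_t err = cudaMallocManaged((void**)&pool, count * sizeof(Node));
    if (err != cudaSuccess) return nullptr;

    // Link each node to its neighbour inside the same block.
    for (size_t i = 0; i < count; ++i) {
        pool[i].value = (int)i;
        pool[i].next  = (i + 1 < count) ? &pool[i + 1] : nullptr;
    }
    return pool;
}

// Example kernel: walks the list on the device and stores its length.
__global__ void list_length(const Node* head, int* out) {
    int n = 0;
    for (const Node* p = head; p != nullptr; p = p->next) ++n;
    *out = n;
}
```

Because the next pointers refer to managed memory, the same list can be traversed on the host and in a kernel. The whole list is released with a single cudaFree() on the head pointer, so this only fits cases where nodes do not need to be freed individually.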