Memory cleared after cudaFree


I am new to the CUDA world and I have a problem that I don't understand.
I read in the documentation that cudaFree just frees the memory but doesn't
overwrite it with zeros.
I wrote a small program which fills the global memory of the GPU with some
values and frees the memory afterwards.
Then it allocates memory again and prints it out.
My problem is that on Linux I get the values previously written to the memory,
but on Windows 10 I just get zeros.
I am working with Visual Studio 2013. My GPU is a GeForce GT 525M.
CUDA version 7.5.
This is my program:

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>
#include <stdlib.h>

#define ARRAY_SIZE (1024 * 1024)

#define ARRAY_a (1024 * 1024 * 1024)
#define ARRAY_b (1024 * 1024 * 512)

int main()
{
	// device pointers start as NULL so a failed cudaMalloc leaves them in a known state
	char *a_device = NULL, *a_host, *b_device = NULL, *b_host;

	// write 'A's to ARRAY_a bytes of the graphics card
	if (NULL == (a_host = (char*)malloc(ARRAY_a)))
	{
		printf("a_host = malloc failed\n");
		return 1;
	}

	for (int i = 0; i < ARRAY_a; i++)
		a_host[i] = 'A';

	if (cudaSuccess != cudaMalloc((void**)&a_device, ARRAY_a))
		printf("cudaMalloc(&a_device,...) failed\n");

	if (cudaSuccess != cudaMemcpy(a_device, a_host, ARRAY_a, cudaMemcpyHostToDevice))
		printf("cudaMemcpy(a_device, a_host, ARRAY_a, cudaMemcpyHostToDevice) FAILED\n");

	// free a_host
	free(a_host);

	// write 'B's to another ARRAY_b bytes of the graphics card
	if (NULL == (b_host = (char*)malloc(ARRAY_b)))
	{
		printf("b_host = malloc failed\n");
		return 1;
	}

	for (int i = 0; i < ARRAY_b; i++)
		b_host[i] = 'B';

	if (cudaSuccess != cudaMalloc((void**)&b_device, ARRAY_b))
		printf("cudaMalloc(&b_device,...) failed\n");

	if (cudaSuccess != cudaMemcpy(b_device, b_host, ARRAY_b, cudaMemcpyHostToDevice))
		printf("cudaMemcpy(b_device, b_host, ARRAY_b, cudaMemcpyHostToDevice) FAILED\n");

	// free b_host and the memory on the graphics card
	free(b_host);
	cudaFree(a_device);
	cudaFree(b_device);

	unsigned long l = 0;	// number of ARRAY_SIZE blocks allocated so far

	// one host buffer is reused for every readback
	char* c_host;
	if (NULL == (c_host = (char*)malloc(ARRAY_SIZE)))
	{
		printf("c_host = malloc failed\n");
		return 1;
	}

	// Allocates char arrays of size ARRAY_SIZE, checks whether each byte differs from zero and prints it in that case.
	// The device memory isn't freed, so this runs until there is no global memory left.
	while (1)
	{
		char* c_device;
		if (cudaSuccess != cudaMalloc((void**)&c_device, ARRAY_SIZE))
		{
			printf("cudaMalloc failed, %lu bytes allocated\n", l * ARRAY_SIZE);
			break;
		}
		l++;
		if (cudaSuccess != cudaMemcpy(c_host, c_device, ARRAY_SIZE, cudaMemcpyDeviceToHost))
			printf("cudaMemcpy(c_host, c_device, ARRAY_SIZE, cudaMemcpyDeviceToHost) FAILED\n");
		for (int i = 0; i < ARRAY_SIZE; i++)
			if (c_host[i] != 0)
				printf("%c ", c_host[i]);
		printf("%p\n", (void*)c_device);
	}

	free(c_host);
	return 0;
}

Does anybody have an idea why it just prints out zeros on Windows?

Thanks, Matthias

You’re exploring undefined behavior. Anything is possible, and even though you’ve observed a particular behavior now, it may change in the future. You should not depend on undefined behavior.
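If a program needs known contents in a fresh allocation, or wants to avoid leaving old data behind for the next allocation, it can set the memory itself instead of relying on what cudaMalloc or cudaFree happen to do. Here is a minimal sketch of that idea. The helper names allocZeroed and freeScrubbed are made up for illustration and are not part of CUDA; only cudaMalloc, cudaMemset and cudaFree are real runtime calls.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

// Hypothetical helper (not a CUDA API): allocate `bytes` of device memory
// and explicitly set every byte to zero, so the contents are defined.
// On failure *out is left as NULL and the error code is returned.
static cudaError_t allocZeroed(void **out, size_t bytes)
{
	*out = NULL;
	cudaError_t err = cudaMalloc(out, bytes);
	if (err != cudaSuccess)
		return err;

	err = cudaMemset(*out, 0, bytes);
	if (err != cudaSuccess)
	{
		cudaFree(*out);
		*out = NULL;
	}
	return err;
}

// Hypothetical helper (not a CUDA API): overwrite a buffer with zeros
// before handing it back, so its old contents cannot show up in a
// later allocation.
static cudaError_t freeScrubbed(void *ptr, size_t bytes)
{
	cudaError_t err = cudaMemset(ptr, 0, bytes);
	cudaError_t freeErr = cudaFree(ptr);
	return (err != cudaSuccess) ? err : freeErr;
}
```

With helpers like these, the result no longer depends on the operating system or driver: a buffer from allocZeroed reads back as zeros on both Linux and Windows, at the cost of one extra memset per allocation.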

For a Windows WDDM GPU, the memory is actually managed by the host operating system, i.e. Windows, not by CUDA. When you do a cudaMalloc (or cudaFree) on a WDDM GPU in Windows, the call is actually serviced by a request made by the GPU driver to the Windows operating system, for device memory. The Windows operating system can do anything it wants. It may be initializing the area to zero (even though this is not specified or guaranteed), or it may simply be giving you a new pointer to a different area in device memory that you had not previously written to. If you think you are allocating all available device memory, then this second possibility doesn’t seem likely.
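To see which situation a given machine is in, you can ask the runtime for the driver mode and for how much device memory is actually free. This is only a rough sketch: reportDevice is a made-up helper name, and the interpretation in the comments is my assumption. The tccDriver field of cudaDeviceProp and cudaMemGetInfo are real runtime features, but tccDriver only tells you whether the TCC driver is in use. On Windows a value of 0 implies WDDM; on Linux it is 0 as well and WDDM does not apply.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical helper (not a CUDA API): print the driver mode and the
// free/total device memory for one device.
static cudaError_t reportDevice(int device)
{
	cudaDeviceProp prop;
	cudaError_t err = cudaGetDeviceProperties(&prop, device);
	if (err != cudaSuccess)
		return err;

	err = cudaSetDevice(device);
	if (err != cudaSuccess)
		return err;

	// Free memory is what the runtime reports as available right now;
	// under WDDM other processes and the desktop also use device memory.
	size_t freeBytes = 0, totalBytes = 0;
	err = cudaMemGetInfo(&freeBytes, &totalBytes);
	if (err != cudaSuccess)
		return err;

	printf("%s: tccDriver=%d, free=%llu bytes, total=%llu bytes\n",
	       prop.name, prop.tccDriver,
	       (unsigned long long)freeBytes, (unsigned long long)totalBytes);
	return cudaSuccess;
}
```

Calling reportDevice(0) before and after the large allocations shows whether the program really comes close to using all device memory, which is what decides how plausible the "different area" explanation is.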

I’m offering just some possible insight, not any sort of statement about what you should expect.

You should expect undefined, unpredictable behavior.