CUDA 6: Simplest Sample Segmentation Fault

I got access to the CUDA 6 RC as a registered developer, and I want to try the new CUDA 6 feature, Unified Memory. I wrote a simple example that uses it:

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

int main(void)
{
    int numElements = 5000;
    size_t size = numElements * sizeof(float);
    float *a;
    cudaMallocManaged(&a, numElements);

    for (int i = 0; i < numElements; ++i)
        a[i] = rand()/(float)RAND_MAX;

    return 0;
}

I tried to run this example, but I got a segmentation fault:

Segmentation fault: 11

Question: what am I doing wrong?

I have the same problem. cudaMallocManaged returns and the pointer is NULL. I have tried with -arch=sm_20, 30, and 35, and I am using a GTX780. I also cannot find any official documentation for this function.

Edit: I found the documentation included in the toolkit installer. It states:
Unified Memory has three basic requirements:

  • a GPU with SM architecture 3.0 or higher (Kepler class or newer)
  • a 64-bit host application and operating system, except on Android
  • Linux or Windows

I’m running on Windows in 32-bit mode so that’s probably it.

Yeah, I tried in 64-bit mode and that solved the issue.
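
For anyone else hitting this, the 64-bit and SM 3.0 requirements quoted above can be checked at startup before any managed allocation is attempted. This is only a rough sketch: the helper name unifiedMemoryLooksSupported is made up for illustration, it looks at device 0 only, and it does not check the operating system requirement.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): returns 1 if this build and
// device 0 meet the 64-bit and SM 3.0 requirements, 0 otherwise.
static int unifiedMemoryLooksSupported(void)
{
    // Requirement: 64-bit host application.
    if (sizeof(void *) != 8) {
        fprintf(stderr, "Host application is not 64-bit\n");
        return 0;
    }

    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 0;
    }

    // Requirement: SM architecture 3.0 or higher.
    if (prop.major < 3) {
        fprintf(stderr, "%s is SM %d.%d, need 3.0 or higher\n",
                prop.name, prop.major, prop.minor);
        return 0;
    }
    return 1;
}

int main(void)
{
    printf("Unified Memory requirements met: %s\n",
           unifiedMemoryLooksSupported() ? "yes" : "no");
    return 0;
}
```

A 32-bit build should take the first branch and report the problem, which is easier to diagnose than a NULL pointer later on.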

Your allocation is still wrong: it passes the number of elements, not the size in bytes. It should be:

cudaMallocManaged(&a, size);

You should also check the return value for success, to be pedantic.
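
Putting both points together, a corrected version of the original example might look like the sketch below. It assumes a 64-bit build and a supported GPU. The error handling shown is just one way to do it.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

int main(void)
{
    int numElements = 5000;
    size_t size = numElements * sizeof(float);  // size in bytes
    float *a = NULL;

    // Pass the byte count, and check the result before touching a.
    cudaError_t err = cudaMallocManaged(&a, size);
    if (err != cudaSuccess || a == NULL) {
        fprintf(stderr, "cudaMallocManaged failed: %s\n",
                cudaGetErrorString(err));
        return EXIT_FAILURE;
    }

    // Managed memory is directly accessible from the host.
    for (int i = 0; i < numElements; ++i)
        a[i] = rand() / (float)RAND_MAX;

    cudaFree(a);
    return EXIT_SUCCESS;
}
```

With the check in place, an unsupported configuration produces an error message instead of a segmentation fault on the first write to a.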

I have a problem with cudaMallocManaged() as well.

  1. If I use cudaCheckErrors(cudaMallocManaged(&a, size)), the returned error is "unknown 71".
  2. After cudaMallocManaged(), the value of a is NULL.
  3. My machine has two Tesla K20c cards and is a 64-bit Linux host.
  4. The driver version reported by nvidia-smi is 331.49.
  5. Even the SDK examples that use cudaMallocManaged() fail.
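
When a checking macro only prints a bare number like 71, the runtime can be asked what that code means and which CUDA versions the driver and runtime report. The sketch below is a diagnostic aid, not a fix. It assumes it is built with the same toolkit as the failing program, since as far as I know the numeric error values are not guaranteed to be the same across toolkit versions.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    // 71 is the numeric code reported in the post above.
    cudaError_t reported = (cudaError_t)71;
    printf("error %d: %s\n", (int)reported, cudaGetErrorString(reported));

    // Versions are encoded as 1000 * major + 10 * minor.
    int driverVersion = 0, runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);
    cudaRuntimeGetVersion(&runtimeVersion);
    printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
           driverVersion / 1000, (driverVersion % 100) / 10,
           runtimeVersion / 1000, (runtimeVersion % 100) / 10);
    return 0;
}
```

If the driver reports an older CUDA version than the runtime, updating the driver is worth trying before anything else.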



Is there anyone who has encountered similar problems?

What is your OS?

$ uname -a
Linux ***.edu 2.6.32-431.5.1.el6.x86_64 #1 SMP Fri Jan 10 14:46:43 EST 2014 x86_64 x86_64 x86_64 GNU/Linux


It’s RHEL 6.

I have the same problem. cudaMallocManaged() returns NULL after some successful allocations.
Is there any memory limit on cudaMallocManaged() other than the global memory size? I am only allocating a few MB and then I get a null pointer!

I am using a Tesla K20c.
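
To narrow down where the allocations start failing, it may help to print the free device memory and the error string around each call. This is a sketch only: the 16 MB chunk size and the count of 8 are arbitrary values chosen for illustration, not taken from the post above.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    const size_t chunk = 16 * 1024 * 1024;  // assumed chunk size: 16 MB
    const int maxChunks = 8;                // assumed number of allocations
    float *ptrs[8] = {0};
    int n = 0;

    for (; n < maxChunks; ++n) {
        // Report free and total device memory before each allocation.
        size_t freeBytes = 0, totalBytes = 0;
        cudaMemGetInfo(&freeBytes, &totalBytes);
        printf("before allocation %d: %llu of %llu bytes free\n", n,
               (unsigned long long)freeBytes,
               (unsigned long long)totalBytes);

        cudaError_t err = cudaMallocManaged(&ptrs[n], chunk);
        if (err != cudaSuccess) {
            fprintf(stderr, "allocation %d failed: %s\n", n,
                    cudaGetErrorString(err));
            break;
        }
    }

    // Free only the allocations that succeeded.
    for (int i = 0; i < n; ++i)
        cudaFree(ptrs[i]);
    return 0;
}
```

If an allocation fails while plenty of memory is still reported free, the error string is a better clue than the NULL pointer alone.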