"Device memory allocation failed" with CUBLAS Example 2, the first program tried after installation

Hi, I just installed CUDA. I have a GeForce 8400GS and installed CUDA 2.2 on a 64-bit Ubuntu machine. I compiled and ran the Example 2 program from the CUBLAS manual, which is the first program I have tried.

I compiled it using

gcc main.c -o main -I/usr/local/cuda/include/ -L/usr/local/cuda/lib/ -lcublas -lcublasemu

When I ran the code I got the following message: “device memory allocation failed”.

The program is as follows:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <cublas.h>

#define M 6
#define N 5
#define IDX2C(i,j,ld) (((j)*(ld))+(i))

void modify (float *m, int ldm, int n, int p, int q, float alpha, float beta)
{
	/* scale row p, columns q..n-1 (n-q elements, stride ldm) by alpha */
	cublasSscal (n-q, alpha, &m[IDX2C(p,q,ldm)], ldm);
	/* scale column q, rows p..ldm-1 (ldm-p elements, stride 1) by beta */
	cublasSscal (ldm-p, beta, &m[IDX2C(p,q,ldm)], 1);
}

int main(int argc, char *argv[])
{
	int i, j;
	cublasStatus stat;
	float* devPtrA;
	float* a = 0;

	a = (float *)malloc (M * N * sizeof (*a));
	if (!a) {
		printf ("host memory allocation failed");
		return 1;
	}
	for (j = 0; j < N; j++) {
		for (i = 0; i < M; i++) {
			a[IDX2C(i,j,M)] = i * M + j + 1;
		}
	}

	/* the legacy CUBLAS API must be initialised before any other CUBLAS call */
	cublasInit ();
	stat = cublasAlloc (M*N, sizeof(*a), (void**)&devPtrA);
	if (stat != CUBLAS_STATUS_SUCCESS) {
		printf ("device memory allocation failed");
		cublasShutdown ();
		free (a);
		return 1;
	}
	cublasSetMatrix (M, N, sizeof(*a), a, M, devPtrA, M);
	modify (devPtrA, M, N, 1, 2, 16.0f, 12.0f);
	cublasGetMatrix (M, N, sizeof(*a), devPtrA, M, a, M);
	cublasFree (devPtrA);
	cublasShutdown ();

	/* each printed line is one column j of the column-major matrix */
	for (j = 0; j < N; j++) {
		for (i = 0; i < M; i++) {
			printf ("%7.0f", a[IDX2C(i,j,M)]);
		}
		printf ("\n");
	}
	free (a);
	return 0;
}

I am not sure what to make of this error, so any help would be greatly appreciated.
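One way to get more out of this error is to print the status code that cublasAlloc actually returned instead of a fixed message, since the legacy API distinguishes, for example, a library that was never initialised (CUBLAS_STATUS_NOT_INITIALIZED) from a genuine allocation failure (CUBLAS_STATUS_ALLOC_FAILED). The helper below is only a sketch: cublas_status_name is a hypothetical name, and the numeric values are assumed to mirror the CUBLAS_STATUS_* constants in cublas.h, so check them against your own header (or switch on the constants directly when cublas.h is included).

```c
/* Map a legacy CUBLAS status code to its symbolic name.
 * ASSUMPTION: these numeric values mirror the CUBLAS_STATUS_* constants
 * in cublas.h; in a real build, switch on those constants instead. */
static const char *cublas_status_name(int status)
{
    switch (status) {
    case 0:  return "CUBLAS_STATUS_SUCCESS";
    case 1:  return "CUBLAS_STATUS_NOT_INITIALIZED";
    case 3:  return "CUBLAS_STATUS_ALLOC_FAILED";
    case 7:  return "CUBLAS_STATUS_INVALID_VALUE";
    case 8:  return "CUBLAS_STATUS_ARCH_MISMATCH";
    case 11: return "CUBLAS_STATUS_MAPPING_ERROR";
    case 13: return "CUBLAS_STATUS_EXECUTION_FAILED";
    case 14: return "CUBLAS_STATUS_INTERNAL_ERROR";
    default: return "unknown CUBLAS status";
    }
}
```

In the program it would replace the fixed message, e.g. `printf ("device memory allocation failed: %s (%d)\n", cublas_status_name (stat), (int)stat);`.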

Are you using the NVIDIA driver from the Ubuntu repository or the official NVIDIA bundle? If you are using the Ubuntu repository, you probably need to install a 185 series driver package from the NVIDIA driver bundle. I have never been able to get the Ubuntu repository drivers to work correctly with CUDA.

I installed this driver

There is definitely something wrong with your CUDA installation, because that code compiles and runs as expected on my 64 bit Ubuntu 8.04LTS system with CUDA 2.2 installed:

david@quadro:~/build/myCUBLAS$ gcc -I/opt/cuda/include -L/opt/cuda/lib -o cublastest cublastest.c -lcublas

david@quadro:~/build/myCUBLAS$ ./cublastest 

      1      7     13     19     25     31
      2      8     14     20     26     32
      3   1728    180    252    324    396
      4    160     16     22     28     34
      5    176     17     23     29     35
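To see where 1728, 160, 176 and the column of 180 ... 396 come from without a GPU, the two cublasSscal calls in modify can be replayed on the host. This is a sketch: host_sscal, host_modify and fill_example are hypothetical helpers, with host_sscal standing in for cublasSscal (x[k*incx] *= alpha). Element (1,2) starts at 9 and is scaled by both 16 and 12, giving 1728. The row scaling here uses n - q elements, which stops at the last column; a count of n - p (4 for p = 1) would make the fourth stride-ldm step land at index IDX2C(1,5,6) = 31, outside the 30-element matrix.

```c
#define M 6
#define N 5
#define IDX2C(i,j,ld) (((j)*(ld))+(i))

/* Hypothetical host stand-in for cublasSscal: x[k*incx] *= alpha, k = 0..n-1. */
static void host_sscal(int n, float alpha, float *x, int incx)
{
    for (int k = 0; k < n; k++)
        x[k * incx] *= alpha;
}

/* Same two scalings as modify(), on host memory:
 * row p, columns q..n-1 (stride ldm) by alpha, then
 * column q, rows p..ldm-1 (stride 1) by beta. */
static void host_modify(float *m, int ldm, int n, int p, int q,
                        float alpha, float beta)
{
    host_sscal(n - q, alpha, &m[IDX2C(p, q, ldm)], ldm);
    host_sscal(ldm - p, beta, &m[IDX2C(p, q, ldm)], 1);
}

/* Fill an M x N column-major matrix the way the example does. */
static void fill_example(float *a)
{
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++)
            a[IDX2C(i, j, M)] = (float)(i * M + j + 1);
}
```

After `fill_example (a); host_modify (a, M, N, 1, 2, 16.0f, 12.0f);` the array holds the same values as the output above, where each printed line is one column of the matrix.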

I suspected there might be, but thought I had better ask before reinstalling, just in case I was doing something stupid that was a common error.

Try building and running the deviceQuery sample from the SDK and see whether you can get it to work.

I have reinstalled with version 2.3 and still no luck. All the tests I have tried work, including deviceQuery from the SDK.

If I run sudo ./main I get a different error message, though. It says “error while loading shared libraries: cannot open shared object file: No such file or directory”.

Does this shed any light on the problem?

I am compiling with


and is in that directory

You need to add that path to the link loader cache or your LD_LIBRARY_PATH environment variable. Also it should never be necessary (and certainly not advisable) to run CUDA programs as root. If it is, then you have permissions problems on the /dev/nvidiactl and /dev/nvidia? devices which should be fixed in your udev configuration before you try going much further.
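A quick way to check the permissions side of this without sudo is to ask the kernel whether the current user may read and write the NVIDIA device nodes. This is a sketch using POSIX access(); the node names /dev/nvidiactl and /dev/nvidia0 (first GPU) are assumptions about a typical single-GPU install, and can_open_rw and report_nodes are hypothetical helper names.

```c
#include <stdio.h>
#include <unistd.h>

/* Return 1 if the calling user can both read and write `path`, else 0.
 * access() checks against the real UID, i.e. the user running a normal
 * (non-setuid) program. ASSUMPTION: the CUDA driver needs read/write
 * access to its device nodes. */
static int can_open_rw(const char *path)
{
    return access(path, R_OK | W_OK) == 0;
}

/* Print the status of each node; return how many are not accessible. */
static int report_nodes(const char *const *paths, int count)
{
    int failures = 0;
    for (int k = 0; k < count; k++) {
        int ok = can_open_rw(paths[k]);
        printf("%-16s %s\n", paths[k], ok ? "read/write OK" : "NOT accessible");
        if (!ok)
            failures++;
    }
    return failures;
}
```

For example, `const char *nodes[] = { "/dev/nvidiactl", "/dev/nvidia0" }; report_nodes (nodes, 2);` run as a normal user should report both nodes as accessible; if it does not, the udev configuration is the place to look.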

I have the following in my .bashrc

export PATH=/usr/local/cuda/bin:$PATH

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
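To confirm that the LD_LIBRARY_PATH set in .bashrc actually reaches the program, the program can inspect its own environment. This matters because the setting only applies to processes started from a shell that read .bashrc, and sudo normally does not pass LD_LIBRARY_PATH through. A sketch; path_list_contains and ld_library_path_contains are hypothetical helpers, and entries are compared literally, so /usr/local/cuda/lib and /usr/local/cuda/lib64 count as different directories.

```c
#include <stdlib.h>
#include <string.h>

/* Return 1 if `dir` appears as a whole entry in the colon-separated
 * search list `list` (e.g. the value of LD_LIBRARY_PATH), 0 otherwise.
 * Entries are compared literally; trailing '/' is not normalised. */
static int path_list_contains(const char *list, const char *dir)
{
    size_t dlen = strlen(dir);
    const char *p = list;

    if (list == NULL || dlen == 0)
        return 0;
    while (*p != '\0') {
        const char *end = strchr(p, ':');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len == dlen && strncmp(p, dir, dlen) == 0)
            return 1;
        if (end == NULL)
            break;
        p = end + 1;          /* skip the ':' separator */
    }
    return 0;
}

/* Check the current process's own LD_LIBRARY_PATH (NULL if unset). */
static int ld_library_path_contains(const char *dir)
{
    return path_list_contains(getenv("LD_LIBRARY_PATH"), dir);
}
```

Calling `ld_library_path_contains ("/usr/local/cuda/lib64")` from inside the program shows whether the directory the libraries live in is really on the loader's search path for that run.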

I don’t need to run the program as root, I was just getting desperate and tried something random. I still get the following error message when I run it normally “device memory allocation failed”.

When I run the deviceQuery example I get

Device 0: "GeForce 8400 GS"
  CUDA Driver Version:						   2.30
  CUDA Runtime Version:						  2.30
  CUDA Capability Major revision number:		 1
  CUDA Capability Minor revision number:		 1
  Total amount of global memory:				 536150016 bytes
  Number of multiprocessors:					 1
  Number of cores:							   8
  Total amount of constant memory:			   65536 bytes
  Total amount of shared memory per block:	   16384 bytes
  Total number of registers available per block: 8192
  Warp size:									 32
  Maximum number of threads per block:		   512
  Maximum sizes of each dimension of a block:	512 x 512 x 64
  Maximum sizes of each dimension of a grid:	 65535 x 65535 x 1
  Maximum memory pitch:						  262144 bytes
  Texture alignment:							 256 bytes
  Clock rate:									1.40 GHz
  Concurrent copy and execution:				 No
  Run time limit on kernels:					 Yes
  Integrated:									No
  Support host page-locked memory mapping:	   No
  Compute mode:								  Default (multiple host threads can use this device simultaneously)

Press ENTER to exit...
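This output also rules out the literal reading of the message: the card reports 536150016 bytes of global memory, while the example only asks cublasAlloc for M*N floats. The arithmetic as a small sketch (example_request_bytes and bytes_to_mib are hypothetical helper names; assumes a 4-byte float):

```c
#include <stddef.h>

#define M 6
#define N 5

/* Bytes requested by cublasAlloc(M*N, sizeof(float), ...) in the example. */
static size_t example_request_bytes(void)
{
    return (size_t)M * N * sizeof(float);
}

/* Convert a byte count to MiB (2^20 bytes). */
static double bytes_to_mib(unsigned long long bytes)
{
    return (double)bytes / (1024.0 * 1024.0);
}
```

bytes_to_mib (536150016ULL) is 511.3125 and example_request_bytes () is 120, so the device is not actually out of memory; the returned status code and the library setup are better places to look.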

For others who may run into this problem: I got “device memory allocation failed” when I had the wrong driver (2.3.1) installed on Mac OS X 10.5. Installing 2.3 fixed it.

In case anybody else gets this error message, I just found the answer. It works when I link against either libcublas or libcublasemu, but not both at the same time.