cudaMalloc failure

I am observing some extremely odd cudaMalloc behavior.

This is the line where devicePixels is declared:

__device__ Uint32 *devicePixels = NULL;

This is the line of code that SHOULD be allocating 256 bytes of device memory and storing the pointer in devicePixels:

cudaMalloc((void**)&devicePixels, 256);

Except that devicePixels is NULL afterwards, and the value returned from this call is cudaSuccess. Sounds very successful to me.

Note that this is the result regardless of how many bytes I try to allocate. Help?

Actually, every cudaMalloc fails that way, even if I reduce my program to:

__device__ float *POINTERLOL;

int main(int argc, char *argv[])
{
    cudaMalloc((void**)&POINTERLOL, 256); // try to allocate 256 stupid bytes

    return 0;
}

POINTERLOL will be NULL, and the return value is cudaSuccess.

Maybe you are trying to allocate memory without initializing the device (GPU).

Try the following code (remember, __device__ is implied):

int main(int argc, char** argv)
{
    float *devPtr1;
    int size = 256;

    CUDA_SAFE_CALL(cudaMalloc((void**)&devPtr1, size));
    CUDA_SAFE_CALL(cudaMemset(devPtr1, 0, size));

    return 0;
}

Well first of all, that code snippet doesn’t even allocate device memory, which is what I’m saying is the problem here.

Second of all, the runtime API documentation explicitly states that the only “initialization” required is cudaSetDevice(int deviceIndex) and even this is optional, as executing a device command without selecting a device will automatically initialize device 0. This initialization is done per thread.
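For reference, here is a minimal sketch of the explicit initialization path described above, using only runtime API calls (cudaGetDeviceCount, cudaSetDevice, cudaMalloc, cudaGetErrorString). The device index 0 and the 256-byte size are taken from this thread; everything else (variable names, messages) is illustrative, and the cudaSetDevice call can be dropped if you rely on the implicit selection of device 0.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Ask the runtime how many CUDA devices it can see.
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess || deviceCount == 0) {
        std::fprintf(stderr, "No usable CUDA device (count=%d, status=%s)\n",
                     deviceCount, cudaGetErrorString(err));
        return 1;
    }

    // Optional: select device 0 explicitly for this host thread.
    err = cudaSetDevice(0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaSetDevice failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Plain host variable that will hold a device address.
    float *devPtr = NULL;
    err = cudaMalloc((void**)&devPtr, 256);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("cudaMalloc returned device address %p\n", (void*)devPtr);
    cudaFree(devPtr);
    return 0;
}
```

If this prints a non-NULL address, allocation itself works on the card, and the problem is more likely in how the pointer is declared or inspected.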

The supporting evidence for this would be that everything was working without any calls to CUDA_INIT_DEVICE() or cudaSetDevice for a while, then it just stopped (which actually makes me a little nervous: did I permanently damage my 8800 GTX by running a crashing CUDA program?).

The only thing I can think of is that the main difference between a program you’re running and this one is that I’m linking with SDL and SDLmain.lib, which requires that I set the code generation for the project to multithreaded DLL or multithreaded debug DLL. I don’t understand why this would have this kind of effect, however, or how I was able to successfully cudaMalloc objects before the program crashed that one time. (Before anyone asks: YES, I have restarted my computer a dozen times since then. Same thing.)

Sure it does. This is how every SDK example allocates device memory; check them out!

You don’t need to specify __device__ when allocating device memory, if that is what is throwing you off.

If you allocate it with cudaMalloc, it is device memory. Then all you need to do is pass that pointer as an argument when you call the kernel.

Don’t be nervous. Be cool, be cool.
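Here is a rough sketch of that pattern: a pointer declared without __device__, filled in by cudaMalloc, then passed to a kernel as an ordinary argument. The kernel name fill, the element count, and the value written are made up for illustration; only the runtime API calls are real.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: writes the same value into every element.
__global__ void fill(float *data, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = value;
    }
}

int main()
{
    const int n = 64;                        // 64 floats = 256 bytes
    const size_t bytes = n * sizeof(float);

    // Ordinary host variable, no __device__ qualifier. After cudaMalloc
    // it holds an address that is only valid on the GPU.
    float *devPtr = NULL;
    cudaError_t err = cudaMalloc((void**)&devPtr, bytes);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Pass the device pointer to the kernel like any other argument.
    fill<<<1, n>>>(devPtr, n, 1.5f);
    err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));
        cudaFree(devPtr);
        return 1;
    }

    // Copy the result back to the host to confirm the kernel wrote it.
    float host[n];
    err = cudaMemcpy(host, devPtr, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(err));
        cudaFree(devPtr);
        return 1;
    }

    std::printf("host[0] = %f, host[%d] = %f\n", host[0], n - 1, host[n - 1]);
    cudaFree(devPtr);
    return 0;
}
```

The host code never dereferences devPtr; it only hands the address to the kernel and to cudaMemcpy.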

It might be due to some mistake(s) in your program.

First of all make sure that your 8800 GTX is working properly. Simply do the following to ensure that.

Go to your $(cuda installation path)\NVIDIA CUDA SDK\bin\win32\Release and you will find a lot of executables there. Run any one of them (e.g. nbody.exe).

If it is working, Yessssssss, your GTX is working properly. It has no damage at all, and cudaMalloc calls will work properly on your system.

If it is showing some sort of error or TEST FAILED message from any of those executables, ohhhh GOD you have some problems with your GTX Card.

But you are still in the game: try re-installing the driver, CUDA SDK, toolkit, everything, and have another go. If the result is negative again, sorry, you might have damaged your 8800 GTX.

Also, I find it hard to believe that a crashing program could damage an expensive 8800 GTX card.

Unbelievable. That would definitely affect NVIDIA’s reputation, wouldn’t it?
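If you would rather not dig through the SDK binaries, a small device query along these lines should give a similar first impression of whether the card is visible and responding. This is only a sketch using cudaGetDeviceCount and cudaGetDeviceProperties; it is not one of the SDK samples, and it checks far less than they do.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("CUDA devices found: %d\n", deviceCount);

    // Print a few basic properties of every device the runtime reports.
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                         dev, cudaGetErrorString(err));
            return 1;
        }
        std::printf("Device %d: %s, compute capability %d.%d, %lu MB global memory\n",
                    dev, prop.name, prop.major, prop.minor,
                    (unsigned long)(prop.totalGlobalMem / (1024 * 1024)));
    }
    return 0;
}
```

If the card shows up here with a sensible name and memory size, the driver and runtime are at least talking to it.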