Crashes lead to 'out of memory'

Hi everyone, this is my first post here and I have some trouble with memory allocation:

  1. Is it OK to allocate, for instance, 64x34x64 3D textures?

  2. My program crashes at “cutilSafeCall( cudaMemcpy3D(&copyParams));” with a segmentation fault at 64x34x64, but never if I’m using 64x64x64.

  3. After several failures and crashes, none of my CUDA code will execute due to “out of memory”. I’ve heard that the device should free the memory, even if the launch fails.

  4. Sometimes my data_h pointer has the memory address 0x2aaac488b010. Is this normal? It seems very large compared to what I’m used to :o)

My questions relate to the following code snippet (data_d is the device cudaArray pointer and data_h is the host pointer):


void allocateVolumeMemory(cudaArray *(&data_d), cudaExtent extent)
{
        cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();

        cutilSafeCall(cudaMalloc3DArray(&data_d, &channelDesc, extent));

        checkCUDAError("Failed allocateVolumeMemory");
}

void transferVolumeDataToDevice(cudaArray *(&data_d), const float *data_h, cudaExtent extent)
{
        cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();

        // set up the copy parameters
        cudaMemcpy3DParms copyParams = {0};
        copyParams.srcPtr = make_cudaPitchedPtr((void*)data_h, extent.width*sizeof(float), extent.height, extent.width);
        copyParams.dstArray = data_d;
        copyParams.extent = extent;
        copyParams.kind = cudaMemcpyHostToDevice;

        // copy from host memory to device memory
        cutilSafeCall( cudaMemcpy3D(&copyParams));

        checkCUDAError("Failed transferVolumeDataToDevice");
}



I’m running Linux x64.

Thank you

This is common on all platforms if the crash was severe enough, or if you have enough crashes to trigger some condition inside the CUDA API/drivers… I encounter the same issue quite often with the Driver API when working on new kernels or debugging crashes and other issues.

I’m not sure if this has officially been brought up with NVIDIA before though…

Sadly I don’t know enough about the Runtime API to actually help you with what could be going wrong though.

Nothing official says it is not allowed; however, I second the problem. It took me a whole day to figure out what was going on and finally realize that I cannot use every kernel size. The 80 x 112 x 80 kernel works fine, but 72x96x80 does not. It seems that the width should be a multiple of 16, but in your case it still crashes even with a multiple of 16. It should be a bug; I’m waiting for the fix.

I don’t really get this. Kernel size? I’m talking about a texture memory allocation (3D); how does that affect kernel sizes?

Also, if I allocate extra memory for the host array, say 128x128x128 floats (on the host), there are no crashes.

Sounds more like an out-of-bounds error and not a driver/CUDA problem. Once the kernel/copy fails, can you print the error code/message to make sure it’s not an out-of-bounds problem?

Also, if you want to debug it, try valgrind on Linux, or on Windows try to figure out why you’re accessing data too far beyond your


This certainly applies to the other post (from Linh Ha) - there is no multiple-of-16 limit on either the kernel size or the memory you allocate.


First of all, thanks to all of you for the fast replies!

There seems to be a problem with the size of the host array.

If I allocate new float[width * height * depth] and height < width it will cause a crash. Thus

128x127x128 -> crash

128x128x128 -> works

128x128x64 -> works

and so on…

Can you see any problem with my allocations?

I doubt it has anything to do with the allocations. It’s probably because, in the kernel, you go beyond the boundaries of the array when you access it.

You probably have some sort of code like this at the end of the kernel:

myFaultyArray[ iOutputPosition] = fCalculatedValue;

iOutputPosition is probably causing an out-of-bounds access. Try accessing fixed positions in the array, like: myFaultyArray[ 0 ] += fCalculatedValue; and see if this crashes. If it doesn’t (and it probably won’t), try to see why iOutputPosition is not calculated correctly, probably for the last block or some other situation…


Actually, it crashes at the:

cutilSafeCall( cudaMemcpy3D(&copyParams));

Thus, before any kernel execution.

copyParams.srcPtr = make_cudaPitchedPtr((void*)data_h, extent.width*sizeof(float), extent.height, extent.width);

That’s the problem…

extent.height and extent.width should swap positions… Sometimes you just don’t see the most basic errors :) Thank you all for trying to help me!
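
For anyone who lands here with the same crash: make_cudaPitchedPtr takes its arguments in the order (ptr, pitch, xsize, ysize), so the call would presumably become make_cudaPitchedPtr((void*)data_h, extent.width*sizeof(float), extent.width, extent.height). The sketch below is plain host-side C++ with no CUDA calls, and only illustrates why the swapped version overruns the host buffer whenever height < width. PitchedLayout and all the helper functions are hypothetical names made up for this illustration, and the sketch assumes that consecutive slices of the source are pitch * ysize bytes apart.

```cpp
#include <cstddef>

// Hypothetical stand-in for the cudaPitchedPtr fields that matter here.
struct PitchedLayout {
    std::size_t pitch;  // bytes per row
    std::size_t xsize;  // logical width in elements
    std::size_t ysize;  // logical height in rows
};

// Hypothetical helper: bytes of host memory that `depth` slices span,
// assuming each slice starts pitch * ysize bytes after the previous one.
constexpr std::size_t bytesSpanned(PitchedLayout p, std::size_t depth) {
    return p.pitch * p.ysize * depth;
}

// Bytes actually provided by new float[w * h * d].
constexpr std::size_t bytesAllocated(std::size_t w, std::size_t h, std::size_t d) {
    return w * h * d * sizeof(float);
}

// Hypothetical helper: true if the copy stays inside the host buffer.
constexpr bool fitsInHostBuffer(PitchedLayout p, std::size_t w, std::size_t h, std::size_t d) {
    return bytesSpanned(p, d) <= bytesAllocated(w, h, d);
}

// Layout as written in the original post: height passed as xsize, width as ysize.
constexpr PitchedLayout swappedLayout(std::size_t w, std::size_t h) {
    return PitchedLayout{w * sizeof(float), h, w};
}

// Layout with xsize = width and ysize = height.
constexpr PitchedLayout correctLayout(std::size_t w, std::size_t h) {
    return PitchedLayout{w * sizeof(float), w, h};
}
```

With 64x34x64, the swapped layout describes 64 rows per slice instead of 34, so fitsInHostBuffer(swappedLayout(64, 34), 64, 34, 64) is false, which matches the segmentation fault. With 64x64x64 or 128x128x64 the two layouts are identical, which would explain why those sizes never crashed.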