Hi there, I'm trying to figure out why a CUDA_SAFE_CALL returns an out-of-memory error in my application.
Important lines are as follows:
#define KERNEL_W 15
#define KERNEL_H 15
#define DATA_W 8000
#define DATA_H 8000
const int KERNEL_SIZE = KERNEL_W * KERNEL_H * sizeof(float);
const int DATA_SIZE = DATA_W * DATA_H * sizeof(float);
__device__ __constant__ float d_Kernel[KERNEL_W*KERNEL_H];
texture<float, 2, cudaReadModeElementType> texData;
float *d_Result;
cudaArray *a_Data;
cudaChannelFormatDesc floatTex = cudaCreateChannelDesc<float>();
CUDA_SAFE_CALL( cudaMalloc((void **)&d_Result, DATA_SIZE) );
CUDA_SAFE_CALL( cudaMallocArray(&a_Data, &floatTex, DATA_W, DATA_H) );
This last line is the one that fails, and I can't really see why. 7000x7000 works fine, but 8000x8000 is a no-go.
Unless I'm missing something, this amounts to 8000x8000x2 floats to allocate, ignoring the small kernel, which sits in constant memory and is well under 65k. The x2 covers the data texture array and the result array. By my count this yields 122MB of data.
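To double-check that count, here is a small host-side sketch of the arithmetic. It assumes sizeof(float) is 4 bytes and ignores any padding or alignment the driver may add to a cudaArray; the helper names (bufferBytes, totalBytes, toMiB) are made up for illustration and are not part of the CUDA API.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes needed for one width x height buffer of floats.
// 64-bit arithmetic is used so large sizes cannot overflow an int.
// Assumption: sizeof(float) == 4, which holds on common platforms.
constexpr std::uint64_t bufferBytes(std::uint64_t width, std::uint64_t height) {
    return width * height * sizeof(float);
}

// Hypothetical helper: the result buffer plus a texture array of the same size.
// Driver-side padding of the cudaArray is not modelled here.
constexpr std::uint64_t totalBytes(std::uint64_t width, std::uint64_t height) {
    return 2 * bufferBytes(width, height);
}

// Convert bytes to whole MiB (1 MiB = 1024 * 1024 bytes), rounding down.
constexpr std::uint64_t toMiB(std::uint64_t bytes) {
    return bytes / (1024ull * 1024ull);
}
```

With these helpers, toMiB(totalBytes(8000, 8000)) comes out at 488, and toMiB(totalBytes(7000, 7000)) at 373. Counting 4 bytes per float therefore gives a much larger total than the 122MB figure above, which matches 8000x8000x2 bytes rather than floats. If that is right, the 8000x8000 case would need nearly all of a 512MB card, before whatever the display and driver already hold.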
I have an 8800 GT with 512MB of VRAM.
Can anyone enlighten me on this?
Thanks!