cudaErrorMemoryAllocation error

I have a CUDA function that computes a simple operation on an array.

__global__ void funct (int *v, int *dest){
    int idx = blockIdx.x*blockDim.x + threadIdx.x;

    dest[idx] = 3*v[idx] + 4;
}

v is an initialized input vector, and dest is the destination vector.

When I run the program on arrays of 1000 or 10000 elements there are no problems. On the other hand, with 1000000 elements an error occurs: cudaErrorMemoryAllocation.

How do I solve the problem?

For 1000000 elements the kernel launch parameters are: DIM_GRID(1954,1) DIM_BLOCK(512,1,1)

Hi,

1954 x 512 = 1000448, which is more than 1000000, so the last 448 threads index past the end of the arrays. You must do something like:

#define N 1000000
__global__ void funct (int *v, int *dest){
    int idx = blockIdx.x*blockDim.x + threadIdx.x;
    // Threads past the end of the arrays do nothing.
    if (idx < N)
        dest[idx] = 3 * v[idx] + 4;
}

Or pass N as an argument to the kernel.
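
A minimal sketch of that second option might look like the following. The parameter name n and the helpers blocks_for and launch_funct are made up for illustration; they are not from the original code.

```cuda
#include <cuda_runtime.h>

// Kernel with the element count passed in (hypothetical parameter `n`).
// Threads whose index falls past the end of the arrays do nothing.
__global__ void funct(const int *v, int *dest, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        dest[idx] = 3 * v[idx] + 4;
    }
}

// Hypothetical helper: number of blocks needed to cover n elements,
// rounding up so that a partial last block is still launched.
inline int blocks_for(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// Hypothetical launch wrapper; v_dev and dest_dev are device pointers.
inline cudaError_t launch_funct(const int *v_dev, int *dest_dev, int n) {
    const int threads = 512;  // block size used in the question
    funct<<<blocks_for(n, threads), threads>>>(v_dev, dest_dev, n);
    return cudaGetLastError();  // reports launch-configuration errors
}
```

For n = 1000000 and 512 threads per block, blocks_for returns 1954, the same grid size as in the question; the 448 surplus threads simply skip the write.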

You should also show how you allocate the memory. It should look like this:

int *v_dev, *dest_dev;
cudaMalloc((void**)&v_dev,sizeof(int)*N);
cudaMalloc((void**)&dest_dev,sizeof(int)*N);

You seem to be far below the amount of memory available on your GPU, but this is the kind of error you get when you try to allocate more than the GPU can provide.
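
To see exactly which allocation fails and why, it can help to check the return value of every cudaMalloc. The sketch below assumes a hypothetical helper named alloc_ints, which is not part of the CUDA runtime; cudaMalloc and cudaGetErrorString are the real runtime calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: allocate n ints on the device and report failures.
// On error the pointer is reset to NULL and the error code is returned.
inline cudaError_t alloc_ints(int **ptr, size_t n) {
    cudaError_t err = cudaMalloc((void **)ptr, n * sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc of %llu bytes failed: %s\n",
                (unsigned long long)(n * sizeof(int)),
                cudaGetErrorString(err));
        *ptr = NULL;
    }
    return err;
}
```

Calling alloc_ints(&v_dev, N) and then alloc_ints(&dest_dev, N) would print a message for whichever of the two allocations is refused.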

I’ve already tried that, but the error persists.
To be more precise, the error appears on the second cudaMalloc, not on the first:

cudaMalloc((void**)&dest_dev,sizeof(int)*N);

Can you tell us which card and operating system you are using, and how many screens are connected?
You can also check the available memory in nvidia-settings (if you are on Linux).
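
Independent of the operating system, the free memory can also be read from inside the program with cudaMemGetInfo. This is only a sketch; the wrapper name report_device_memory is hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: query and print free and total device memory in bytes.
inline cudaError_t report_device_memory(size_t *free_bytes, size_t *total_bytes) {
    cudaError_t err = cudaMemGetInfo(free_bytes, total_bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return err;
    }
    printf("free: %llu bytes, total: %llu bytes\n",
           (unsigned long long)*free_bytes,
           (unsigned long long)*total_bytes);
    return cudaSuccess;
}
```

Calling it right before each cudaMalloc shows whether the requested sizeof(int)*N bytes still fit into what the driver reports as free.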

I use Visual Studio 2010 on Windows 7.

The graphics card is an NVIDIA GeForce 8400M GT (compute capability 1.1). These are the attributes reported by Nsight:

ASYNC_ENGINE_COUNT 1
CAN_MAP_HOST_MEMORY 1
CAN_TEX2D_GATHER 0
CLOCK_RATE 900000
COMPUTE_CAPABILITY_MAJOR 1
COMPUTE_CAPABILITY_MINOR 1
COMPUTE_MODE 0
CONCURRENT_KERNELS 0
DISPLAY_NAME GeForce 8400M GT
ECC_ENABLED 0
GLOBAL_MEMORY_BUS_WIDTH 64
GPU_OVERLAP 1
GPU_PCI_DEVICE_ID 69603550
GPU_PCI_EXT_DEVICE_ID 1062
GPU_PCI_REVISION_ID 161
GPU_PCI_SUB_SYSTEM_ID 2416250957
INTEGRATED 0
KERNEL_EXEC_TIMEOUT 1
L2_CACHE_SIZE 0
MAX_BLOCK_DIM_X 512
MAX_BLOCK_DIM_Y 512
MAX_BLOCK_DIM_Z 64
MAX_GRID_DIM_X 65535
MAX_GRID_DIM_Y 65535
MAX_GRID_DIM_Z 1
MAX_PITCH 2147483647
MAX_REGISTERS_PER_BLOCK 8192
MAX_SHARED_MEMORY_PER_BLOCK 16384
MAX_THREADS_PER_BLOCK 512
MAX_THREADS_PER_MULTIPROCESSOR 768
MAXIMUM_SURFACE1D_LAYERED_LAYERS 0
MAXIMUM_SURFACE1D_LAYERED_WIDTH 0
MAXIMUM_SURFACE1D_WIDTH 4096
MAXIMUM_SURFACE2D_HEIGHT 65536
MAXIMUM_SURFACE2D_LAYERED_HEIGHT 0
MAXIMUM_SURFACE2D_LAYERED_LAYERS 0
MAXIMUM_SURFACE2D_LAYERED_WIDTH 0
MAXIMUM_SURFACE2D_WIDTH 4096
MAXIMUM_SURFACE3D_DEPTH 0
MAXIMUM_SURFACE3D_HEIGHT 0
MAXIMUM_SURFACE3D_WIDTH 0
MAXIMUM_SURFACECUBEMAP_LAYERED_LAYERS 0
MAXIMUM_SURFACECUBEMAP_LAYERED_WIDTH 0
MAXIMUM_SURFACECUBEMAP_WIDTH 0
MAXIMUM_TEXTURE1D_LAYERED_LAYERS 512
MAXIMUM_TEXTURE1D_LAYERED_WIDTH 8192
MAXIMUM_TEXTURE1D_LINEAR_WIDTH 134217728
MAXIMUM_TEXTURE1D_MIPMAPPED_WIDTH 8192
MAXIMUM_TEXTURE1D_WIDTH 8192
MAXIMUM_TEXTURE2D_GATHER_HEIGHT 0
MAXIMUM_TEXTURE2D_GATHER_WIDTH 0
MAXIMUM_TEXTURE2D_HEIGHT 32768
MAXIMUM_TEXTURE2D_LAYERED_HEIGHT 8192
MAXIMUM_TEXTURE2D_LAYERED_LAYERS 512
MAXIMUM_TEXTURE2D_LAYERED_WIDTH 8192
MAXIMUM_TEXTURE2D_LINEAR_HEIGHT 65000
MAXIMUM_TEXTURE2D_LINEAR_PITCH 1048544
MAXIMUM_TEXTURE2D_LINEAR_WIDTH 65000
MAXIMUM_TEXTURE2D_MIPMAPPED_HEIGHT 8192
MAXIMUM_TEXTURE2D_MIPMAPPED_WIDTH 8192
MAXIMUM_TEXTURE2D_WIDTH 65536
MAXIMUM_TEXTURE3D_DEPTH 2048
MAXIMUM_TEXTURE3D_DEPTH_ALTERNATE 0
MAXIMUM_TEXTURE3D_HEIGHT 2048
MAXIMUM_TEXTURE3D_HEIGHT_ALTERNATE 0
MAXIMUM_TEXTURE3D_WIDTH 2048
MAXIMUM_TEXTURE3D_WIDTH_ALTERNATE 0
MAXIMUM_TEXTURECUBEMAP_LAYERED_LAYERS 0
MAXIMUM_TEXTURECUBEMAP_LAYERED_WIDTH 0
MAXIMUM_TEXTURECUBEMAP_WIDTH 8192
MEMORY_CLOCK_RATE 602000
MULTIPROCESSOR_COUNT 2
PCI_BUS_ID 1
PCI_DEVICE_ID 0
PCI_DOMAIN_ID 0
RAM_LOCATION 1
RAM_TYPE 5
SURFACE_ALIGNMENT 256
TCC_DRIVER 0
TEXTURE_ALIGNMENT 256
TEXTURE_PITCH_ALIGNMENT 32
TOTAL_CONSTANT_MEMORY 65536
TOTAL_MEMORY 67108864
UNIFIED_ADDRESSING 0
WARP_SIZE 32

Solved!

Turn off Windows Aero!