cudaMalloc difference between Tesla and GeForce devices? cudaMalloc on the complete global memory

Hi,

Are there differences between the GeForce 310M and Tesla devices (Tesla C1060 or Tesla C2050) when I use cudaMalloc?

I want to allocate the whole global memory of the device.

Example for GeForce 310M:

My code to allocate the memory:

cudaMalloc(mem,1054437333*sizeof(char));

(Btw. 1054437333 != 1073479680, but with one byte more the allocation fails.)

Then I query the memory status.

Output:

GPU memory usage: used = 1023.750000, free = 0.000000 MB, total = 1023.750000 MB <<< OK :) that is what I want.

But with a Tesla device (Tesla C1060 or Tesla C2050) I can't allocate the memory.

Example for Tesla C1060:

My code to allocate the memory:

cudaMalloc(mem,3294770688*sizeof(char));

Then I query the memory status.

Output:

GPU memory usage: used = 40.746338, free = 4055.066162 MB, total = 4095.812500 MB <<< FAIL :( used should be 3.xxx GB

Greets

Sven

cudaMalloc(mem,3294770688*sizeof(char));

A “naked” constant like that can be treated as a signed int, which has a maximum value of 2147483647. So you are probably the victim of integer overflow. Try explicitly casting the constant to a size_t, or specifying the constant as an unsigned long, so either:

cudaMalloc(mem,(size_t)3294770688*sizeof(char));

or

cudaMalloc(mem,3294770688ul*sizeof(char));

and see what happens.

Neither solution worked. :( The output is always: used = 40.746338, free = 4055.066162 MB, total = 4095.812500 MB…

What operating system is this on? And what status are the cudaMalloc calls returning?

Edit: sorry, it is not cudaSuccess! I get a segmentation fault.

cudaError_t cuda_status2 = cudaMalloc(mem,3004437333ul*sizeof(char)); <<< segfault follows

cudaMalloc(mem,3004437333ul*sizeof(char)); <<< no segfault

Edit 2: declaring a second cudaError_t variable (cuda_status2) caused the segfault.

The status is cudaSuccess!

printf(cudaGetErrorString(cuda_status)) → no error

System is Ubuntu Server.
Linux cuda 2.6.35-22-server #33-Ubuntu SMP Sun Sep 19 20:48:58 UTC 2010 x86_64 GNU/Linux

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2010 NVIDIA Corporation
Built on Wed_Nov__3_16:16:57_PDT_2010
Cuda compilation tools, release 3.2, V0.2.1221

CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major/Minor version number: 1.3

With the following code I get the output shown below it:

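int a=0;
while(a<140){
        double inputData[999999];
        double *dev_inputData;   /* device pointer, typed to match the data */
        /* each iteration allocates a new buffer that is never freed,
           so the reported usage grows with every pass */
        cudaMalloc((void**)&dev_inputData,999999*sizeof(double));
        cudaMemcpy(dev_inputData,inputData,999999*sizeof(double),cudaMemcpyHostToDevice);
        a++;
}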

GPU memory usage: used = 1116.938232, free = 2978.874268 MB, total = 4095.812500 MB

But this code gives the output shown below it:

void** mem;

void* dev_mem;

cudaMalloc(mem,(999999*140)*sizeof(double));

cudaMemcpy(dev_mem,mem,(999999*140)*sizeof(double),cudaMemcpyHostToDevice);

GPU memory usage: used = 40.746338, free = 4055.066162 MB, total = 4095.812500 MB

mhm ok.

Solution for Tesla

void** mem;

cudaMalloc(&mem,1054437333*sizeof(char));

This worked on the Tesla device, but not on my GeForce 310M.

Solution for the GeForce 310M, without the &:

void** mem;

cudaMalloc(mem,1054437333*sizeof(char));