cudaMalloc3DArray out of memory: cannot allocate the available amount of memory


At the moment I have some trouble allocating memory for a 3D texture.

System: Win 7 x64

System Memory: 12 GB

CPU: Core i7 920 @ 2.67 GHz

Chipset: x58

Cuda Device: Quadro FX 5800 and Tesla C1060 each with 4 GB of Memory

Toolkit Version: 3.2

SDK Version: 3.2

C++: MS Visual Studio 2008 SP1

Both of the cards I have tested have a compute capability of 1.3. I also made sure

that the nvcc flag is set accordingly to compile for 1.3.

The Programming Guide states that the maximum allowed size of a texture

bound to a 3D array is 2048 x 2048 x 2048.

However, when trying to allocate more than 1216 x 1216 x 1216, my program terminates with

an out-of-memory exception, although I have 4 GB of video memory on each of the devices.

This is the memory information from within the program:

Total mem: 4261085184 free mem: 4122873856

What I basically did to test the maximum size I can allocate is:

typedef unsigned char uchar;

texture<uchar, 3, cudaReadModeNormalizedFloat> tex;    // 3D texture for the volume
cudaArray *d_volumeArray = 0;                          // memory for the volume data
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<uchar>();

size_t total, free, temp, used;

cutilSafeCall( cudaMemGetInfo(&free, &total) );
printf("Total mem: %Iu \t free mem: %Iu\n", total, free);   // %Iu is MSVC's format specifier for size_t

for (int i = 1; i < 1024; ++i)
{
	cudaExtent volSize = make_cudaExtent(i * 2, i * 2, i * 2);                 // 2, 4, 6, 8, ...

	cutilSafeCall( cudaMemGetInfo(&temp, &total) );                            // memory information before the malloc
	printf("Total mem: %Iu \t free mem before malloc: %Iu\n", total, temp);

	cutilSafeCall( cudaMalloc3DArray(&d_volumeArray, &channelDesc, volSize) ); // allocate the array on the device

	cutilSafeCall( cudaMemGetInfo(&free, &total) );                            // memory information after the malloc
	used = temp - free;                                                        // bytes consumed by this allocation
	printf("used %Iu bytes of mem\n", used);

	cutilSafeCall( cudaFreeArray(d_volumeArray) );                             // free the array again

	cutilSafeCall( cudaMemGetInfo(&free, &total) );                            // update memory info
	printf("Total mem: %Iu \t free mem after free: %Iu\n", total, free);
}


Using this little test, I got the following values for the memory allocated:

from (994, 994, 994) to (1024, 1024, 1024), cudaMalloc3DArray uses 1073741824 bytes, which is 1024^3

from (1026, 1026, 1026) to (1056, 1056, 1056), cudaMalloc3DArray uses 1213267968 bytes, which is the strange value 1066.560700...^3 (= 1056 x 1056 x 1088)

from (1058, 1058, 1058) to (1088, 1088, 1088), cudaMalloc3DArray uses 1287913472 bytes, which is 1088^3

from (1090, 1090, 1090) to (1120, 1120, 1120), cudaMalloc3DArray uses 1445068800 bytes, which is the strange value 1130.566661...^3 (= 1120 x 1120 x 1152)

from (1122, 1122, 1122) to (1152, 1152, 1152), cudaMalloc3DArray uses 1528823808 bytes, which is 1152^3

from (1154, 1154, 1154) to (1184, 1184, 1184), cudaMalloc3DArray uses 1704656896 bytes, which is the strange value 1194.571987...^3 (= 1184 x 1184 x 1216)

from (1186, 1186, 1186) to (1216, 1216, 1216), cudaMalloc3DArray uses 1798045696 bytes, which is 1216^3
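For what it's worth, the "strange" byte counts above are all consistent with a simple padding model: width and height rounded up to the next multiple of 32 elements, and depth rounded up to the next multiple of 64. That model is just my guess from the numbers, not anything documented, but it reproduces every measured value. The helper names below are mine:

```cpp
#include <cstddef>

// Round x up to the next multiple of m.
inline size_t roundUp(size_t x, size_t m) { return ((x + m - 1) / m) * m; }

// Guessed model of the observed cudaMalloc3DArray padding for uchar elements:
// width and height padded to a multiple of 32, depth to a multiple of 64.
inline size_t paddedArrayBytes(size_t w, size_t h, size_t d) {
    return roundUp(w, 32) * roundUp(h, 32) * roundUp(d, 64);
}
```

For example, paddedArrayBytes(1026, 1026, 1026) gives 1056 * 1056 * 1088 = 1213267968, exactly the measured value for that range.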

When trying to allocate memory with an extent of (1218, 1218, 1218), cudaMalloc3DArray() returns the out-of-memory error,

but 1218^3 is just 1806932232 bytes. Even if I assume that the over-allocation listed above continues, the

next strange number should be somewhere around 1258.577..., and rounding it up to 1260 would still only require 2000376000 bytes of memory.

Remember that the program tells me I have free mem: 4122873856 bytes…

Any hints as to whether this is something I just misunderstood, or whether it's a bug that will be fixed in a future release of the toolkit or

the driver itself, would be very helpful.



This is a known limitation of the Windows Vista and 7 WDDM subsystem. Windows runs its own GPU memory manager, and that manager limits the maximum memory allocation that can be made in a single call to much less than the capacity of the card. NVIDIA offers a compute-only driver for your C1060 that bypasses all the WDDM limits. I don't use Windows, so I can't offer more details, but the TCC driver is probably what you want to fix this.

Hi, and thanks for this fast reply!

Well, that's quite a situation I'm facing here, if this is the case.

I had already heard of some limitations due to the WDDM of Win 7, but I was thinking more of things like Windows services

that do not have access to the GPU, etc. I wouldn't have thought that the limitations were that big.

I just dipped into the release notes of the TCC driver you mentioned and am somewhat frustrated now. It says that:

Well, the problem is, I have a 3-GPU system here with one Quadro FX 5800 and two Tesla C1060s.

So I have a system with at least one GPU that requires hardware acceleration for displays, which collides with point 3 of

the quoted list. Furthermore, I need OpenGL interoperability in my application, which collides with point 2…

Is there any other way to bypass the limitations of the WDDM subsystem (maybe I should ask this in a Windows forum, I guess), or

do I have to switch to Linux in order to get access to all GPUs without limitations?

Thanks again



  • I googled around a bit for limitations of WDDM and CUDA and found that one can change the timeout value for the

    TIMEOUT DETECTION and RECOVERY mode of WDDM in the Windows registry. I wonder if there is a similar trick for the amount

    of memory that can be allocated at once (I highly doubt it, but, well…).

  • Secondly, I now know that it works to allocate more memory in separate malloc calls. Is there any performant way to

    allocate memory in smaller chunks and tie it all together into one big memory block, so that it would then perform

    as if it had been allocated in one piece in the first place (e.g. bind one texture reference to one memory handle),

    and then finally free the whole memory again?
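To make the second point concrete, here is the kind of chunking I have in mind: split the volume into slabs along z, allocate each slab with its own cudaMalloc3DArray call (each small enough for WDDM), bind each slab to its own texture reference, and have the kernel pick the slab from the global z coordinate. The bookkeeping itself is plain host code; the names below (numSlabs, slabForZ) are made up for illustration:

```cpp
#include <cstddef>

// Which slab a global z coordinate falls into, and the z inside that slab.
struct SlabIndex {
    int slab;    // index of the slab along the z axis
    int localZ;  // z coordinate within that slab
};

// Number of slabs needed to cover `depth` slices with at most
// `maxSlabDepth` slices per slab (ceiling division).
inline int numSlabs(int depth, int maxSlabDepth) {
    return (depth + maxSlabDepth - 1) / maxSlabDepth;
}

// Map a global z coordinate to its slab and local z.
inline SlabIndex slabForZ(int z, int maxSlabDepth) {
    SlabIndex s;
    s.slab = z / maxSlabDepth;
    s.localZ = z % maxSlabDepth;
    return s;
}
```

For a 1218-deep volume and slabs of 512 slices, this gives 3 slabs, with z = 1217 landing in slab 2 at local z = 193. The obvious drawback I see: hardware trilinear filtering would not interpolate across slab boundaries, so the slabs would probably have to overlap by one slice, and on compute capability 1.3 each slab needs its own statically declared texture reference.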



I read through the driver PDF, and it sounds to me like the following:

- Installing the TCC driver will override any other NVIDIA driver for every NVIDIA-based GPU.

- Devices that do not support TCC will show up as generic VGA devices that you can use for your displays (though 3D graphics support is disabled).

- Those same non-TCC devices can still support CUDA (assuming the GPU is CUDA-enabled).
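If it helps to verify what you actually end up with after installing: if I read the 3.2 headers correctly, cudaDeviceProp gained a tccDriver field in CUDA 3.2, so you can query per device whether it is running under the TCC driver. A minimal sketch (untested on my side):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // prop.tccDriver is 1 if the device runs the TCC (compute-only) driver
        printf("Device %d: %s, TCC driver: %s\n",
               dev, prop.name, prop.tccDriver ? "yes" : "no");
    }
    return 0;
}
```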