Please find attached some bandwidth test results for three NVIDIA GPUs (htod_bandwidth.pdf). It shows the host-to-device bandwidth curves of three GPUs from two test PCs.
Two of the GPUs fail during my tests as soon as I allocate more than ~30 MB of device memory, and I am wondering why. My current hypothesis is that these GPUs, which also drive the display, have a large part of their global memory reserved for other purposes.
If that is the case, I also find it strange that so much space has to be reserved. The GF 8400M has 128 MB of (shared) memory in total, while the NVS 295 has 255 MB. This means the driver has reserved around 100 MB on the GF 8400M and around 230 MB on the NVS 295. It seems as if the space left for CUDA operations is statically set to around 30 MB…
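For what it is worth, the failure point is easy to locate with a simple probe. This is only a rough sketch, not my actual test harness, but something along these lines bisects for the largest single cudaMalloc that still succeeds:

#include <cstdio>
#include <cuda_runtime.h>

// Bisect for the largest single device allocation that succeeds.
int main() {
    size_t lo = 0, hi = 256 * 1024 * 1024;   // upper bound: 256 MB
    while (hi - lo > 1024 * 1024) {          // stop at 1 MB granularity
        size_t mid = lo + (hi - lo) / 2;
        void *p = NULL;
        if (cudaMalloc(&p, mid) == cudaSuccess) {
            cudaFree(p);
            lo = mid;                        // mid bytes still fit
        } else {
            cudaGetLastError();              // clear the error state
            hi = mid;                        // mid bytes no longer fit
        }
    }
    printf("largest single cudaMalloc: ~%zu MB\n", lo / (1024 * 1024));
    return 0;
}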
On Linux the OS uses around 200 MB; the exact amount depends on the resolution. You can check how much memory is free at the beginning of your program:
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaSetDevice(1);                 // select the second GPU; use 0 for the first
    size_t free_mem, total_mem;
    cudaMemGetInfo(&free_mem, &total_mem);
    printf("%zu KB free of total %zu KB at the beginning\n", free_mem / 1024, total_mem / 1024);
    return 0;
}
It is not just the display driver that takes up some of the memory on the card; the CUDA driver itself needs to allocate a fair amount of memory for various purposes, from kernel storage to pre-allocated (thread-)local memory. When I last checked the overall driver footprint several years ago, it summed to about 90 MB. The exact amount probably varies with CUDA release and compute capability for the CUDA driver, and with screen resolution and desktop configuration (e.g. 3D features) for the display driver.
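If you want to see how much is gone before the first user allocation, a rough sketch using the driver API would look like the following (untested, error checking omitted; cuDeviceTotalMem works before any context exists, and context creation is where the driver's working set gets allocated). Note that the reported difference covers the display driver's reservation as well as the CUDA context itself:

#include <cstdio>
#include <cuda.h>

int main() {
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    size_t total;
    cuDeviceTotalMem(&total, dev);   // queryable without a context

    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);       // driver allocates its working set here

    size_t free_mem, total_mem;
    cuMemGetInfo(&free_mem, &total_mem);
    // difference = display driver reservation + CUDA context overhead
    printf("reserved before first user allocation: %zu MB of %zu MB\n",
           (total - free_mem) / (1024 * 1024), total / (1024 * 1024));
    cuCtxDestroy(ctx);
    return 0;
}

Build with -lcuda to link against the driver API.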
I see. My testbeds do differ in display resolution, but other than that they use the same CUDA driver and toolkit. I realize that they somehow reserve space for something; I just find it slightly peculiar that I do not get more than 30 MB of CUDA memory on the GPU that has twice the amount of memory. But I’ll have a play around with my resolution and see if I can find a pattern…