CUDA fails to allocate large chunk of memory

I’d like to allocate a big block of memory (roughly 33MB) on the device. I’ve been using malloc because this code runs only once and therefore doesn’t really need to be performant, but I’ve tested this with cudaMalloc as well.

canvas->antialiasing_samples = DO_ANTIALIASING ? ANTIALIASING_SAMPLES : 1;
int size_multiplier = canvas->antialiasing_samples * canvas->width * canvas->height;
printf("%llu\n", (unsigned long long)(sizeof(Vector<int> *) * size_multiplier)); // 33554432
printf("%p\n", malloc(sizeof(Vector<int> *) * size_multiplier)); // null pointer
canvas->antialiasing_colors_array = (Vector<int> **)malloc(
    sizeof(Vector<int> *) * canvas->antialiasing_samples * canvas->width * canvas->height); // NULL

Setting size_multiplier to a smaller value, so that the allocation is in the kilobyte range, works just fine. Is there some upper limit on malloc calls, or is there something else prohibiting me from allocating this much memory? I’m on an RTX 2070 with 8GB of memory, so this seems unusual to me.

Yes, in-kernel malloc (or in-kernel new, or in-kernel cudaMalloc) is limited to the size of the device heap, not the total device memory. The heap defaults to 8 MB, which is why a 32 MiB request returns NULL, and it is adjustable from the host with cudaDeviceSetLimit and cudaLimitMallocHeapSize. This is a commonly asked question, so you can find many write-ups on it, but the documentation covers everything you need to know. I recommend reading all of that section (B.33), including all of the sub-sections (B.33.1, B.33.2, etc.)
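For reference, here is a minimal sketch of raising the device heap limit before launching a kernel that calls malloc. The names try_alloc and device_malloc_succeeds are hypothetical, and the 64 MiB heap size is an assumption chosen to leave headroom above the 32 MiB request, since the heap carries some bookkeeping overhead. The limit has to be set before the first kernel that uses malloc is launched.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Tries to allocate `bytes` from the device heap and records whether it worked.
__global__ void try_alloc(size_t bytes, int *ok)
{
    void *p = malloc(bytes);   // in-kernel malloc draws from the device heap
    *ok = (p != nullptr);
    free(p);                   // free(NULL) is a no-op
}

// Hypothetical helper: makes sure the heap limit is at least `heap_bytes`,
// then checks whether a single in-kernel malloc of `request_bytes` succeeds.
bool device_malloc_succeeds(size_t heap_bytes, size_t request_bytes)
{
    size_t current = 0;
    if (cudaDeviceGetLimit(&current, cudaLimitMallocHeapSize) != cudaSuccess)
        return false;

    // Only raise the limit if needed; this must happen before any kernel
    // that uses malloc has been launched.
    if (current < heap_bytes &&
        cudaDeviceSetLimit(cudaLimitMallocHeapSize, heap_bytes) != cudaSuccess)
        return false;

    int *d_ok = nullptr;
    if (cudaMalloc(&d_ok, sizeof(int)) != cudaSuccess)
        return false;

    try_alloc<<<1, 1>>>(request_bytes, d_ok);

    int h_ok = 0;
    // cudaMemcpy waits for the kernel and reports any launch or runtime error.
    cudaError_t err = cudaMemcpy(&h_ok, d_ok, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_ok);
    return err == cudaSuccess && h_ok != 0;
}

int main()
{
    const size_t request = 32u << 20;  // 33554432 bytes, as in the question
    const size_t heap    = 64u << 20;  // assumed headroom, not a required value

    bool ok = device_malloc_succeeds(heap, request);
    printf("in-kernel malloc of %llu bytes %s\n",
           (unsigned long long)request, ok ? "succeeded" : "failed");
    return ok ? 0 : 1;
}
```

Since the allocation in the question happens only once, another option is to call cudaMalloc from the host and pass the pointer into the kernel, which avoids the device heap entirely.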
