Noob - Seg fault from just populating a large array in unified memory?

I’m new to CUDA (+ limited C++ knowledge). I have a GTX 970. I’m using the CLion IDE on Ubuntu.

I’m working with a large 2D array of floats. After some research I decided to flatten it to a 1D array of 5877512 elements.

I tried riffing on the “An Even Easier Introduction to CUDA” tutorial. As a proof of concept, I used cudaMallocManaged to allocate unified memory for my array and tried to fill every element with the same arbitrary float value.

My code always seg-faults when it tries to write to array index 1469440. The terminal output reads “Process finished with exit code 139 (interrupted by signal 11: SIGSEGV).”

It seems like I’m hitting some sort of memory limit, but I don’t understand how. If I allocate the array as ordinary C++ heap memory, I can fill it up no problem. The error only occurs when I use unified memory via cudaMallocManaged.

The code is not complicated. Here it is:

#include <iostream>

int main() {

    int numRows = 734689;
    int numCols = 8;
    int arraySize = numRows * numCols; // 5877512

    // a 1d array treated as a 2d array using width * csvRow + col indexing convention
    float *sliceArray; 

    cudaMallocManaged(&sliceArray, arraySize);

    for(int i = 0; i < arraySize; i++) {
        sliceArray[i] = -0.091;
        std::cout << i << std::endl;
    }
}

I’ve run nvidia-smi, and it shows my program using 45 MiB prior to crashing. I haven’t gotten far enough to get much use from CUDA error checking or cuda-memcheck.

This doesn’t make sense to me as a memory issue, since it seems like there should be more than enough memory available. Maybe there’s some obvious stupid mistake I’m making here?

I’ve spent several hours searching for an answer, but nothing has fit. I understand that this is a problem in the host code, but knowing the exact line where the failure occurs hasn’t helped me. I do know the problem is consistent: it always crashes on the exact same iteration, at which point the array apparently just won’t take one more value.

Help?

arraySize is the number of elements. However, cudaMallocManaged expects the number of bytes to allocate, which is arraySize * sizeof(float). As written, the call only reserves 5877512 bytes, which holds about 1469378 floats; the mapping is presumably rounded up to the next 4 KiB page boundary (5877760 bytes, i.e. exactly 1469440 floats), which is why the segfault consistently hits at index 1469440.
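
For reference, here’s a minimal corrected sketch of the same program, with basic error checking added for illustration (the error-handling style shown is just one common pattern, not the only way to do it):

#include <iostream>
#include <cuda_runtime.h>

int main() {
    int numRows = 734689;
    int numCols = 8;
    size_t arraySize = (size_t)numRows * numCols; // number of ELEMENTS, not bytes

    // a 1d array treated as a 2d array using width * csvRow + col indexing convention
    float *sliceArray;

    // pass the size in BYTES: arraySize * sizeof(float)
    cudaError_t err = cudaMallocManaged(&sliceArray, arraySize * sizeof(float));
    if (err != cudaSuccess) {
        std::cerr << "cudaMallocManaged failed: " << cudaGetErrorString(err) << std::endl;
        return 1;
    }

    for (size_t i = 0; i < arraySize; i++) {
        sliceArray[i] = -0.091f;
    }

    cudaFree(sliceArray);
    return 0;
}

With the byte count passed to cudaMallocManaged, the loop should be able to write all 5877512 elements without faulting.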

Yup, that falls squarely into the category of an obvious stupid mistake. I even calculated the number of bytes to understand how much memory I’d be using, but I never noticed I wasn’t allocating the right amount!

Thank you!