I’m new to CUDA and have limited C++ knowledge. I have a GTX 970 and I’m using the CLion IDE on Ubuntu.
I’m working with a large 2D array of floats. After some research I decided to flatten it into a 1D array of size 5877512.
I tried riffing on NVIDIA’s “An Even Easier Introduction to CUDA” tutorial. As a proof of concept, I used cudaMallocManaged
to allocate unified memory for my array, then tried to fill it with the same arbitrary float value.
My code always segfaults when it tries to write to array index 1469440. The terminal output reads “Process finished with exit code 139 (interrupted by signal 11: SIGSEGV).”
It seems like I’m hitting some sort of memory limit, but I don’t understand how. If I allocate the array as normal C++ heap memory, I can fill it up with no problem. The error only occurs when I’m using unified memory from cudaMallocManaged.
The code is not complicated. Here it is:
#include <iostream>

int main() {
    int numRows = 734689;
    int numCols = 8;
    int arraySize = numRows * numCols; // 5877512

    // a 1D array treated as a 2D array using the numCols * row + col indexing convention
    float *sliceArray;
    cudaMallocManaged(&sliceArray, arraySize);

    for (int i = 0; i < arraySize; i++) {
        sliceArray[i] = -0.091f;
        std::cout << i << std::endl;
    }
}
I’ve run nvidia-smi, and it shows my program using 45 MiB prior to crashing. I haven’t gotten far enough to get much use out of CUDA error checking or cuda-memcheck.
This doesn’t make sense to me as a memory issue, since there should be more than enough memory available. Maybe there’s some obvious, stupid mistake I’m making here?
I’ve spent several hours searching for an answer, but nothing has fit. I understand that this is a problem in the host code, but knowing the exact line where the failure occurs hasn’t helped me. I do know the problem is consistent: it always crashes on exactly the same iteration, at which point the array simply won’t take one more value.
Help?