I did some more digging under Ubuntu.
ChatGPT had also suggested a few other possible causes for my issue, one of them being a limit on page-locked (pinned) memory. I checked this and learned that my Ubuntu installation currently allows about 3.88 GiB of memory to be page-locked.
Since that is well above the roughly 2.3 GiB my code allocates (see the calculation below), this does not seem to be the problem.
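For reference, on Linux the per-process page-locking limit that `ulimit -l` reports is `RLIMIT_MEMLOCK`, and it can also be read from C++. A minimal sketch, assuming this is the limit meant above; `memlockLimitGiB` is a hypothetical helper name, and whether CUDA's own pinned allocations are governed by this rlimit is not established here:

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_MEMLOCK, RLIM_INFINITY (Linux)
#include <limits>

// Hypothetical helper: soft limit on page-locked memory for this process,
// in GiB. Returns +infinity if the limit is "unlimited" and -1.0 if the
// query fails.
double memlockLimitGiB() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        return -1.0;  // query failed
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        return std::numeric_limits<double>::infinity();
    }
    // rlim_cur is in bytes; convert to GiB (1 GiB = 1024^3 bytes).
    return static_cast<double>(rl.rlim_cur) / (1024.0 * 1024.0 * 1024.0);
}
```

The value should correspond to what `ulimit -l` prints, which is given in KiB rather than GiB.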
However, after some more testing, I noticed that my code does work on my Ubuntu installation for test data of up to about 268,000,000 `PointXY` objects. At 269,000,000 objects, the code fails.
I did not notice this earlier because my standard test case uses 300,000,000 `PointXY` objects, which does work under Windows.
So the issue might somehow be related to my GPU's memory limits, although I don't really get the math.
Memory used in different scenarios
According to C++'s `sizeof` operator, one `PointXY` object takes up 8 bytes. This makes sense, since it basically consists of two `float` values.
So my code allocates `n * 8` bytes for the `PointXY` objects plus `n * 1` bytes for the results, assuming that CUDA stores each element of a `bool` array in 1 byte.
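A minimal host-side sketch of that size calculation, assuming `float2` is two packed 4-byte floats and `bool` is 1 byte (true on common x86-64 compilers, but implementation details rather than language guarantees). `Float2Host`, `PointXYHost` and `requiredBytes` are hypothetical stand-ins so the snippet compiles without the CUDA headers:

```cpp
#include <cstddef>

// Hypothetical host-only stand-ins for CUDA's float2 and the PointXY class,
// used only to reproduce the sizes discussed above.
struct Float2Host { float x, y; };
struct PointXYHost { Float2Host coordinates; };

static_assert(sizeof(PointXYHost) == 8, "two floats should take 8 bytes");
static_assert(sizeof(bool) == 1, "assumes a 1-byte bool, as on common compilers");

// Hypothetical helper: bytes needed for n points plus n bool results.
// Computed in std::size_t, which is 64-bit on 64-bit platforms, so the
// product does not wrap around for n in the hundreds of millions.
std::size_t requiredBytes(std::size_t n) {
    return n * sizeof(PointXYHost) + n * sizeof(bool);
}
```

With these assumptions, `requiredBytes(268000000)` is 2,412,000,000 bytes and `requiredBytes(269000000)` is 2,421,000,000 bytes.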
- n = 268,000,000: n * 8 + n * 1 = 2,412,000,000 bytes ≈ 2300.26 MiB
- n = 269,000,000: n * 8 + n * 1 = 2,421,000,000 bytes ≈ 2308.85 MiB
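The MiB figures can be reproduced with the sketch below, which also checks one more threshold: the `PointXY` buffer alone (n * 8 bytes) crosses 2^31 bytes (2 GiB), the first value a signed 32-bit integer can no longer hold, exactly between 268,000,000 and 269,000,000 points. Whether that is related to the failure is only a guess; nothing in the post shows a 32-bit size anywhere, and since 300,000,000 points work under Windows, a plain overflow in shared code would not by itself explain the difference. `toMiB` and `pointsBufferExceedsInt32` are hypothetical names:

```cpp
#include <cstdint>

// Hypothetical helper: convert a byte count to MiB (1 MiB = 1024 * 1024 bytes).
double toMiB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// Hypothetical helper: does the points buffer alone (n * 8 bytes) exceed
// INT32_MAX (2^31 - 1), i.e. no longer fit in a signed 32-bit integer?
bool pointsBufferExceedsInt32(std::uint64_t n) {
    const std::uint64_t pointsBytes = n * 8;  // 8 bytes per PointXY
    return pointsBytes > static_cast<std::uint64_t>(INT32_MAX);
}
```

For n = 268,000,000 the points buffer is 2,144,000,000 bytes (below 2,147,483,647); for n = 269,000,000 it is 2,152,000,000 bytes (above it).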
Neither scenario comes close to using all of the memory available on my GPU.
Also, the screenshot of a console session of my code posted above shows that under Ubuntu, CUDA has 3781 MiB of GPU memory at its disposal, while on the same machine Windows reports 4095 MiB available.
If you still need a working code example, let me know.
[EDIT:]
By the way, I did correct that. My `PointXY` class now looks as follows:
```cpp
#include <cuda_runtime.h>  // float2 and the __host__/__device__ qualifiers

class PointXY {
// Properties
public:
    float2 coordinates;

    // Constructors
    // Defaulted, so the type stays trivially default-constructible and nvcc
    // infers the execution space (previously this constructor was host-only).
    PointXY() = default;

    __host__ __device__ PointXY(float x, float y) {
        this->coordinates.x = x;
        this->coordinates.y = y;
    }

    __host__ __device__ PointXY(float2 coordinates) {
        this->coordinates = coordinates;
    }

    // Getters (const, since they do not modify the point)
    __host__ __device__ float getX() const {
        return this->coordinates.x;
    }

    __host__ __device__ float getY() const {
        return this->coordinates.y;
    }

    __host__ __device__ float2 getCoordinates() const {
        return this->coordinates;
    }
};
```