Yes, sir.
Well, okay. Even when I replace the PointXY constructor with a simple address assignment, the profiler’s statement stays roughly the same.
…I mean, I can see why the profiler would think that I don’t have to copy the memory here as I’m already using mapped pinned memory. But then, this line is the whole point of having a copy kernel…
I have no idea…
sizeof(PointXY) returns 8 bytes, which makes sense as each PointXY object contains two coordinates of type float.
This is what the PointXY class looks like.
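class PointXY {
public:
    // Properties
    float x;
    float y;

    // Callable from both host and device code
    __host__ __device__ PointXY(float x, float y) {
        this->x = x;
        this->y = y;
    }

    // Default constructor, also marked for device use so kernels can create PointXY objects
    __host__ __device__ PointXY() {
    }
};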
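As a small sanity check on the 8-byte figure, the layout can be verified at compile time. This is only a sketch: the macro fallback (so the file also builds without nvcc) and the helper name chunkBytes are assumptions for illustration, not part of the original code.

```cpp
#include <cassert>      // assert() for any runtime checks by the caller
#include <cstddef>
#include <type_traits>

// Assumption: allow a host-only build by defining the qualifiers away without nvcc.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

class PointXY {
public:
    float x;
    float y;
    __host__ __device__ PointXY(float x, float y) : x(x), y(y) {}
    __host__ __device__ PointXY() {}
};

// Two floats and no padding: 8 bytes where float is 4 bytes.
static_assert(sizeof(PointXY) == 2 * sizeof(float), "PointXY has unexpected padding");

// Trivially copyable, so a plain byte-wise copy of a PointXY is well defined.
static_assert(std::is_trivially_copyable<PointXY>::value, "PointXY is not trivially copyable");

// Hypothetical helper: number of bytes occupied by one chunk of points.
constexpr std::size_t chunkBytes(std::size_t numPoints) {
    return numPoints * sizeof(PointXY);
}
```

With iChunkSize = 5000000, chunkBytes gives 40,000,000 bytes per chunk, which is the amount each copy kernel launch has to move in one direction.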
I experimented with several grid configurations. I think in the profiling run shown in the screenshots I was using the following:
Grid configuration
int blockSize = 1024;
int numBlocks = 60;
const int blockSizeCpy = 512;
const int numBlocksCpy = 60;
int iChunkSize = 5000000;
…while using 3 streams. With 3 streams, it would probably have been more sensible to set blockSizeCpy = 256 so that the H2D copy, the D2H copy and the ptInPoly execution can all run at the same time, but I don’t think this should be the issue here, right?
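The block-size question can be roughed out with simple arithmetic. The sketch below assumes the goal is to keep one ptInPoly block plus one block from each of the two copy kernels resident on the same SM, and it looks at the thread limit only; registers and shared memory can restrict residency further. The helper name fitsOnOneSM and the maxThreadsPerSM parameter are hypothetical; on a real device the limit can be read from cudaDeviceProp::maxThreadsPerMultiProcessor.

```cpp
#include <cassert>  // assert() for any runtime checks by the caller

// Hypothetical helper: true if one compute block and two copy blocks
// (one for H2D, one for D2H) fit within the per-SM resident thread limit.
// Assumption: thread count is the only limiting resource.
constexpr bool fitsOnOneSM(int blockSize, int blockSizeCpy, int maxThreadsPerSM) {
    return blockSize + 2 * blockSizeCpy <= maxThreadsPerSM;
}
```

With blockSize = 1024 and blockSizeCpy = 512, the three blocks need 2048 threads, which only fits if the SM allows 2048 resident threads. With blockSizeCpy = 256, they need 1536 threads.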
As mentioned earlier, I’m exploring possibilities to employ the GPU for extensive geometrical calculations on behalf of my company. So, part of what I want to achieve is well-documented proof-of-concept code that ideally follows the most important best practices and, on the other hand, offers reasonably good performance. That’s why I tried to dig into overlapping transfers in the first place. Obviously, this doesn’t work in all possible environments, which is a bummer. This brought me to implementing zero-copy kernels in order to enforce overlapping “manually”, which seemed to be a promising approach. While digging in further is really interesting and I see there’s a whole world to explore there, you kind of lost me with the in-depth analysis of how the kernels work.
Right now, I feel like I’m opening parenthesis after parenthesis and can’t seem to find the closing ones. I don’t really know what the best way to go on from here would be.
