Hello:
Maybe a basic question about OpenCL syntax, but here it comes anyway.
My problem suggests using 2-dimensional matrices like, say, float myMat[NX][NT], where each thread reads the entire input matrix and computes one output row. I thought I should store them in global memory. However, I cannot figure out how to write matrices to, or read them from, global memory: clCreateBuffer and clEnqueueReadBuffer take a void *ptr. Passing myVec as that argument works, where myVec would be a vector like float myVec[NT], but what about matrices? Is it possible at all, and if so, what would be the right syntax?
Or am I thinking the wrong way, and I really have to change the algorithm to write one vector (e.g., one row) at a time to / from global memory?
Thanks.
If your matrix is really declared this way:
float myMat[NX][NT];
in your host C code, where NX and NT are compile-time constants, then its elements are contiguous in memory (in row-major order), so you can just allocate a buffer of NX * NT * sizeof(float) bytes in device (global) memory and copy all of them to the device in a single pass (say, through clEnqueueWriteBuffer()). Then you pass this buffer to your kernel as a pointer argument, and in there you reference matrix elements by:
gp[x * NT + t]
where gp is the pointer, x is the current row index, NT is the matrix width (number of columns), and t is the current column index.
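To make the layout claim concrete, here is a minimal host-side sketch in plain C (no OpenCL calls) that checks that gp[x * NT + t] and myMat[x][t] name the same element. The values of NX and NT and the helper names flat_at, fill_matrix and layouts_agree are made up for illustration; they are not part of any API.

```c
#include <stddef.h>

#define NX 3 /* number of rows (illustrative value) */
#define NT 4 /* number of columns (illustrative value) */

/* Read element (x, t) through a flat pointer, the same way a kernel
   would index a __global float * argument. */
float flat_at(const float *gp, size_t x, size_t t)
{
    return gp[x * NT + t];
}

/* Fill the matrix so that every element encodes its own row and column. */
void fill_matrix(float m[NX][NT])
{
    for (size_t x = 0; x < NX; ++x)
        for (size_t t = 0; t < NT; ++t)
            m[x][t] = (float)(10 * x + t);
}

/* Return 1 if flat indexing agrees with 2-D indexing for every element
   and the array occupies exactly NX * NT floats, 0 otherwise. */
int layouts_agree(void)
{
    float myMat[NX][NT];
    fill_matrix(myMat);

    /* &myMat[0][0] and sizeof(myMat) are the host pointer and byte count
       you would hand to a single clEnqueueWriteBuffer() call. */
    const float *gp = &myMat[0][0];

    for (size_t x = 0; x < NX; ++x)
        for (size_t t = 0; t < NT; ++t)
            if (flat_at(gp, x, t) != myMat[x][t])
                return 0;

    return sizeof(myMat) == NX * NT * sizeof(float);
}
```

Calling layouts_agree() from your own main() should return 1. The point is only that a true 2-D array needs no conversion before the transfer, since the host pointer is already a flat block of floats.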
On the other hand, if you have an array-of-pointers-to-rows representation in host memory, then you'll have to copy the matrix elements to device memory row by row. In device memory, you should still be able to choose between a contiguous-array and an array-of-pointers-to-rows representation, although the latter is going to be a little more complicated to set up (and in the end you should try copying the matrix elements to local memory and using them in calculations from there anyway).
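For the array-of-pointers case, one possible approach (a sketch, not the only way) is to keep the contiguous layout on the device and place row x at byte offset x * NT * sizeof(float). The plain C below shows that arithmetic by packing the rows into a host staging block. The helper names row_offset_bytes and pack_rows are hypothetical. The same offset and row size would be the offset and size arguments of a per-row clEnqueueWriteBuffer() call if you skip the staging copy.

```c
#include <stdlib.h>
#include <string.h>

/* Byte offset of row x inside a contiguous row-major buffer of floats
   with nt columns per row. */
size_t row_offset_bytes(size_t x, size_t nt)
{
    return x * nt * sizeof(float);
}

/* Copy nx separately allocated rows of nt floats each into one
   contiguous row-major block. Returns NULL if allocation fails;
   the caller owns the result and must free() it. */
float *pack_rows(float *const *rows, size_t nx, size_t nt)
{
    float *flat = malloc(nx * nt * sizeof *flat);
    if (flat == NULL)
        return NULL;

    for (size_t x = 0; x < nx; ++x) {
        /* Each row lands at the offset a kernel expects when it
           indexes the buffer as gp[x * nt + t]. */
        memcpy((char *)flat + row_offset_bytes(x, nt),
               rows[x],
               nt * sizeof(float));
    }
    return flat;
}
```

After packing, the block can go to the device in one transfer of nx * nt * sizeof(float) bytes, and the kernel indexes it exactly as in the contiguous case.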