Hello! I have a problem with a matrix defined as std::vector<std::vector<float>>.
The matrix is laid out linearly in memory, but there is padding between one row and the next (a number of memory positions holding arbitrary values, normally 0.0).
In the kernel, the code that computes the element index is:
x = threadIdx.x + blockIdx.x * ROWS_BY_BLOCK;
y = threadIdx.y + blockIdx.y * COLS_BY_BLOCK;
if (x == 0) col = y;
else col = (x * NUM_COLS) + padding[x-1] + y;
The first row has no padding before it, but the following rows do.
padding[x-1] is the accumulated padding up to and including the previous row.
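
To make the formula concrete, here is a minimal host-side sketch of the same index computation. The function name element_index and the use of std::size_t are assumptions for illustration only; padding is assumed to hold accumulated padding counted in elements, as described above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper mirroring the kernel's index formula on the host.
// Assumption: padding[r] holds the accumulated padding (in elements)
// that precedes row r + 1, so row 0 has none.
std::size_t element_index(std::size_t x, std::size_t y,
                          std::size_t num_cols,
                          const std::vector<std::size_t>& padding) {
    if (x == 0) {
        return y;  // first row: no padding before it
    }
    // Skip x full rows, then the padding accumulated so far, then y columns.
    return x * num_cols + padding[x - 1] + y;
}
```

For example, with 10 columns and accumulated padding {2, 6}, element (1, 0) lands at index 12 and element (2, 3) at index 29.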
This works fine. The problems are:
1.- In some cases, for example a matrix of 2 rows and 3000 columns, the padding before the second row is very large. When I copy the matrix from host to GPU, I have to copy the padding memory as well, so the amount of memory grows enormously.
2.- In some cases, computing the padding array is very expensive, because I have to walk through all the memory positions to detect each change of row and count the padding (the padding is not constant between rows), and I have to do this for every row.
The first solution is to use two loops to convert the std::vector<std::vector<float>> into a structure like float[…][…] or float ** (these structures have no padding), but I don't want to do that because of the time it takes.
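
For reference, a sketch of that conversion, assuming float elements and a rectangular matrix. The name flatten is hypothetical. It needs only one loop over rows, because each inner std::vector stores its own elements contiguously and can be copied in one call.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: packs a rectangular vector-of-vectors into one
// contiguous row-major buffer with no gaps between rows.
std::vector<float> flatten(const std::vector<std::vector<float>>& m) {
    const std::size_t rows = m.size();
    const std::size_t cols = rows ? m[0].size() : 0;
    std::vector<float> flat(rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        // Assumption: the matrix is rectangular; reject ragged input.
        if (m[r].size() != cols) {
            throw std::invalid_argument("rows have different lengths");
        }
        // One bulk copy per row into its slot in the flat buffer.
        std::copy(m[r].begin(), m[r].end(), flat.begin() + r * cols);
    }
    return flat;
}
```

With this layout, element (x, y) sits at flat[x * cols + y], so no padding array is needed, and flat.data() points to rows * cols contiguous floats that could be passed to a single host-to-device copy.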
I've tried the capacity() method of the STL, but it doesn't report this padding.
I use memcpy to copy memory from host to device and cudaMalloc to reserve linear memory on the GPU.
What is happening?
How can I solve it?
Best regards, Francisco - Spain.