I’m trying to do 4D convolution with 14 different filters at the same time. Each 4D filter is 9 x 9 x 9 x 9 elements, so as floats the 14 filters need about 367 KB (14 x 9^4 x 4 bytes) of storage. That is far more than the 64 KB of constant memory, so I can’t keep the whole filters there at once. My plan is to instead store one 2D slice (9 x 9) of each 4D filter in constant memory and update the slices as I do the convolution.
I define a struct as
struct float14
{
float a, b, c, d, e, f, g, h, i, j, k, l, m, n;
};
__device__ __constant__ float14 c_Filters[9][9];
such that I can do something like c_Filters[y][x].a, c_Filters[y][x].b etc.
However, I haven’t figured out how to copy a slice of the filters to constant memory. I guess it should be something like this (each filter is stored as [x + y * FILTER_W + z * FILTER_W * FILTER_H + t * FILTER_W * FILTER_H * FILTER_D]):
cudaMemcpyToSymbol(c_Filters, &h_Filter_1[z * FILTER_W * FILTER_H + t * FILTER_W * FILTER_H * FILTER_D], 9 * 9 * sizeof(float), 0, cudaMemcpyHostToDevice);
Can someone help me?
How is the data for an array of structs with 14 elements laid out in memory? Is it element + x * NUMBER_OF_ELEMENTS + y * NUMBER_OF_ELEMENTS * FILTER_W?
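To make the layout question concrete, here is a minimal host-side sketch, not a definitive implementation. It assumes that float14 is 14 contiguous floats with no padding (checked by the static_assert) and that each of the 14 filters lives in its own host array with the indexing given above. The names flat_index, pack_slice and staging are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int FILTER_W = 9, FILTER_H = 9, FILTER_D = 9;
constexpr int NUMBER_OF_ELEMENTS = 14;  // one struct member per filter

struct float14 { float a, b, c, d, e, f, g, h, i, j, k, l, m, n; };

// Assumption checked at compile time: 14 packed floats, no padding.
static_assert(sizeof(float14) == NUMBER_OF_ELEMENTS * sizeof(float),
              "float14 is expected to be 14 contiguous floats");

// Flat float index of member `element` (0 = a ... 13 = n) of c_Filters[y][x].
inline std::size_t flat_index(int element, int x, int y)
{
    assert(element >= 0 && element < NUMBER_OF_ELEMENTS);
    assert(x >= 0 && x < FILTER_W && y >= 0 && y < FILTER_H);
    return element + x * NUMBER_OF_ELEMENTS + y * NUMBER_OF_ELEMENTS * FILTER_W;
}

// Hypothetical helper: gather slice (z, t) of all 14 filters into a staging
// buffer that has the same memory layout as float14 c_Filters[9][9].
std::vector<float> pack_slice(const float* const* filters, int z, int t)
{
    // Offset of the first element of slice (z, t) inside one filter.
    const std::size_t base = std::size_t(z) * FILTER_W * FILTER_H
                           + std::size_t(t) * FILTER_W * FILTER_H * FILTER_D;
    std::vector<float> staging(std::size_t(NUMBER_OF_ELEMENTS) * FILTER_W * FILTER_H);
    for (int y = 0; y < FILTER_H; ++y)
        for (int x = 0; x < FILTER_W; ++x)
            for (int f = 0; f < NUMBER_OF_ELEMENTS; ++f)
                staging[flat_index(f, x, y)] = filters[f][base + x + y * FILTER_W];
    return staging;
}

// In the CUDA host code, the staging buffer would then go to constant memory
// in a single call, for example:
//   cudaMemcpyToSymbol(c_Filters, staging.data(), staging.size() * sizeof(float));
```

With this layout, the slice of a single filter is not contiguous in c_Filters, so copying 9 * 9 floats straight from one filter array would not fill one member of every struct. Interleaving on the host first keeps the update to one copy of 14 * 9 * 9 * sizeof(float) = 4536 bytes per slice.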