I'm wondering how to implement a matrix * vector multiplication. The problem is that there are some restrictions on the minimum number of threads, and, as we know, a vector has only one column.
Currently I do it like this: to multiply A * x, where A is a [32x32] matrix and x is a [32x1] vector, with a block size of 16, I have to pad the vector with zeros to x = [32x32], so that the first column holds the actual values and every other entry is 0. This way I waste a lot of memory.
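For comparison, here is a minimal sketch of a matrix-vector kernel that needs no padding. It assumes A is stored row-major in a flat float array and that each thread computes one element of y = A * x. The names matVecKernel and matVecOnDevice are hypothetical. The grid is 1D, so it does not have to mirror the shape of x: 32 rows with a block size of 16 give 2 blocks of 16 threads, and x stays at 32 floats.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread computes one element of y = A * x.
// A is assumed to be row-major: element (row, col) is A[row * cols + col].
__global__ void matVecKernel(const float* A, const float* x, float* y,
                             int rows, int cols)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < rows) {  // guard: the last block may have more threads than rows left
        float sum = 0.0f;
        for (int col = 0; col < cols; ++col)
            sum += A[row * cols + col] * x[col];
        y[row] = sum;
    }
}

// Hypothetical host helper: copies A and x to the device, launches a 1D grid
// with one thread per row, and copies y back. Returns the first CUDA error
// encountered, or cudaSuccess.
cudaError_t matVecOnDevice(const float* hA, const float* hx, float* hy,
                           int rows, int cols, int blockSize)
{
    assert(rows > 0 && cols > 0 && blockSize > 0);
    size_t bytesA = (size_t)rows * cols * sizeof(float);
    size_t bytesX = (size_t)cols * sizeof(float);  // x is cols x 1, no padding
    size_t bytesY = (size_t)rows * sizeof(float);
    float *dA = nullptr, *dx = nullptr, *dy = nullptr;

    cudaError_t err = cudaMalloc(&dA, bytesA);
    if (err == cudaSuccess) err = cudaMalloc(&dx, bytesX);
    if (err == cudaSuccess) err = cudaMalloc(&dy, bytesY);
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA, bytesA, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dx, hx, bytesX, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // Round up so every row gets a thread even if rows % blockSize != 0.
        int gridSize = (rows + blockSize - 1) / blockSize;
        matVecKernel<<<gridSize, blockSize>>>(dA, dx, dy, rows, cols);
        err = cudaGetLastError();
    }
    // cudaMemcpy on the default stream waits for the kernel to finish.
    if (err == cudaSuccess) err = cudaMemcpy(hy, dy, bytesY, cudaMemcpyDeviceToHost);

    cudaFree(dA);  // cudaFree(nullptr) is a no-op
    cudaFree(dx);
    cudaFree(dy);
    return err;
}
```

Usage: matVecOnDevice(A, x, y, 32, 32, 16) launches 2 blocks of 16 threads. Each thread reads the whole of x, which is fine at this size; for larger matrices, a common refinement is to stage tiles of x in shared memory per block.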
Any advice?
Thank you for your help.