I have allocated a matrix, du, on the device and would like to obtain an array consisting of the sum of each column.
To do this, I allocated another array, dcolumnsum, in device memory.
I call the CUBLAS function from the host:
for (i = 0; i < n; i++)
    dcolumnsum[i] = cublasSasum(n, du + i*n, 1);  /* column i starts at du + i*n */
and got a compilation error.
What is the correct way to pass a device memory variable to CUBLAS and have the output stored in device memory as well?