Hello there,
while playing around with cublasSgeam I ran into a problem when trying to transpose the first matrix argument, A.
This is what I do: I have a vector of 256 elements in device memory, set by cublasSetVector(), and a matrix whose 5000 columns (it is stored in column-major format) each hold this vector, so the matrix is 256x5000 (RxC). The matrix is set by cublasSetVectorAsync(), and all of this works fine: I have tested it by copying the matrix back to host memory with cudaMemcpy() and with cublasGetMatrix(). Since the cuBLAS library uses column-major format, at some point in my code I'd like to transpose the matrix, so I'm testing that functionality.
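To make clear what result I expect from the transpose, here is a minimal host-only sketch of column-major indexing and a reference transpose. The names idxColMajor and transposeColMajor are made up for this illustration and are not part of cuBLAS; the sketch assumes a densely packed matrix (leading dimension equal to the number of rows).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major offset of element (row, col) for leading dimension ld.
inline std::size_t idxColMajor(std::size_t row, std::size_t col, std::size_t ld) {
    return col * ld + row;
}

// Hypothetical host-side reference: 'in' is rows x cols (column-major, ld = rows),
// the result is cols x rows (column-major, ld = cols).
std::vector<float> transposeColMajor(const std::vector<float>& in,
                                     std::size_t rows, std::size_t cols) {
    assert(in.size() == rows * cols);
    std::vector<float> out(rows * cols);
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            out[idxColMajor(c, r, cols)] = in[idxColMajor(r, c, rows)];
    return out;
}
```

For my case that would be transposeColMajor(hostMatrix, 256, 5000), giving a 5000x256 matrix whose leading dimension is 5000 (hostMatrix being a hypothetical host copy of the device matrix).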
And here's the issue: when I call cublasSgeam() with CUBLAS_OP_T for matrix A, I always get CUBLAS_STATUS_INVALID_VALUE and nothing is executed at all, while CUBLAS_OP_N works fine.
The cublasSgeam call with CUBLAS_OP_N (cudatest() and cublastest() just print the error and the given string):
void printBlDMatrix(const float *matrix, const int rows, const int cols)
{
    cudaDeviceSynchronize();
    cudatest(cudaPeekAtLastError());
    float *h_out, *d_out;
    const float a = 1, b = 0;
    h_out = new float[rows * cols];
    cudaMalloc(&d_out, sizeof(float) * rows * cols);
    cublasHandle_t handle;
    cublasCreate(&handle);
    cudaDeviceSynchronize();
    cudatest(cudaPeekAtLastError(), "printMatrix00: ");
    // d_out = a * matrix + b * matrix, i.e. a plain copy since a == 1 and b == 0
    cublastest(cublasSgeam(handle,
                           CUBLAS_OP_N, CUBLAS_OP_N,
                           rows, cols,
                           &a,
                           matrix, rows,
                           &b,
                           matrix, rows,
                           d_out, rows), "printMatrix01: ");
    cudaDeviceSynchronize();
    cudatest(cudaPeekAtLastError(), "printMatrix02: ");
    cublastest(cublasGetMatrix(rows, cols, sizeof(float), d_out, rows, h_out, rows), "printMatrix03: ");
    // print the first 1024 entries, starting a new line every 'rows' entries
    for (unsigned int i = 0; i < 1024; i++) {
        if (i % rows == 0) std::cout << std::endl;
        std::cout << i << ":" << h_out[i] << " ";
    }
    cudaDeviceSynchronize();
    cudatest(cudaPeekAtLastError());
    cublasDestroy(handle);
    delete[] h_out; // allocated with new[]
    cudaFree(d_out);
}
This works fine!
The cublasSgeam call with CUBLAS_OP_T:
void printBlDMatrix(const float *matrix, const int rows, const int cols)
{
    cudaDeviceSynchronize();
    cudatest(cudaPeekAtLastError());
    float *h_out, *d_out;
    const float a = 1, b = 0;
    h_out = new float[rows * cols];
    cudaMalloc(&d_out, sizeof(float) * rows * cols);
    cublasHandle_t handle;
    cublasCreate(&handle);
    cudaDeviceSynchronize();
    cudatest(cudaPeekAtLastError(), "printMatrix00: ");
    // intended: d_out = transpose(matrix), a cols x rows result; this is the call that fails
    cublastest(cublasSgeam(handle,
                           CUBLAS_OP_T, CUBLAS_OP_N,
                           cols, rows,
                           &a,
                           matrix, rows,
                           &b,
                           matrix, rows,
                           d_out, cols), "printMatrix01: ");
    cudaDeviceSynchronize();
    cudatest(cudaPeekAtLastError(), "printMatrix02: ");
    cublastest(cublasGetMatrix(cols, rows, sizeof(float), d_out, cols, h_out, cols), "printMatrix03: ");
    // print the first 1024 entries, starting a new line every 'rows' entries
    for (unsigned int i = 0; i < 1024; i++) {
        if (i % rows == 0) std::cout << std::endl;
        std::cout << i << ":" << h_out[i] << " ";
    }
    cudaDeviceSynchronize();
    cudatest(cudaPeekAtLastError());
    cublasDestroy(handle);
    delete[] h_out; // allocated with new[]
    cudaFree(d_out);
}
This gives me the following output (and a headache!):
printMatrix01: CUBLAS_STATUS_INVALID_VALUE,
MATRIX-OUTPUT,
** On entry to SGEAM parameter number 6 had an illegal value
I don't care whether the call to cublasGetMatrix() is correct or not, since the error occurs earlier, but I have tried all permutations of rows and cols, e.g.:
cublastest(cublasSgeam(handle,
                       CUBLAS_OP_T, CUBLAS_OP_N,
                       cols, rows,
                       &a,
                       matrix, rows,
                       &b,
                       matrix, rows,
                       d_out, rows), "printMatrix01: ");
printMatrix01: CUBLAS_STATUS_INVALID_VALUE,
MATRIX-OUTPUT,
** On entry to SGEAM parameter number 6 had an illegal value
or
cublastest(cublasSgeam(handle,
                       CUBLAS_OP_T, CUBLAS_OP_N,
                       rows, cols,
                       &a,
                       matrix, rows,
                       &b,
                       matrix, rows,
                       d_out, rows), "printMatrix01: ");
printMatrix01: CUBLAS_STATUS_INVALID_VALUE,
MATRIX-OUTPUT,
** On entry to SGEAM parameter number 5 had an illegal value
As you can see in the output of the last example, the "illegal" value changes its position when I permute the rows and cols arguments. One thing that drives me crazy is that I don't know whether the parameters are counted from 0 or from 1; is the matrix the illegal argument?
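For reference, this is how I read the leading-dimension constraints in the cuBLAS documentation for geam, written as a host-only sketch so it builds without CUDA. The helper checkGeamLds and the enums Op and GeamCheck are made up for this illustration; this is only my understanding of the documented rules (C is m x n with ldc >= max(1, m); lda and ldb must be at least m for CUBLAS_OP_N and at least n otherwise), not the validation cuBLAS actually performs.

```cpp
#include <algorithm>

// Stand-ins for CUBLAS_OP_N / CUBLAS_OP_T so the sketch builds without CUDA.
enum class Op { N, T };

enum class GeamCheck { Ok, BadLda, BadLdb, BadLdc };

// Hypothetical helper: checks the leading dimensions for
// C = a * op(A) + b * op(B), where C is m x n (column-major).
GeamCheck checkGeamLds(Op transa, Op transb, int m, int n,
                       int lda, int ldb, int ldc) {
    const int minLda = std::max(1, transa == Op::N ? m : n);
    const int minLdb = std::max(1, transb == Op::N ? m : n);
    const int minLdc = std::max(1, m);
    if (lda < minLda) return GeamCheck::BadLda;
    if (ldb < minLdb) return GeamCheck::BadLdb;
    if (ldc < minLdc) return GeamCheck::BadLdc;
    return GeamCheck::Ok;
}
```

With rows = 256 and cols = 5000, this sketch accepts the CUBLAS_OP_N call, flags ldb for the two transposing calls that pass m = cols, n = rows, and flags lda for the call that passes m = rows, n = cols. That would be consistent with the reported parameter number moving between the variants, but I cannot tell from the message alone how cuBLAS counts its parameters.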
Could you please help me? I'd like to know whether I'm just overlooking something or whether this is a bug, so I know I'm not crazy :)
I'm using Visual Studio 10 Premium on Windows 7 x64 with CUDA Toolkit 5.0.
Thanks in advance,
hanneshansen