Incorrect results from cublasSgemm

zein · June 15, 2007, 9:08am

I have been playing around with CUBLAS and have run into a strange problem.

First I initialize A and B so that {A,B}[i] = i+1.
When I call cublasSgemm as:
cublasSgemm('t', 'n', 65536, 1, 1, 1.0, A, 1, B, 1, 0.0, Y, 65536);
I get a correct result.

But if I increase the dimension of A by 1:
cublasSgemm('t', 'n', 65537, 1, 1, 1.0, mat1_d, 1, mat2_d, 1, 0.0, res_d, 65537);
then I get wrong results:
Y[65536] = 1 instead of 65537

If I increase the dimension of A further, the results Y[i] for i > 65535 are always wrong.

It looks like a short int overflow problem somewhere!
Can anyone tell me what's going on?
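
The reported numbers are at least consistent with an index that wraps at 16 bits: 65536 truncated to 16 bits is 0, and A[0]*B[0] = 1. The sketch below is a plain C host reference for the same call shape (it is not CUBLAS code) together with a hypothetical model of such a wrap. The names `ref_sgemm_tn` and `wrapped_row_value` are made up for illustration, and the wrap itself is only the guess made above, not a confirmed diagnosis.

```c
#include <stddef.h>
#include <stdint.h>

/* Host reference for C = alpha * A^T * B + beta * C with column-major
 * storage, i.e. the same math as cublasSgemm('t', 'n', m, n, k, ...).
 * A is stored as k x m (leading dimension lda), B as k x n (ldb). */
void ref_sgemm_tn(int m, int n, int k, float alpha,
                  const float *A, int lda, const float *B, int ldb,
                  float beta, float *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int l = 0; l < k; ++l)
                acc += A[l + (size_t)i * lda] * B[l + (size_t)j * ldb];
            float *c = &C[i + (size_t)j * ldc];
            /* As in BLAS, C is not read when beta is zero. */
            *c = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * *c;
        }
    }
}

/* ASSUMPTION (hypothetical failure model): the row index is truncated to
 * 16 bits before A is read, for the k = 1, n = 1 case from the post. */
float wrapped_row_value(const float *A, const float *B, int i)
{
    uint16_t wrapped = (uint16_t)i;
    return A[wrapped] * B[0];
}
```

With A[i] = i + 1 and B[0] = 1, the reference gives Y[65536] = 65537, while the wrapped model gives 1 for the same index, which is the value reported above.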

Mark_Harris · June 19, 2007, 9:09am

Hi Zein,

We’ve been unable to reproduce this locally, so it’s possible it is already fixed in our internal codebase. If you are a registered developer, you have access to CUDA 0.9 and can try with that. If not, it won’t be long before a new public release is out.

Thanks,
Mark

zein · June 24, 2007, 3:44am

Thanks,
I can confirm that upgrading to CUDA 0.9 has solved this problem.

pcrs · June 23, 2009, 10:14pm

I think I have a problem as well. I multiply two matrices: V is 10×6 and H is 10×5.

The operation is w += V.transpose * H

The result should have shape 6×5.

The results for the first 5 rows are correct, but the last row is off. If I change the dimensions, I get even bigger problems. Am I doing something wrong, or is this a bug?

I use CUDA 2.1 and store the matrices in column-major order.

-Peter

[codebox]

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include "cutil.h"
#include <cublas.h>
#include "cutil_inline.h"

int
main(int argc, char** argv){
    int device;
    struct cudaDeviceProp properties;

    if( cutCheckCmdLineFlag(argc, (const char**)argv, "device") )
        cutilDeviceInit(argc, argv);
    else
        cudaSetDevice( cutGetMaxGflopsDeviceId() );

    cutilSafeCall(cudaGetDevice(&device));
    cutilSafeCall(cudaGetDeviceProperties(&properties, device));

    cublasStatus stat;
    cublasInit();

    int s1=6;int s2=5;int m1=0;int T=10;
    unsigned int mem_size_w = sizeof(float)*s1*s2*(m1+1);

    float* d_H;
    float* d_V;
    float* d_w;
    size_t d_Hp;
    size_t d_Vp;
    size_t d_wp;

    float* h_w = (float*)malloc(mem_size_w);
    float* h_H = (float*)malloc(T*sizeof(float)*s2);
    float* h_V = (float*)malloc(T*sizeof(float)*s1);
    size_t h_wp=s2*sizeof(float);
    size_t h_Hp=T*sizeof(float);
    size_t h_Vp=T*sizeof(float);

    cutilSafeCall(cudaMallocPitch((void**) &d_H, &d_Hp, T*sizeof(float), s2));
    cutilSafeCall(cudaMallocPitch((void**) &d_V, &d_Vp, T*sizeof(float), s1));
    cutilSafeCall(cudaMallocPitch((void**) &d_w, &d_wp, s2*sizeof(float), s1*(m1+1)));

    for (int i=0;i<T;i++) {
        for (int k=0;k<s1;k++) {
            h_V[k*h_Vp/sizeof(float)+i]=k*h_Vp/sizeof(float)+i;
        }
        for (int k=0;k<s2;k++) {
            h_H[k*h_Hp/sizeof(float)+i]=-float((k*h_Hp/sizeof(float)+i));
        }
    }

    cutilSafeCall(cudaMemcpy2D(d_H, d_Hp, h_H, h_Hp, T*sizeof(float), s2, cudaMemcpyHostToDevice));
    cutilSafeCall(cudaMemcpy2D(d_V, d_Vp, h_V, h_Vp, T*sizeof(float), s1, cudaMemcpyHostToDevice));

    cudaMemset(d_w, 0, s1*d_wp*(m1+1));

    cublasSgemm('t', 'n', s1, s2, T, 1.0f, d_V, d_Vp/sizeof(float), d_H, d_Hp/sizeof(float), 1.0f, d_w, d_wp/sizeof(float));

    cutilSafeCall(cudaMemcpy2D(h_w, h_wp, d_w, d_wp, s2*sizeof(float), (m1+1)*s1, cudaMemcpyDeviceToHost));

    for (int m=0;m<m1+1;m++){
        for (int i=0;i<s1;i++){
            for (int k=0;k<s2;k++) {
                if (k<s2-1) {printf("%3.8f,", h_w[m*s1*s2+i*s2+k]);} else {printf("%3.8f],\n[", h_w[m*s1*s2+i*s2+k]);}}
        }
    }

    free(h_w);free(h_H);free(h_V);
    cublasFree (d_w);cublasFree (d_H);cublasFree (d_V);
}

[/codebox]

Nico · June 24, 2009, 7:27am

The pitch returned is the width of the array in bytes.
Try cudaMemset(d_w, 0, s1*d_wp*(m1+1)*sizeof(float));

N.

pcrs · June 24, 2009, 10:59am

Doesn’t cudaMemset also take its length in bytes?

Nico · June 24, 2009, 11:10am

Whoops, you’re right of course. I forgot that d_wp is in fact the pitch returned by the allocation. I was under the impression that it was being recalculated. My mistake :)

N.
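
The unit question in this exchange can be sketched in plain host C, with no CUDA calls. The pitch is a byte count, so a leading dimension in elements is the pitch divided by sizeof(float), while a byte length for a memset-style call is the pitch times the row count with no further sizeof factor. The helper names `pitch_to_ld` and `pitched_bytes` are hypothetical, and the 64-byte pitch mentioned afterwards is an assumed example value, not one reported in the thread.

```c
#include <stddef.h>

/* Leading dimension in float elements for a pitch given in bytes.
 * Returns 0 if the pitch is not a whole number of floats. */
size_t pitch_to_ld(size_t pitch_bytes)
{
    if (pitch_bytes % sizeof(float) != 0)
        return 0;
    return pitch_bytes / sizeof(float);
}

/* Byte length covering `rows` pitched rows. The pitch is already in
 * bytes, so multiplying by sizeof(float) again would overshoot. */
size_t pitched_bytes(size_t pitch_bytes, size_t rows)
{
    return pitch_bytes * rows;
}
```

For an assumed pitch of 64 bytes and 6 rows, `pitch_to_ld(64)` gives 16 elements on a platform with 4-byte floats, and `pitched_bytes(64, 6)` gives 384 bytes.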

pcrs · June 25, 2009, 9:24pm

I have probably made an error somewhere, since my results are not consistent. If I can pinpoint the problem, I will write more about it.
…
Yep, I made an error in the pitch of the back copy. It’s all working fine now. Sorry for bothering you.
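
The exact fix was not posted, so the following is only one plausible reading of "an error in the pitch of the back copy". In column-major storage each pitched row of the buffer holds one column of the result, so a 6×5 result occupies 5 pitched rows of 6 floats, not 6 rows of 5. The host-only C sketch below models a pitched 2D copy with memcpy and counts how many elements land in the wrong place for a given width and height. `copy2d` and `back_copy_mismatches` are hypothetical names, and the leading dimension of 16 used afterwards is an assumed value.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Host-only stand-in for a pitched 2D copy: `height` rows of `width` bytes. */
static void copy2d(void *dst, size_t dpitch, const void *src, size_t spitch,
                   size_t width, size_t height)
{
    for (size_t r = 0; r < height; ++r)
        memcpy((char *)dst + r * dpitch, (const char *)src + r * spitch, width);
}

/* Build a rows x cols column-major matrix with leading dimension `ld`
 * floats (padding stays zero), copy it back as `height` rows of
 * `width_elems` floats into a packed buffer, and count the elements that
 * differ from the packed column-major result (leading dimension `rows`).
 * Returns -1 on bad arguments or allocation failure. */
int back_copy_mismatches(size_t rows, size_t cols, size_t ld,
                         size_t width_elems, size_t height)
{
    if (rows > ld || width_elems > ld)
        return -1;

    size_t src_rows = cols > height ? cols : height;
    size_t dst_len = rows * cols > width_elems * height
                         ? rows * cols : width_elems * height;
    float *src = calloc(ld * src_rows, sizeof *src);
    float *dst = calloc(dst_len, sizeof *dst);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1;
    }

    /* Element (i, j) gets a distinct nonzero marker value. */
    for (size_t j = 0; j < cols; ++j)
        for (size_t i = 0; i < rows; ++i)
            src[j * ld + i] = (float)(1 + i + 10 * j);

    copy2d(dst, width_elems * sizeof(float), src, ld * sizeof(float),
           width_elems * sizeof(float), height);

    int bad = 0;
    for (size_t j = 0; j < cols; ++j)
        for (size_t i = 0; i < rows; ++i)
            if (dst[j * rows + i] != (float)(1 + i + 10 * j))
                ++bad;

    free(src);
    free(dst);
    return bad;
}
```

Under these assumptions, `back_copy_mismatches(6, 5, 16, 6, 5)` (width = rows, height = columns) returns 0, while `back_copy_mismatches(6, 5, 16, 5, 6)`, which mirrors the width and height used in the posted code, returns a nonzero count.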
