CUBLAS sgemm pointer error? Query about incorrect output from matrix multiplication.

Hi all,

I’ve written a simple matrix multiplication task using sgemm for profiling purposes. However, it sometimes returns incorrect results that look like pointers to me.

It's being run on a GTX 260 with an Intel Core 2 (64-bit) host.

Has anyone seen this before? Do you know why? What can I do about this?


// g++-4.3 multiply_matrices_gpu.c -I/usr/local/cuda/include -I/home/dhjones/cuda_sdk/common/inc -L/usr/local/cuda/lib64 -lcublas

#include <stdlib.h>

#include <stdio.h>

#include <cublas.h>

int main () {

int i,j;



        float *A = (float*)malloc(j*j*sizeof(float));

        float *B = (float*)malloc(j*j*sizeof(float));

        float *C = (float*)malloc(j*j*sizeof(float));

if(A == NULL || B == NULL || C == NULL) return 1;

for (i=0;i<j*j;i++){ A[i] = 0; B[i] = 0; C[i] = 0; }

float* AA; float* BB; float* CC;








        int sum=0;

        for (i=0;i<j*j;i++){

            sum += C[i];

        }
        printf("Size: %d.  Sum of elements: %d\n",j,sum);

        free( A );  free( B );  free ( C );

        cublasFree(AA);  cublasFree(BB); cublasFree(CC);


return 0;

}




Size: 2620. Sum of elements: 0

Size: 2820. Sum of elements: 0

Size: 3020. Sum of elements: 0

Size: 3220. Sum of elements: 0

Size: 3420. Sum of elements: 0

Size: 3620. Sum of elements: -2147483648

Size: 3820. Sum of elements: -2147483648

Size: 4020. Sum of elements: -2147483648

Size: 4220. Sum of elements: -2146263808


Note that repeated runs produce similar results, though the errors do not always start at the same matrix size.

As a first step, add some error checking. All of the CUBLAS functions you are calling except sgemm() return a status, and the sgemm() result can be checked via cublasGetError().

Thanks. I added error checking. sgemm() was reporting error code 11: ‘access to GPU memory space failed’ for ALL but the first test (including the tests that ran fine).
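For reference, error code 11 appears to correspond to CUBLAS_STATUS_MAPPING_ERROR in the legacy cublas.h header, which fits the "access to GPU memory space failed" description. Below is a minimal sketch of a helper that turns the numeric status into a readable name. The function name cublas_status_name is hypothetical, and the numeric values are written from memory of the legacy header, so please check them against the cublas.h in your own toolkit.

```c
#include <stdio.h>

/* Hypothetical helper: maps a legacy CUBLAS status code to its name.
   The numeric values are an assumption based on the legacy cublas.h;
   verify them against the header you actually compile with. */
const char *cublas_status_name(unsigned int status)
{
    switch (status) {
    case 0:  return "CUBLAS_STATUS_SUCCESS";
    case 1:  return "CUBLAS_STATUS_NOT_INITIALIZED";
    case 3:  return "CUBLAS_STATUS_ALLOC_FAILED";
    case 7:  return "CUBLAS_STATUS_INVALID_VALUE";
    case 8:  return "CUBLAS_STATUS_ARCH_MISMATCH";
    case 11: return "CUBLAS_STATUS_MAPPING_ERROR";   /* the code reported above */
    case 13: return "CUBLAS_STATUS_EXECUTION_FAILED";
    case 14: return "CUBLAS_STATUS_INTERNAL_ERROR";
    default: return "unknown CUBLAS status";
    }
}
```

In a program that includes cublas.h, this could be called right after sgemm(), for example as cublas_status_name(cublasGetError()), and the result printed with printf.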

I found your previous post about watchdog errors, but I don’t think it applies here because:

(1) the runtime for each sgemm() call is less than half a second;

(2) I tried with the X server disabled and it didn’t make a difference.


Best regards, David

I see a few bugs in your code:

  1. cublasSgemm takes pointers to matrices on the GPU (i.e. AA, BB and CC in your code) but you pass it A, B and C (host memory, which should cause an error that would be picked up by cublasGetError).

  2. You don’t initialize the matrix CC on the GPU. You allocate the space but don’t put any values there, so you just have junk stored in CC on the GPU. Then you add 1.0f * AA * BB to CC (junk), giving you an undefined output. This is after you’ve sorted out point 1 of course. You need to add cublasSetMatrix(j,j,sizeof(float),C,j,CC,j).
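To make point 2 concrete: sgemm computes C = alpha*A*B + beta*C, so with a non-zero beta whatever is already stored in C feeds into the result. Here is a plain C reference version that runs on the CPU and can be used to compare against the GPU output for small sizes. It is only a sketch: the name sgemm_ref is hypothetical, it handles only square, non-transposed, column-major matrices, and skipping the read of C when beta is zero follows the usual BLAS convention rather than anything stated in this thread.

```c
#include <stddef.h>

/* Hypothetical CPU reference: C = alpha*A*B + beta*C for n x n matrices
   stored in column-major order (element (row, col) lives at row + col*n). */
void sgemm_ref(int n, float alpha, const float *A, const float *B,
               float beta, float *C)
{
    for (int col = 0; col < n; col++) {
        for (int row = 0; row < n; row++) {
            float acc = 0.0f;
            for (int k = 0; k < n; k++)
                acc += A[row + (size_t)k * n] * B[k + (size_t)col * n];

            size_t idx = row + (size_t)col * n;
            /* With beta == 0 the old contents of C are ignored; otherwise
               they are scaled and added, so junk in C means junk out. */
            float old = (beta == 0.0f) ? 0.0f : beta * C[idx];
            C[idx] = alpha * acc + old;
        }
    }
}
```

Calling it with beta set to 1.0f and C filled with zeros gives the plain product A*B, which is what copying the zeroed host C to CC achieves on the GPU.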

You are passing the wrong pointers to sgemm(). Here


A, B and C are host pointers. You have to pass AA, BB and CC, the device pointers you allocated.

Thanks all. It now works fine.

Best regards,