cublasDgemm returns wrong results for large matrix dimensions?

Hi,

I have written a small test case that computes C = A' * B using cublasDgemm. When the matrix dimensions get large enough, entries of C hold incorrect values. Could someone please offer some help or advice?

Specifically, A is K x I, B is K x J, and C is I x J. My program sets entries of A to 0.2, entries of B to 1.0, so one would expect every entry of C to equal 0.2*K.
The results (indicating (I,J,K) and the outcome) are:

(10, 10, 8): OK.
(100, 100, 8): OK.
(300, 100, 8): OK.
(300, 1000, 8): OK.
(300, 10000, 8): Fail!
Difference at C position (0, 0)
Expected: 1.6
Found: 6.4

The program source and Makefile are attached here:

cublas-bug.tar.gz (6.59 KB)

My configuration is:

[dalexander@pearson cublas-bug]$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 260.19.12 Fri Oct 8 11:17:08 PDT 2010
GCC version: gcc version 4.4.3 20100127 (Red Hat 4.4.3-4) (GCC)

Installed CUDA 3.2 RC2
Fedora release 12 (Constantine)
2.6.32.11-99.fc12.x86_64 #1 SMP Mon Apr 5 19:59:38 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

Devices:
0 : GeForce GTX 480
1 : GeForce GTX 480
2 : GeForce GTX 280

Regards,
Dave

With the final 3.2 release on SUSE 11.1, your code worked fine for me on a GTX 260.

I updated to the final 3.2, and it still fails on my GTX 480. eelsen: what is your gcc version?

I have added tests for SGEMM as well. SGEMM seems to work fine:

[dalexander@pearson cublas-bug]$ ./test32
(10, 10, 8): OK.
(100, 100, 8): OK.
(300, 100, 8): OK.
(300, 1000, 8): OK.
(300, 10000, 8): OK.

while DGEMM still fails:

[dalexander@pearson cublas-bug]$ ./test64
(10, 10, 8): OK.
(100, 100, 8): OK.
(300, 100, 8): OK.
(300, 1000, 8): OK.
(300, 10000, 8): Fail!
Difference at C position (0, 0)
Expected: 1.6
Found: 6.4

See http://bitbucket.org/dalexand/cublas-bug/downloads for the updated source code for the test case.

Dave

This could be related to the driver (260.19). An earlier post reported that certain integer64 operations were not working properly with CUDA 3.2, and the problem was traced to a possible driver issue in 260.19; a slightly earlier driver version did not show it. It may be worth checking.

Sarnath: please be more specific. If you can, please provide a link.

I am satisfied that this is a bug in CUBLAS, perhaps having something to do with the configuration of my machine. Is there a more formal procedure for filing a bug?

Dave

Hi Dave,

Yes, you can file a bug if you have signed up for a registered developer account. See http://nvdeveloper.nvidia.com/

Thanks,

Cliff