SDK examples: -m32 on x86_64

Hi all,

I just got a little GPU to start tinkering with porting my application to CUDA, and I ran into some trouble building the SDK examples. Some build fine, but matrixMulDrv fails with:

    In file included from /usr/include/features.h:352,
                     from /usr/include/limits.h:27,
                     from /usr/lib/gcc/x86_64-redhat-linux/4.1.2/include/limits.h:122,
                     from /usr/lib/gcc/x86_64-redhat-linux/4.1.2/include/syslimits.h:7,
                     from /usr/lib/gcc/x86_64-redhat-linux/4.1.2/include/limits.h:11,
                     from /usr/local/cuda/bin/…/include/driver_types.h:48,
                     from /usr/local/cuda/bin/…/include/builtin_types.h:43,
                     from /usr/local/cuda/bin/…/include/cuda_runtime.h:45,
                     from :1:
    /usr/include/gnu/stubs.h:7:27: error: gnu/stubs-32.h: No such file or directory
    make: *** [data/matrixMul_kernel.cubin] Error 255

The problem appears to be that __x86_64__ is not defined, so __WORDSIZE is set to 32 and gnu/stubs.h looks for stubs-32.h instead of stubs-64.h, which gives the error. That in turn happens because common.mk hardcodes -m32 in the cubin compile lines. Changing it to -m64 lets the compile continue, but the flag should probably be detected automatically.
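To make the mechanism above concrete, here is a small C sketch. It hardcodes a simplified, x86-only version of what the glibc headers of this era do (bits/wordsize.h choosing __WORDSIZE from __x86_64__, then gnu/stubs.h choosing a stubs-NN.h file). The function names are made up for illustration, and the real headers differ between glibc versions.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* 1 if the compiler predefines __x86_64__ for this build, else 0.
 * gcc -m64 on x86_64 defines it; gcc -m32 does not. */
int x86_64_macro_defined(void)
{
#if defined(__x86_64__)
    return 1;
#else
    return 0;
#endif
}

/* Simplified x86-only version of how glibc picks __WORDSIZE:
 * 64 only if __x86_64__ is defined (newer glibc also checks
 * __ILP32__ for the x32 ABI; that case is omitted here). */
int glibc_style_wordsize(void)
{
    return x86_64_macro_defined() ? 64 : 32;
}

/* The stub header gnu/stubs.h then tries to include for that word size. */
const char *stubs_header_for(int wordsize)
{
    assert(wordsize == 32 || wordsize == 64);
    return wordsize == 64 ? "gnu/stubs-64.h" : "gnu/stubs-32.h";
}

/* Actual pointer width of this build, for comparison. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Print what the current compilation mode implies for the glibc headers. */
void report_build_mode(void)
{
    int ws = glibc_style_wordsize();
    printf("__x86_64__ %s, word size %d, pointer bits %d, needs %s\n",
           x86_64_macro_defined() ? "defined" : "not defined",
           ws, pointer_bits(), stubs_header_for(ws));
}
```

Calling report_build_mode() from a main() built with gcc -m64 on an x86_64 box should report stubs-64.h. Built with -m32, the program would report stubs-32.h, but in practice the build itself stops first with the same "gnu/stubs-32.h: No such file or directory" error unless the 32-bit glibc development headers are installed.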

Cheers,

/Patrik

Thanks for pointing this out. An alternative workaround is to install the glibc-devel i386 RPM that ships with the OS.

Hi, I just wanted to point out that this is still an issue. I examined common/common.mk, and there does appear to be some code to detect the architecture:

    # detect if 32 bit or 64 bit system
    HP_64 = $(shell uname -m | grep 64)
    ... more code to set the actual flag

It didn't actually work for me, and I had to set CUBIN_ARCH_FLAG manually (in two places). This is on Ubuntu 7.10.

Just wanted to point that out.

I fixed it by moving the flag setting out of the unrelated OpenGL test.

Apologies for the silly syntax; I just did a copy-paste job:

    # detect if 32 bit or 64 bit system
    HP_64 = $(shell uname -m | grep 64)

    # uname -m contains "64" (e.g. x86_64): build the cubins as 64-bit
    ifneq "$(strip $(HP_64))" ""
        CUBIN_ARCH_FLAG := -m64
    endif
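For anyone wondering what the detection in common.mk actually tests, here is the same `uname -m | grep 64` check written as a C sketch using POSIX uname(). It is only an illustration: cubin_arch_flag_for and cubin_arch_flag are hypothetical names, and falling back to -m32 when uname() fails is an assumption chosen to match the old hardcoded default.

```c
#include <assert.h>
#include <string.h>
#include <sys/utsname.h>

/* Same test as `uname -m | grep 64`: a plain substring match, so
 * "x86_64", "ia64" and "aarch64" all count as 64-bit, "i686" does not. */
const char *cubin_arch_flag_for(const char *machine)
{
    assert(machine != NULL);
    return strstr(machine, "64") != NULL ? "-m64" : "-m32";
}

/* Query the running kernel's machine name, as `uname -m` does. */
const char *cubin_arch_flag(void)
{
    struct utsname info;
    if (uname(&info) != 0)
        return "-m32"; /* assumed fallback: the old hardcoded default */
    return cubin_arch_flag_for(info.machine);
}
```

Note that uname reports the kernel's architecture, so a 32-bit userland running on a 64-bit kernel would still get -m64 from this check.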

Thanks for the solution to the build problem; it worked for me on my x86_64 machine.

However, running matrixMulDrv produces:

    Processing time: 0.123000 (ms)
    Test FAILED
    Press ENTER to exit...

I tried changing the error threshold in matrixMulDrv.cpp from 1e-6 to 1e-5 and 1e-1, but got the same FAILED result.

All other programs in the SDK/bin/linux/release directory report Test PASSED.

Could this be related to the makefile change?

Thanks,

Scott

Update: I took a look at the two C matrices in matrixMulDrv (the result of C = A*B), that is, the C computed by the GPU and the C computed by the host CPU. The CPU entries are 10 to 50 times larger than the corresponding GPU entries. For example: CPU=10.7613, GPU=0.8684.
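A quick check on why loosening the threshold did not help: below is a C sketch of a per-element relative-error comparison. This is a swapped-in illustration, not the comparison matrixMulDrv itself performs (the SDK's own check may be norm-based rather than per-element), and max_relative_error and results_match are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Largest per-element relative error |ref - data| / |ref| between the CPU
 * reference and the GPU result; uses the absolute error where ref is 0. */
double max_relative_error(const float *ref, const float *data, size_t n)
{
    assert(ref != NULL && data != NULL);
    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double r = ref[i];
        double diff = r - (double)data[i];
        double abs_diff = diff < 0 ? -diff : diff;
        double abs_ref = r < 0 ? -r : r;
        double err = abs_ref > 0.0 ? abs_diff / abs_ref : abs_diff;
        if (err > worst)
            worst = err;
    }
    return worst;
}

/* Hypothetical pass/fail check in the spirit of "Test PASSED/FAILED". */
bool results_match(const float *ref, const float *data, size_t n,
                   double epsilon)
{
    return max_relative_error(ref, data, n) <= epsilon;
}
```

For the pair above, 1 - 0.8684/10.7613 is about 0.92. More generally, if a GPU value is k times smaller than the CPU value, the relative error is 1 - 1/k, which is 0.9 for k = 10 and 0.98 for k = 50. So no threshold below 0.9, including 1e-1, can make this test pass; the results are genuinely different rather than slightly off.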

Could there be a problem with my CUDA installation?

Thanks,
Scott