Hello,
- Using the instructions outlined on
I managed to install CUDA 4.0 on my desktop running Fedora 16 (64-bit). The configuration of my desktop is: Intel Core i7 920 CPU, 6 GB DDR3 SDRAM, GeForce GTX 295 graphics card.
- I am using gcc 4.6.2 (which I know is not officially supported by CUDA 4.0, but the SDK samples do seem to compile and run).
$ gcc --version
gcc (GCC) 4.6.2 20111027 (Red Hat 4.6.2-1)
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- Output from deviceQuery:
$ ./deviceQuery
[deviceQuery] starting...
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Found 2 CUDA Capable device(s)
Device 0: "GeForce GTX 295"
CUDA Driver Version / Runtime Version 4.10 / 4.0
CUDA Capability Major/Minor version number: 1.3
Total amount of global memory: 896 MBytes (939327488 bytes)
(30) Multiprocessors x ( 8) CUDA Cores/MP: 240 CUDA Cores
GPU Clock Speed: 1.24 GHz
Memory Clock rate: 999.00 Mhz
Memory Bus Width: 448-bit
Max Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(8192) x 512, 2D=(8192,8192) x 512
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Concurrent copy and execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: No
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 4 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Device 1: "GeForce GTX 295"
CUDA Driver Version / Runtime Version 4.10 / 4.0
CUDA Capability Major/Minor version number: 1.3
Total amount of global memory: 895 MBytes (938803200 bytes)
(30) Multiprocessors x ( 8) CUDA Cores/MP: 240 CUDA Cores
GPU Clock Speed: 1.24 GHz
Memory Clock rate: 999.00 Mhz
Memory Bus Width: 448-bit
Max Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(8192) x 512, 2D=(8192,8192) x 512
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Concurrent copy and execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: No
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 5 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.10, CUDA Runtime Version = 4.0, NumDevs = 2, Device = GeForce GTX 295, Device = GeForce GTX 295
[deviceQuery] test results...
PASSED
Press ENTER to exit...
The Mandelbrot example works too.
- But a simple test program from CUDA by Example does not run:
$ more hello.cu
#include "./common/book.h"
int main ( void ){
printf( "Hello world!\n" );
return 0;
}
I get a “Permission denied” error when I execute it:
$ nvcc -c hello.cu -o hello.out
$
$ ./hello.out
bash: ./hello.out: Permission denied
$
Note that I have to use the -c option (why?). Otherwise, I get a string of linker errors:
$ nvcc hello.cu -o hello.out
/usr/bin/ld: /tmp/tmpxft_00000df8_00000000-13_hello.o: undefined reference to symbol 'pthread_cancel@@GLIBC_2.2.5'
/usr/bin/ld: note: 'pthread_cancel@@GLIBC_2.2.5' is defined in DSO /lib64/libpthread.so.0 so try adding it to the linker command line
/lib64/libpthread.so.0: could not read symbols: Invalid operation
collect2: ld returned 1 exit status
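The linker note above suggests adding libpthread to the link line. Below is an untested sketch of what that might look like, assuming nvcc's -l option hands the library to the linker. To keep it stand-alone, it uses a hypothetical file hello_min.cu that swaps book.h for <stdio.h>; that file name and the skip message are illustrative, not from the book.

```shell
# Hypothetical stand-alone copy of the test program
# (uses <stdio.h> instead of ./common/book.h so nothing else is needed).
cat > hello_min.cu <<'EOF'
#include <stdio.h>
int main(void) {
    printf("Hello world!\n");
    return 0;
}
EOF

# Compile AND link (no -c), naming libpthread explicitly as the ld note suggests.
# Assumption: nvcc is on PATH; if it is not, the build step is skipped.
if command -v nvcc >/dev/null 2>&1; then
    nvcc hello_min.cu -o hello_min.out -lpthread && ./hello_min.out
else
    echo "nvcc not found on PATH; skipping build"
fi
```

For comparison, -c stops after compilation and writes an object file rather than a linked executable, so a file produced that way is not something the shell can run directly.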
I would appreciate responses to the following questions:
- Why am I getting a “permission denied” error when I execute the binary file?
- Why do I have to use a -c flag? Couldn’t figure out the reason from http://sbel.wisc.edu/Courses/ME964/2008/Documents/nvccCompilerInfo.pdf.
- How do I fix this without downgrading to gcc 4.4?
Thanks in advance!