Hello, guys!
I’d like to ask for your help with a problem I can’t pin down.
I am running a computer with an Nvidia Quadro FX 570 graphics card and would like to get started with CUDA programming, but it doesn’t work. I’ve installed the following packages:
and the driver for my card: “NVIDIA-Linux-x86_64-173.14.05-pkg2.run”.
I searched Google for a tutorial and found one (CUDA Tutorial | /// Parallel Panorama ///). The code from the first part of the tutorial works fine, but the code from the second part does not. I modified it to print error messages and more information about what goes wrong:
// incrementArray.cu
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <cuda.h>

// reference implementation on the CPU
void incrementArrayOnHost(float* a, int N)
{
    int i;
    for(i = 0; i < N; i++)
        a[i] = a[i] + 1.f;
}

// one thread per element; the guard skips threads beyond the end of the array
__global__ void incrementArrayOnDevice(float* a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if(idx < N)
        a[idx] = a[idx] + 1.f;
}

int main(int argc, char* argv[])
{
    float* a_h;
    float* b_h; // pointers to host memory
    float* a_d; // pointer to device memory
    int i;
    int N = 10;
    size_t size = N * sizeof(float);
    int blockSize = 1;

    // allocate arrays on host
    a_h = (float *)malloc(size);
    b_h = (float *)malloc(size);
    // allocate array on device
    cudaMalloc((void **) &a_d, size);

    // initialization of host data
    for(i = 0; i < N; i++)
        a_h[i] = (float)i;

    // print the state before the computation (b_h has not been filled yet)
    for(i = 0; i < N; i++)
    {
        printf("a_h[%d]: %f\t", i, a_h[i]);
        printf("b_h[%d]: %f\n", i, b_h[i]);
    }

    // copy data from host to device
    cudaMemcpy(a_d, a_h, sizeof(float) * N, cudaMemcpyHostToDevice);

    // do calculation on host
    incrementArrayOnHost(a_h, N);

    // do calculation on device:
    // Part 1 of 2. Compute execution configuration
    int nBlocks = N / blockSize + (N % blockSize == 0 ? 0 : 1);
    // Part 2 of 2. Call incrementArrayOnDevice kernel
    incrementArrayOnDevice <<< nBlocks, blockSize >>> (a_d, N);
    printf("incrementArrayOnDevice(%s);\n", cudaGetErrorString(cudaGetLastError()));

    // Retrieve result from device and store in b_h
    cudaMemcpy(b_h, a_d, sizeof(float) * N, cudaMemcpyDeviceToHost);

    // print the state after the computation
    for(i = 0; i < N; i++)
    {
        printf("a_h[%d]: %f\t", i, a_h[i]);
        printf("b_h[%d]: %f\n", i, b_h[i]);
    }

    // check results
    for(i = 0; i < N; i++)
        assert(a_h[i] == b_h[i]);

    // cleanup
    free(a_h);
    free(b_h);
    cudaFree(a_d);
    return 0;
}
And I got the following output:
user@Linux:~/Desktop/CUDA$ ./a.out
a_h[0]: 0.000000 b_h[0]: 0.000000
a_h[1]: 1.000000 b_h[1]: 0.000000
a_h[2]: 2.000000 b_h[2]: 0.000000
a_h[3]: 3.000000 b_h[3]: 0.000000
a_h[4]: 4.000000 b_h[4]: 0.000000
a_h[5]: 5.000000 b_h[5]: 0.000000
a_h[6]: 6.000000 b_h[6]: 0.000000
a_h[7]: 7.000000 b_h[7]: 0.000000
a_h[8]: 8.000000 b_h[8]: 0.000000
a_h[9]: 9.000000 b_h[9]: 0.000000
incrementArrayOnDevice(invalid device function );
a_h[0]: 1.000000 b_h[0]: 0.000000
a_h[1]: 2.000000 b_h[1]: 1.000000
a_h[2]: 3.000000 b_h[2]: 2.000000
a_h[3]: 4.000000 b_h[3]: 3.000000
a_h[4]: 5.000000 b_h[4]: 4.000000
a_h[5]: 6.000000 b_h[5]: 5.000000
a_h[6]: 7.000000 b_h[6]: 6.000000
a_h[7]: 8.000000 b_h[7]: 7.000000
a_h[8]: 9.000000 b_h[8]: 8.000000
a_h[9]: 10.000000 b_h[9]: 9.000000
a.out: incrementArrays.cu:82: int main(int, char**): Assertion `a_h[i] == b_h[i]' failed.
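One note on the listing: the block count from "Part 1 of 2" is just a ceiling division, so I don't think it is the cause of the failure. Here is a minimal host-only C sketch of that calculation, kept separate from the tutorial code. The helper names blocks_for and blocks_for_listing are my own, hypothetical ones, and the sketch assumes N >= 0 and blockSize > 0.

```c
/* Ceiling division: the smallest block count with
   blocks * block_size >= n.
   Hypothetical helper; assumes n >= 0 and block_size > 0. */
int blocks_for(int n, int block_size)
{
    return (n + block_size - 1) / block_size;
}

/* The same quantity, written the way the listing computes nBlocks. */
int blocks_for_listing(int n, int block_size)
{
    return n / block_size + (n % block_size == 0 ? 0 : 1);
}
```

With the values from my program (N = 10, blockSize = 1), both forms give 10 blocks of one thread each. With a block size of 4 they give 3 blocks, and the `if(idx < N)` guard in the kernel is what keeps the two surplus threads from writing past the end of the array.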
The program compiled without any errors or warnings. Then I tried to compile and run the examples from the SDK, and only some of them ran.
Here are some of the console outputs, so you might get an idea of what has gone wrong:
user@Linux:/opt/NVIDIA_CUDA_SDK/C/bin/linux/release$ ./3dfd
3DFD running on: Quadro FX 570
Total GPU Memory: 255.3125 MB
Unable to allocate 351.5625 Mbytes of GPU memory
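This first failure at least looks like plain arithmetic rather than a driver problem. Below is a small host-only C sketch that checks the numbers. The helper names are hypothetical, and the request size of 368640000 bytes is my assumption, derived from the printed 351.5625 Mbytes with 1 MB taken as 1024 * 1024 bytes. The total of 267714560 bytes is the figure deviceQuery reports for this card.

```c
#include <stddef.h>

/* 1 MB as the log seems to use it: 1024 * 1024 bytes (assumption). */
#define BYTES_PER_MB (1024.0 * 1024.0)

/* Convert a byte count to megabytes. */
double bytes_to_mb(size_t bytes)
{
    return (double)bytes / BYTES_PER_MB;
}

/* Returns 1 if the request is no larger than the card's total memory.
   This ignores memory already in use (for example by the display),
   so a result of 1 would not guarantee that an allocation succeeds. */
int fits_on_device(size_t requested_bytes, size_t total_bytes)
{
    return requested_bytes <= total_bytes;
}
```

Here bytes_to_mb(267714560) gives 255.3125 and bytes_to_mb(368640000) gives 351.5625, matching the log, and fits_on_device(368640000, 267714560) returns 0. So 3dfd probably cannot run on this card whatever the driver does.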
user@Linux:/opt/NVIDIA_CUDA_SDK/C/bin/linux/release$ ./dct8x8
CUDA sample DCT/IDCT implementation
Loading test image: barbara.bmp... [512 x 512]... Success
Running Gold 1 (CPU) version... Success
Running Gold 2 (CPU) version... Success
cudaSafeCall() Runtime API error in file <dct8x8.cu>, line 195 : feature is not yet implemented.
Running CUDA 1 (GPU) version...
user@Linux:/opt/NVIDIA_CUDA_SDK/C/bin/linux/release$ ./Mandelbrot
[ CUDA Mandelbrot & Julia Set ]
Initializing GLUT...
Loading extensions: No error
OpenGL window created.
> Compute SM 1.1 Device Detected
> Device 0: <Quadro FX 570>
Data initialization done.
Starting GLUT main loop...
Press [s] to toggle between GPU and CPU implementations
Press [j] to toggle between Julia and Mandelbrot sets
Press [r] or [R] to decrease or increase red color channel
Press [g] or [G] to decrease or increase green color channel
Press [b] or [B] to decrease or increase blue color channel
Press [e] to reset
Press [a] or [A] to animate colors
Press [c] or [C] to change colors
Press [d] or [D] to increase or decrease the detail
Press [p] to record main parameters to file params.txt
Press [o] to read main parameters from file params.txt
Left mouse button + drag = move (Mandelbrot or Julia) or animate (Julia)
Press [m] to toggle between move and animate (Julia) for left mouse button
Middle mouse button + drag = Zoom
Right mouse button = Menu
Press [?] to print location and scale
Press [q] to exit
Creating GL texture...
Texture created.
Creating PBO...
cudaSafeCall() Runtime API error in file <Mandelbrot.cpp>, line 892 : feature is not yet implemented.
cudaSafeCall() Runtime API error in file <Mandelbrot.cpp>, line 468 : feature is not yet implemented.
And here is the output of deviceQuery:
user@Linux:/opt/NVIDIA_CUDA_SDK/C/bin/linux/release$ ./deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA
Device 0: "Quadro FX 570"
CUDA Driver Version: 0.0
CUDA Runtime Version: 2.30
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 267714560 bytes
Number of multiprocessors: 16
Number of cores: 128
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 0.92 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: Yes
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Press ENTER to exit...
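The two version lines in that output look suspicious to me. As far as I can tell, CUDA encodes a version as a single integer, 1000 * major + 10 * minor, so a 2.3 runtime would report 2030. deviceQuery appears to print value / 1000 and value % 100, which would explain both "2.30" and "0.0". The sketch below is plain host-side C with hypothetical helper names. The inputs 2030 and 0 are my reconstruction from the printed text, not values I read from the API myself.

```c
/* Decode CUDA's integer version encoding: 1000 * major + 10 * minor. */
int cuda_version_major(int version)
{
    return version / 1000;
}

int cuda_version_minor(int version)
{
    return (version % 1000) / 10;
}

/* Returns 1 if the driver reports at least the runtime's version.
   A driver version of 0 is treated like any other number here, so it
   always compares as older than a real runtime version. */
int driver_supports_runtime(int driver_version, int runtime_version)
{
    return driver_version >= runtime_version;
}
```

With the assumed values, cuda_version_major(2030) is 2, cuda_version_minor(2030) is 3, and driver_supports_runtime(0, 2030) is 0. That would be consistent with a driver that is older than the toolkit, though I can't confirm that this is what is happening.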
Do you have any idea what I need to do to get CUDA working?
Could it be that my device driver is too old and I need to install a newer one? If so, how do I uninstall the old one?
Could it be that some other hardware in my computer is causing the problem?
I’d be very grateful for any helpful answer!
BTW: My OS is:
Linux Linux #1 SMP Tue Sep 25 20:41:25 BST 2007 x86_64 x86_64 x86_64 GNU/Linux
And the distribution is called Slamd64 12.0