Hi, all
I’m running an iterative linear solver on the GPU. It works fine on a Tesla C1060 but fails on a GTX 275. (I posted a topic about this earlier.) Now I think I have narrowed down where the problem is …
I wrote a small sample program to test CUBLAS:
[codebox]#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>
#include <math.h>
#include <cublas.h>
#include <cutil.h>
#include <cuda.h>
#define REAL double
int main(int argc, char **argv)
{
    CUT_DEVICE_INIT(argc, argv);

    cublasStatus status;
    status = cublasInit();
    if (status != CUBLAS_STATUS_SUCCESS)
    {
        fprintf(stderr, "Fatal Error: CUBLAS init failed.\n");
        return -1;
    }

    int i, j, n;
    n = 100000;

    /* Host vectors filled with pseudo-random values in [0, 1) */
    REAL *x = (REAL *) malloc(n * sizeof(REAL));
    REAL *y = (REAL *) malloc(n * sizeof(REAL));
    srand(100);
    for (i = 0; i < n; i++)
    {
        x[i] = rand() / (RAND_MAX + 1.0);
        y[i] = rand() / (RAND_MAX + 1.0);
    }

    /* Device copies of x and y */
    REAL *d_x, *d_y;
    CUDA_SAFE_CALL(cudaMalloc((void **)&d_x, (size_t)n * sizeof(REAL)));
    CUDA_SAFE_CALL(cudaMalloc((void **)&d_y, (size_t)n * sizeof(REAL)));
    CUDA_SAFE_CALL(cudaMemcpy(d_x, x, (size_t)n * sizeof(REAL), cudaMemcpyHostToDevice));
    CUDA_SAFE_CALL(cudaMemcpy(d_y, y, (size_t)n * sizeof(REAL), cudaMemcpyHostToDevice));

    /* GPU: y = y + x, repeated 100 times, then t = y . y */
    for (i = 0; i < 100; i++)
        cublasDaxpy(n, 1.0, d_x, 1, d_y, 1);
    REAL t = cublasDdot(n, d_y, 1, d_y, 1);

    /* CPU reference: the same 100 updates and the same dot product */
    REAL t2 = 0.0;
    for (j = 0; j < 100; j++)
        for (i = 0; i < n; i++)
            y[i] += x[i];
    for (i = 0; i < n; i++)
        t2 += y[i] * y[i];

    printf("GPU = %lf, CPU = %lf\n", t, t2);

    free(x);
    free(y);
    CUDA_SAFE_CALL(cudaFree(d_x));
    CUDA_SAFE_CALL(cudaFree(d_y));

    status = cublasShutdown();
    if (status != CUBLAS_STATUS_SUCCESS)
    {
        fprintf(stderr, "Fatal Error: CUBLAS shutdown failed.\n");
        return -1;
    }
    return 0;
}
[/codebox]
As you can see, this code does nothing but 100 DAXPY operations and one dot product, on both the CPU and the GPU. Surprisingly, the GPU output is not consistent from run to run. Here is some sample output:
[codebox] [~/GPU/testcublas] % ./test
Using device 0: GeForce GTX 275
GPU = 338244749.135282, CPU = 339742608.816349
[~/GPU/testcublas] % ./test
Using device 0: GeForce GTX 275
GPU = 338080902.402454, CPU = 339742608.816349
[~/GPU/testcublas] % ./test
Using device 0: GeForce GTX 275
GPU = 337694204.183996, CPU = 339742608.816349
[/codebox]
The GPU result varies from run to run, while the CPU result stays the same.
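To put a number on the mismatch, here is a minimal sketch of a relative-difference check. The helper names `relative_difference` and `results_agree` and the tolerance argument are my own assumptions for illustration; they are not part of CUBLAS or of test.cu.

```c
#include <math.h>
#include <float.h>

/* Hypothetical helper: relative difference between a GPU result
 * and a CPU reference value. */
double relative_difference(double gpu, double cpu)
{
    double scale = fabs(cpu);
    if (scale < DBL_MIN)
        scale = 1.0;   /* fall back to absolute difference near zero */
    return fabs(gpu - cpu) / scale;
}

/* Hypothetical helper: returns 1 if the two results agree to within
 * rel_tol (a tolerance chosen by the caller), 0 otherwise. */
int results_agree(double gpu, double cpu, double rel_tol)
{
    return relative_difference(gpu, cpu) <= rel_tol;
}
```

Calling `relative_difference(338244749.135282, 339742608.816349)` on the first run above gives roughly 0.0044, i.e. about 0.4%, which is many orders of magnitude more than double-precision rounding would explain for a sum of this size.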
I am working on a 64-bit Linux workstation located at MSI (Minnesota Supercomputing Center). The GPU card is a GTX 275. The installed CUDA version is 2.0, and the code was compiled with icc.
The code is attached.
Thanks a lot!
test.cu (1.51 KB)