Hello,
I installed pgcc 14.7 and can compile the OpenACC samples without errors using -ta=tesla,cc1x. When I run the binary, I get:
call to cuModuleLoadData returned error 209: No binary for GPU
pgaccelinfo detects my GPU, a GeForce 9800 GT. Its output is:
CUDA Driver Version: 6050
NVRM version: NVIDIA UNIX x86_64 Kernel Module 340.24 Wed Jul 2 14:24:20 PDT 2014
Device Number: 0
Device Name: GeForce 9800 GT
Device Revision Number: 1.1
Global Memory Size: 1073414144
Number of Multiprocessors: 14
Number of Cores: 112
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 16384
Registers per Block: 8192
Warp Size: 32
Maximum Threads per Block: 512
Maximum Block Dimensions: 512, 512, 64
Maximum Grid Dimensions: 65535 x 65535 x 1
Maximum Memory Pitch: 2147483647B
Texture Alignment: 256B
Clock Rate: 1500 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: No
ECC Enabled: No
Memory Clock Rate: 900 MHz
Memory Bus Width: 256 bits
Max Threads Per SMP: 768
Async Engines: 1
Unified Addressing: No
Initialization time: 412790 microseconds
Current free memory: 1034262272
Upload time (4MB): 1761 microseconds (1497 ms pinned)
Download time: 1854 microseconds (1243 ms pinned)
Upload bandwidth: 2381 MB/sec (2801 MB/sec pinned)
Download bandwidth: 2262 MB/sec (3374 MB/sec pinned)
PGI Compiler Option: -ta=tesla:cc11
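To spell out how I read that last line: the cc11 in the suggested option appears to be the Device Revision Number (1.1) with the dot removed. A minimal sketch of that mapping is below. The helper names cc_suffix and format_ta_flag are made up for illustration and are not part of pgaccelinfo or any PGI tool.

```c
#include <stdio.h>

/* Hypothetical helper: turn a "Device Revision Number" such as 1.1
 * into the numeric suffix of the cc sub-option (1.1 -> 11). */
int cc_suffix(int major, int minor)
{
    return major * 10 + minor;
}

/* Hypothetical helper: write "-ta=tesla:ccNN" into buf.
 * Returns the number of characters snprintf would have written,
 * not counting the terminating null byte. */
int format_ta_flag(char *buf, size_t len, int major, int minor)
{
    return snprintf(buf, len, "-ta=tesla:cc%d", cc_suffix(major, minor));
}
```

Calling format_ta_flag(buf, sizeof buf, 1, 1) with the revision reported above fills buf with the same option string that pgaccelinfo prints.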
I am experimenting with a sample linked from the "PGI Compilers with OpenACC" page: https://developer.nvidia.com/content/cudacasts-episode-3
The code I downloaded from NVIDIA is:
/*
* Copyright 2012 NVIDIA Corporation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <math.h>
#include <string.h>
#include "timer.h"
#define NN 4096
#define NM 4096
double A[NN][NM];
double Anew[NN][NM];
int main(int argc, char** argv)
{
    const int n = NN;
    const int m = NM;
    const int iter_max = 1000;
    const double tol = 1.0e-6;
    double error = 1.0;

    memset(A, 0, n * m * sizeof(double));
    memset(Anew, 0, n * m * sizeof(double));

    for (int j = 0; j < n; j++)
    {
        A[j][0] = 1.0;
        Anew[j][0] = 1.0;
    }

    printf("Jacobi relaxation Calculation: %d x %d mesh\n", n, m);

    StartTimer();
    int iter = 0;

    #pragma acc data copy(A), create(Anew)
    while ( error > tol && iter < iter_max )
    {
        error = 0.0;

        #pragma omp parallel for shared(m, n, Anew, A)
        #pragma acc kernels
        for( int j = 1; j < n-1; j++)
        {
            for( int i = 1; i < m-1; i++ )
            {
                Anew[j][i] = 0.25 * ( A[j][i+1] + A[j][i-1]
                                    + A[j-1][i] + A[j+1][i]);
                error = fmax( error, fabs(Anew[j][i] - A[j][i]));
            }
        }

        #pragma omp parallel for shared(m, n, Anew, A)
        #pragma acc kernels
        for( int j = 1; j < n-1; j++)
        {
            for( int i = 1; i < m-1; i++ )
            {
                A[j][i] = Anew[j][i];
            }
        }

        if(iter % 100 == 0) printf("%5d, %0.6f\n", iter, error);
        iter++;
    }

    double runtime = GetTimer();
    printf(" total: %f s\n", runtime / 1000);
}
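To separate the program logic from the GPU setup, the same sweep can be sketched as a reduced host-only function. This is only an illustration: SN, SM and jacobi_sweep are names invented for this sketch, the grid is shrunk to 8 x 8, and fmax/fabs are replaced with plain comparisons so that no math library is needed.

```c
#define SN 8   /* small illustrative grid, not the 4096 x 4096 mesh above */
#define SM 8

/* One Jacobi sweep on the host: fills the interior of Anew from A,
 * copies the interior back into A, and returns the largest absolute
 * change seen at any interior point (the "max reduction" on error). */
double jacobi_sweep(double A[SN][SM], double Anew[SN][SM])
{
    double error = 0.0;

    for (int j = 1; j < SN - 1; j++) {
        for (int i = 1; i < SM - 1; i++) {
            Anew[j][i] = 0.25 * (A[j][i+1] + A[j][i-1]
                               + A[j-1][i] + A[j+1][i]);

            /* Stand-in for fmax(error, fabs(...)) in the original code. */
            double diff = Anew[j][i] - A[j][i];
            if (diff < 0.0) diff = -diff;
            if (diff > error) error = diff;
        }
    }

    for (int j = 1; j < SN - 1; j++) {
        for (int i = 1; i < SM - 1; i++) {
            A[j][i] = Anew[j][i];
        }
    }

    return error;
}
```

With both arrays zeroed and column 0 set to 1.0, as in the program above, the first call should return 0.25, since each interior point next to that column averages one neighbour of 1.0 with three zeros.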
-Minfo=accel shows the following:
50, Generating copy(A[:][:])
Generating create(Anew[:][:])
56, Generating Tesla code
57, Loop is parallelizable
59, Loop is parallelizable
Accelerator kernel generated
57, #pragma acc loop gang /* blockIdx.y */
59, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
63, Max reduction generated for error
68, Generating Tesla code
69, Loop is parallelizable
71, Loop is parallelizable
Accelerator kernel generated
69, #pragma acc loop gang /* blockIdx.y */
71, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
From searching the forum, most of the reported cuModuleLoadData failures are error 300, which is caused by a compute capability mismatch. That does not seem to be my case.
I also installed the CUDA SDK from NVIDIA, and the samples included in the SDK run correctly.
Did I miss a step in setting up the PGI compiler?
Thanks,
Xing Fu