I have a project where I’ve extracted the code that runs on the GPU into a DLL, which the main project links dynamically. To test this, I’ve written a small sample EXE that I can build either to use the DLL or to link the GPU library statically. The DLL is written and compiled in C, the EXE in C++.
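For context, the three build modes described above (DLL export, DLL import, static) are often handled with a single C header along the lines of this sketch. It is an assumption about the setup, not the project’s actual code: `GPU_API`, `GPU_API_BUILD_DLL`, `GPU_API_USE_DLL` and `ProcessImageGPU` are hypothetical names, and the body only copies the image so the sketch runs without a GPU.

```c
/* Hypothetical sketch: header declarations and the library-side stub are
 * combined in one file for brevity. In a real project the declaration
 * lives in a shared header and the definition is compiled only into the
 * GPU library (where GPU_API_USE_DLL is not defined). */
#include <stddef.h>
#include <string.h>

#if defined(_WIN32) && defined(GPU_API_BUILD_DLL)
#  define GPU_API __declspec(dllexport)   /* building the DLL itself */
#elif defined(_WIN32) && defined(GPU_API_USE_DLL)
#  define GPU_API __declspec(dllimport)   /* EXE linking against the DLL */
#else
#  define GPU_API                         /* static library: no decoration */
#endif

/* extern "C" stops the C++ EXE from name-mangling the C symbol. */
#ifdef __cplusplus
extern "C" {
#endif

GPU_API int ProcessImageGPU(const unsigned char *src, unsigned char *dst,
                            int width, int height);

#ifdef __cplusplus
}
#endif

/* Placeholder body (hypothetical): the real DLL would presumably call the
 * accelerator code here, e.g. InnerProcessImageGPU. This stub just copies
 * one byte per pixel. Returns 0 on success, -1 on invalid arguments. */
GPU_API int ProcessImageGPU(const unsigned char *src, unsigned char *dst,
                            int width, int height)
{
    if (src == NULL || dst == NULL || width <= 0 || height <= 0)
        return -1;
    memcpy(dst, src, (size_t)width * (size_t)height);
    return 0;
}
```

The C++ EXE includes the same header with `GPU_API_USE_DLL` defined for the DLL build and with neither macro defined for the static build, so the calling code is identical in both configurations.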
The GPU code runs when statically linked into the sample EXE, in both 32-bit and 64-bit builds, and when dynamically linked in a 32-bit build.
However, when it is dynamically linked in a 64-bit build, I get a runtime error and everything fails. Here is the error:
The accelerator does not match the profile for which this program was compiled
Current file: C:\Users\Sentry360\Desktop\360API.GCC\ImageProcessorGPU.c
function: InnerProcessImageGPU
line: 167
Current region was compiled for:
NVIDIA Tesla GPU sm10 sm20 sm30
Available accelerators:
device[1]: Native X86 (CURRENT DEVICE)
It seems that device 0 is not found during initialization. The strange part is that pgaccelinfo runs fine, and the other three configurations work as well. Here is the pgaccelinfo output:
CUDA Driver Version: 6050
Device Number: 0
Device Name: GeForce GTX 560
Device Revision Number: 2.1
Global Memory Size: 1073741824
Number of Multiprocessors: 7
Number of Cores: 224
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 32768
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 65535 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1701 MHz
Execution Timeout: Yes
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: No
Memory Clock Rate: 2052 MHz
Memory Bus Width: 256 bits
L2 Cache Size: 524288 bytes
Max Threads Per SMP: 1536
Async Engines: 1
Unified Addressing: Yes
Current free memory: 818917376
Upload time (4MB): 1040 microseconds ( 690 ms pinned)
Download time: 1320 microseconds ( 690 ms pinned)
Upload bandwidth: 4032 MB/sec (6078 MB/sec pinned)
Download bandwidth: 3177 MB/sec (6078 MB/sec pinned)
PGI Compiler Option: -ta=tesla:cc20
clGetDeviceIDs returns code -1
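For reference, -1 from clGetDeviceIDs is `CL_DEVICE_NOT_FOUND` in the standard OpenCL `cl.h`, i.e. no OpenCL device of the requested type was found; whether that is related to the 64-bit DLL failure is not established. A small lookup like the following can make such codes readable when logging from the DLL. It is a hypothetical helper, not part of PGI or OpenCL, and it hard-codes the `cl.h` values instead of including the header so the sketch compiles without an OpenCL SDK.

```c
#include <stdio.h>

/* Hypothetical helper: maps the status codes clGetDeviceIDs can return
 * to their symbolic names. Values are copied from the OpenCL cl.h. */
const char *cl_error_name(int code)
{
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    case -31: return "CL_INVALID_DEVICE_TYPE";
    case -32: return "CL_INVALID_PLATFORM";
    default:  return "UNKNOWN_CL_ERROR";
    }
}

/* Logs a status code in the same style as the pgaccelinfo line above. */
void report_cl_status(const char *call, int code)
{
    printf("%s returns code %d (%s)\n", call, code, cl_error_name(code));
}
```

For example, `report_cl_status("clGetDeviceIDs", -1);` prints `clGetDeviceIDs returns code -1 (CL_DEVICE_NOT_FOUND)`.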
Any ideas?