call to cuModuleLoadData returned error 209

Hello,

I have just started using NVIDIA’s OpenACC Toolkit 2015. I have two GPUs on my system: a Quadro K620 and a Tesla K20c.
I managed to compile and run a simple example (https://www.youtube.com/watch?v=_do2Dwa29EM) on the Quadro K620, but when I target the Tesla I get this error:

pgcc -acc -ta=tesla:cc35 -o laplas2d-acc laplace2d.c
call to cuModuleLoadData returned error 209: No binary for GPU

I believe PGI supports NVIDIA’s Tesla GPUs.
I appreciate any help.

Thanks,
Ali

Hi Ali,

This binary should be fine for the K20 since K20s are compute capability 3.5, but the Quadro K620 is compute capability 5.0 so you’ll either need to change “-ta=tesla:cc35” to “-ta=tesla:cc50”, or simply remove the “cc” sub-option. Without the “cc” sub-option, we create multiple versions of the device code for a variety of compute capabilities. Using a specific “cc” sub-option only creates a single target binary.
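To make the difference concrete, here is a sketch of the compile-line choices (file and output names taken from the transcript above; the comma-separated multi-cc form is my assumption about the sub-option syntax these PGI releases accepted):

```shell
# Single-target binary: device code only for compute capability 3.5 (K20c).
# Running it on the cc50 K620 gives "error 209: No binary for GPU".
pgcc -acc -ta=tesla:cc35 -o laplace2d-acc laplace2d.c

# No "cc" sub-option: device code is generated for several compute
# capabilities, so the same binary can run on the K20c or the K620.
pgcc -acc -ta=tesla -o laplace2d-acc laplace2d.c

# Explicitly listing more than one capability (assumed syntax):
pgcc -acc -ta=tesla:cc35,cc50 -o laplace2d-acc laplace2d.c
```

These are compiler invocations for the PGI toolchain, so they only run where `pgcc` is installed.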

Hope this helps,
Mat

Thanks Mat.

I added “-ta=tesla:cc35” to target the Tesla K20c. I had tried cc50 before without any problem, and the run time was 26 seconds. If I remove the “cc” sub-option, I get the same run time, so I assume that without “cc” it targets the Quadro K620. How can I run it only on the K20c?

Here is “pgaccelinfo” output:

CUDA Driver Version: 7050
NVRM version: NVIDIA UNIX x86_64 Kernel Module 352.07 Fri May 8 17:48:57 PDT 2015

Device Number: 0
Device Name: Tesla K20c
Device Revision Number: 3.5
Global Memory Size: 5032706048
Number of Multiprocessors: 13
Number of SP Cores: 2496
Number of DP Cores: 832
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 65536
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 2147483647 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 705 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: Yes
Memory Clock Rate: 2600 MHz
Memory Bus Width: 320 bits
L2 Cache Size: 1310720 bytes
Max Threads Per SMP: 2048
Async Engines: 2
Unified Addressing: Yes
Managed Memory: Yes
PGI Compiler Option: -ta=tesla:cc35

Device Number: 1
Device Name: Quadro K620
Device Revision Number: 5.0
Global Memory Size: 2146762752
Number of Multiprocessors: 3
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 65536
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 2147483647 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1124 MHz
Execution Timeout: Yes
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: No
Memory Clock Rate: 900 MHz
Memory Bus Width: 128 bits
L2 Cache Size: 2097152 bytes
Max Threads Per SMP: 2048
Async Engines: 1
Unified Addressing: Yes
Managed Memory: Yes
PGI Compiler Option: -ta=tesla:cc50

So I assume without cc, it will target Quadro K620.

Not quite. Without “cc”, multiple device binaries are embedded in your executable, and the decision of which one to run is made when you first execute, based on the compute capability of the selected device.

How can I run it only on K20c?

Sorry, but I’m not quite understanding the question. Do you want it to only run on the K20, or are you asking why it’s currently running on the K620?

To exclusively use the K20, you can set the environment variable “ACC_DEVICE_NUM=0” or call the routine “acc_set_device_num” from your program.
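A minimal sketch of the in-program route, assuming the standard OpenACC runtime API from openacc.h (the device number 0 matches the K20c in the pgaccelinfo listing above; the loop body is just a placeholder kernel):

```c
#include <openacc.h>
#include <stdio.h>

int main(void)
{
    /* Pin all OpenACC compute regions to device 0 (the Tesla K20c
       in the pgaccelinfo listing) before the first compute region. */
    acc_set_device_num(0, acc_device_nvidia);

    float a[1000];

    /* Placeholder parallel loop, standing in for the laplace2d kernel. */
    #pragma acc parallel loop copyout(a)
    for (int i = 0; i < 1000; ++i)
        a[i] = 2.0f * i;

    printf("a[999] = %f\n", a[999]);
    return 0;
}
```

This needs an OpenACC-capable compiler (e.g. `pgcc -acc -ta=tesla`). The environment-variable route is equivalent and requires no code change: `export ACC_DEVICE_NUM=0` before running the binary.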

By default, device 0 would be used, so if you’re running on the K620, then you must be setting the device number to 1 someplace.

Mat

I wanted to run the parallel loop exclusively on the K20c, and setting “ACC_DEVICE_NUM=0” did the trick.

Thanks