Multiple arch and -maxrregcount

Hi,

The Fermi compatibility guide mentions that you can create more than one version of your kernel, one per architecture, and CUDART will decide at runtime which version to run:

-gencode=arch=compute_10,code=sm_10
-gencode=arch=compute_10,code=compute_10
-gencode=arch=compute_20,code=sm_20
-gencode=arch=compute_20,code=compute_20
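One rough way to watch that runtime selection happen is sketched below. It assumes a hypothetical kernel `saxpy` and helper `report_saxpy_version` (neither comes from the guide). The helper queries the device, then uses `cudaFuncGetAttributes` to print which binary and PTX version of the kernel the runtime loaded and how many registers that version uses. Built with the four -gencode options above, it would be expected to report an sm_10 binary on a Tesla card and an sm_20 binary on a Fermi card, which also makes it a quick check of what -maxrregcount did to each version.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, used only to illustrate per-architecture selection.
__global__ void saxpy(float a, const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Hypothetical helper: prints which compiled version of saxpy the runtime
// picked for device 0. binaryVersion and ptxVersion are encoded as
// major * 10 + minor (e.g. 20 for sm_20 / compute_20).
cudaError_t report_saxpy_version(cudaFuncAttributes *out)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) return err;

    err = cudaFuncGetAttributes(out, saxpy);
    if (err != cudaSuccess) return err;

    printf("%s (compute %d.%d): binary %d, PTX %d, %d registers per thread\n",
           prop.name, prop.major, prop.minor,
           out->binaryVersion, out->ptxVersion, out->numRegs);
    return cudaSuccess;
}
```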

If I have a mixed Tesla and Fermi cluster, this is a great feature. However, I have a kernel whose register count I had to limit on Tesla with the -maxrregcount flag, and I don't want this to affect the sm_20 code for Fermi (the -maxrregcount flag is in the makefile for both Tesla and Fermi).

How do I do this?

thanks

eyal

Did you try using the __launch_bounds__ qualifier? I found it awkward and difficult to make it do what I wanted, but maybe it was just me.

I’ll check it tomorrow :) I guess maybe -maxrregcount would apply for sm < 2.0 and __launch_bounds__ would override it for sm == 2.0.

Can you please post how to use it? I tried it, but it didn’t compile, and I couldn’t find any reference to it in the SDK or the documentation.

thanks

eyal

It’s section B.14 of the CUDA 3.0 Programming Guide.
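A minimal sketch of how that section's approach might cover the Tesla/Fermi case, assuming a hypothetical kernel `scale` that is always launched with 256 threads per block. Because `__CUDA_ARCH__` is defined only while compiling device code, each -gencode target can get its own bounds. The sm_1x build asks for at least two resident blocks per multiprocessor, which pushes the compiler to limit registers for that target only. The sm_20 build, like the host pass, only gets the thread-count bound. The block size, the minimum-blocks value of 2 and the helper `scale_on_device` are illustrative assumptions, not tuned values.

```cuda
#include <cuda_runtime.h>

// Assumption: the kernel is always launched with this many threads per block.
#define THREADS_PER_BLOCK 256

// __CUDA_ARCH__ is defined only during device compilation, once per target.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 200
// sm_1x (Tesla): ask for 2 resident blocks per multiprocessor, which limits
// the registers per thread the compiler may use for this target only.
#define SCALE_MIN_BLOCKS 2
#else
// sm_20 and later (Fermi), and the host pass: only the thread-count bound,
// so the compiler is free to use more registers.
#define SCALE_MIN_BLOCKS 1
#endif

__global__ void __launch_bounds__(THREADS_PER_BLOCK, SCALE_MIN_BLOCKS)
scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical host helper: copies host[0..n) to the device, scales it in
// place with the kernel above, and copies the result back.
cudaError_t scale_on_device(float *host, int n, float factor)
{
    float *dev = NULL;
    size_t bytes = n * sizeof(float);
    cudaError_t err = cudaMalloc((void **)&dev, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
        scale<<<blocks, THREADS_PER_BLOCK>>>(dev, factor, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev);
    return err;
}
```

With the limit expressed in the source like this, the global -maxrregcount flag could presumably be dropped from the makefile. Compiling with -Xptxas -v prints the register count for each target, which would confirm whether only the sm_1x version is constrained.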
