Hi,
The Fermi compatibility guide mentions that you can build more than one version of your kernel, one per architecture, and CUDART will decide which kernel
to run at runtime.
-gencode=arch=compute_10,code=sm_10
-gencode=arch=compute_10,code=compute_10
-gencode=arch=compute_20,code=sm_20
-gencode=arch=compute_20,code=compute_20
If I have a mixed Tesla and Fermi cluster this is a great feature. However, I have a kernel whose register count I had to limit on Tesla with the -maxrregcount
flag, but I don't want this to affect the sm_20 code for Fermi (and -maxrregcount is in the makefile for both Tesla and Fermi).
How do I do this?
thanks
eyal
Did you try using the __launch_bounds__ qualifier? I found it awkward, and it was difficult to get it to do what I wanted, but maybe it was just me.
I’ll check it tomorrow :) I guess -maxrregcount might work for sm < 2.0 and launch_bounds might override it
for sm == 2.0.
Can you please post how to use it? I tried it but it didn't compile, and I couldn't find any reference to it in the SDK or the documents.
thanks
eyal
It’s section B.14 of the CUDA 3.0 Programming Guide.
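For reference, here is a minimal sketch of how the qualifier can be combined with the __CUDA_ARCH__ macro so that each -gencode target gets its own bounds. The kernel name scaleKernel and the particular thread/block numbers are made up for illustration, so tune them for your own kernel. I would also try dropping -maxrregcount from the makefile for this kernel, since how the two interact depends on the toolkit version and I have not checked that for 3.0.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 256

// __CUDA_ARCH__ is only defined while compiling device code, once per
// -gencode target, so each architecture sees its own pair of values.
// The numbers below are illustrative assumptions, not recommendations.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 200
#define MY_KERNEL_MAX_THREADS (2 * THREADS_PER_BLOCK)  // Fermi: looser bound
#define MY_KERNEL_MIN_BLOCKS  3
#else
#define MY_KERNEL_MAX_THREADS THREADS_PER_BLOCK        // Tesla (and host pass)
#define MY_KERNEL_MIN_BLOCKS  2
#endif

// The compiler limits register usage so that MY_KERNEL_MIN_BLOCKS blocks of
// up to MY_KERNEL_MAX_THREADS threads can be resident on one multiprocessor.
__global__ void
__launch_bounds__(MY_KERNEL_MAX_THREADS, MY_KERNEL_MIN_BLOCKS)
scaleKernel(float *data, float factor, int n)  // hypothetical kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main()
{
    const int n = 1024;
    float h[n];
    for (int i = 0; i < n; ++i)
        h[i] = (float)i;

    float *d = 0;
    cudaMalloc((void **)&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // Launch with THREADS_PER_BLOCK, which is within the bound on both
    // architectures. Do not use MY_KERNEL_MAX_THREADS here: host code
    // always takes the #else branch because __CUDA_ARCH__ is undefined.
    int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
    scaleKernel<<<blocks, THREADS_PER_BLOCK>>>(d, 2.0f, n);

    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);

    printf("h[1] = %f\n", h[1]);
    return 0;
}
```

Note the double underscores on both sides of launch_bounds, and that the qualifier goes between __global__ void and the kernel name. Getting either of those wrong is a common reason for it not compiling.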