sm_XX backward compatibility

mahmood.nt · June 4, 2018, 5:13pm

There are two GPU cards on the system: M2000 and C2075.
When I build lammps with sm_50, I can run the program on M2000. However, when I build with sm_21, neither M2000 nor c2075 runs the program. The error I get is

ERROR: GPU library not compiled for this accelerator (…/gpu_extra.h:40)
Last command: package gpu

Isn’t M2000 backward compatible?

Robert_Crovella · June 4, 2018, 5:21pm

I think I already pointed out that CUDA 9.x won’t work with C2075.

And depending on the exact compile command used to build for sm_21, it’s possible the binary will not run on M2000. In order to get this kind of forward compatibility, the built binary must contain PTX. If your compile command included only, for example:

-gencode arch=compute_20,code=sm_21

that binary would not run on M2000. It will only run on a cc2.1 device (and, by the way, c2075 is not a cc2.1 device, so that binary would also not run on c2075).
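
To make the rule concrete, here is a minimal shell sketch of the compatibility check described above. It is only an illustration, not something nvcc or lammps provides: the function name can_run and its argument format (compute capability written as digits, so 21 for cc2.1) are made up for this example. It assumes that SASS runs only on a device with the same major compute capability and an equal or higher minor, that PTX runs on any device of equal or higher compute capability, and that the installed CUDA version and driver still support the device at all.

```shell
# Hypothetical helper: can an embedded target run on a given device?
#   $1 = sass | ptx
#   $2 = target compute capability as digits (e.g. 21 for cc2.1)
#   $3 = device compute capability as digits (e.g. 50 for cc5.0)
# Returns 0 if it can run, 1 if not, 2 on a bad first argument.
can_run() {
    case "$1" in
        # SASS: same major version, device minor >= target minor
        sass) [ $(( $2 / 10 )) -eq $(( $3 / 10 )) ] && [ "$3" -ge "$2" ] ;;
        # PTX: JIT-compiled by the driver for any equal or newer device
        ptx)  [ "$3" -ge "$2" ] ;;
        *)    echo "unknown kind: $1" >&2; return 2 ;;
    esac
}
```

With this sketch, can_run sass 21 50 fails (cc2.1 SASS on a cc5.0 device), can_run sass 21 20 fails as well (cc2.1 SASS on a cc2.0 device), and can_run ptx 20 50 succeeds.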

mahmood.nt · June 4, 2018, 5:29pm

bob,
I have installed cuda 8. I forgot to say that. I see a warning message that sm_21 is deprecated.
I will look at the compile command for what you said.

I searched a lot to find a table that maps sm numbers to compute capability. I suppose that when the CC is 5.0, then the sm is 50. Is that correct? I have seen a document for that!

Robert_Crovella · June 4, 2018, 5:34pm

Yes, a compute capability 5.0 GPU would normally use an arch=compute_50,code=sm_50 binary.

To embed PTX, you would need something like arch=compute_50,code=compute_50
which would embed cc5.0 PTX, which will then be forward compatible with GPUs of cc5.0 or higher.

Even that PTX will not run on a cc2.0 device.

If you were wanting to embed cc2.0 PTX, you should use:

-gencode arch=compute_20,code=compute_20

If you want to embed cc2.0 SASS, you should use:

-gencode arch=compute_20,code=sm_20

A given compile command can include multiple gencode switches for creation of a “fat” binary which includes support for multiple target types.
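
As a rough sketch of what such a fat-binary command line could look like, the shell function below turns a list of compute capabilities into -gencode switches: SASS for every listed target, plus PTX for the last one so that newer GPUs can JIT-compile it. The function name gencode_flags is hypothetical, and it assumes the capabilities are passed in ascending order and that the installed toolkit still accepts all of them.

```shell
# Hypothetical helper: build nvcc -gencode switches for a fat binary.
# Arguments: compute capabilities in ascending order, e.g. 2.0 5.0
gencode_flags() {
    flags=""
    last=""
    for cc in "$@"; do
        n=$(printf '%s' "$cc" | tr -d '.')   # "5.0" -> "50"
        # SASS for this compute capability
        flags="$flags -gencode arch=compute_$n,code=sm_$n"
        last=$n
    done
    # PTX for the highest listed target, for forward compatibility
    if [ -n "$last" ]; then
        flags="$flags -gencode arch=compute_$last,code=compute_$last"
    fi
    printf '%s\n' "${flags# }"               # drop the leading space
}
```

It could then be used as nvcc $(gencode_flags 2.0 5.0) -o app app.cu, where app.cu is a placeholder source file. Note that this suits a normal executable build; it does not apply to --cubin compilation, which targets a single device type.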

mahmood.nt · June 4, 2018, 5:53pm

Robert_Crovella wrote: “(and, by the way, c2075 is not a cc2.1 device, so that binary would also not run on c2075).”

According to the deviceQuery, it is CC 2.0 so the sm should be 20.

The definition in Makefile is

CUDA_ARCH=-arch=sm_XX

I set XX to 20. However, the c2075 is still unable to run the program. Am I missing something?

Using CUDA_ARCH=--gencode arch=compute_20,code=compute_20, the following command fails during compilation:

nvcc -I/usr/local/cuda/include -DUNIX -O3 -Xptxas -v --use_fast_math -DLAMMPS_SMALLBIG -Icudpp_mini --gencode arch=compute_20,code=compute_20 -D_DOUBLE_DOUBLE --cubin -DNV_KERNEL -o neighbor_gpu.cubin lal_neighbor_gpu.cu
nvcc fatal: Unknown option '-gencode'

Robert_Crovella · June 4, 2018, 6:00pm

No, I think that should work. The problem must be elsewhere. Perhaps your lammps build is not set up to use a device that old.

And the second problem is because I said:

--gencode arch=compute_20,code=compute_20

it should have been:

-gencode arch=compute_20,code=compute_20

mahmood.nt · June 4, 2018, 6:40pm

It seems that -gencode arch=compute_20,code=compute_20 is not compatible with --cubin, which is also present in the command.
I think I have to first be sure that sm_20 is allowed in the version of lammps that I am using.

Robert_Crovella · June 4, 2018, 7:02pm

That’s right. A cubin is a binary executable object only for a specific device type; it does not and is not allowed to contain PTX. If you want to specify cc2.0 for cubin compilation use:

-gencode arch=compute_20,code=sm_20

which will generate cc2.0 SASS suitable for a cubin.

However, such a cubin will not contain PTX, so it won’t be forward compatible. In that case you should set up your lammps compilation so that cubins are created for each GPU type you wish to run on.
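
One way to picture "a cubin per GPU type" is the dry-run sketch below, which only prints one nvcc command per target instead of running anything. The function name cubin_commands and the output naming scheme (a _smXX suffix) are invented for this example and are not what the lammps Makefile does; the other flags from the real build command are left out for brevity.

```shell
# Hypothetical helper: print one nvcc --cubin command per target.
#   $1      = CUDA source file (e.g. lal_neighbor_gpu.cu)
#   $2 ...  = targets as digits (e.g. 20 50)
cubin_commands() {
    src=$1
    shift
    base=${src%.cu}                          # strip the .cu extension
    for n in "$@"; do
        # SASS only: a cubin cannot contain PTX
        printf 'nvcc --cubin -gencode arch=compute_%s,code=sm_%s -o %s_sm%s.cubin %s\n' \
            "$n" "$n" "$base" "$n" "$src"
    done
}
```

Running cubin_commands lal_neighbor_gpu.cu 20 50 prints two commands, one for cc2.0 and one for cc5.0. After checking them, they could be piped to sh to actually build, assuming the installed CUDA toolkit still supports both targets.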

mahmood.nt · June 4, 2018, 7:07pm

I was thinking about that… So, I can create two folders

1) lammps-tesla with CUDA_ARCH=-arch=sm_20

2) lammps-m2000 with CUDA_ARCH=-arch=sm_50

without using -gencode. Am I right? I will test that.