Question on __CUDA_ARCH__ macro flag setting

I just finished compiling LAMMPS against CUDA toolkit 4.0 on Ubuntu Oneiric (11.10) after making some changes to the Makefiles, but the code still cannot find the Tesla C2070 GPUs (four of them in my Colfax CXT4000 system) when I try to run LAMMPS on them.

I am slowly trying to eliminate the unknowns on this build, and the first is the correct setting for the CUDA_ARCH macro flag.

I am currently using the value “sm_13” for that definition in the Makefiles, but some of the CUDA code under /usr/local/cuda makes me wonder whether it should instead be defined as a decimal number > 200.

Does anyone know what the alphanumeric string value for CUDA_ARCH should be for the Colfax CXT4000 system?

I did read the nvcc 1.1 manual, but it only goes up to “sm_11”.

My goal is to find out why LAMMPS cannot seem to find the GPUs when I ask it to run on them with the “-cuda on” option.

(perhaps I should drop back and see if I can get some example code to compile and run on just one Tesla GPU?)

Thanks for your help.

  • Logical American

I subsequently found out, from reading the LAMMPS mailing list and checking back, that the compute capability of the Tesla C2050/C2070 GPUs is 2.0. With CUDA_ARCH=sm_20 set in the Makefile flags, I was able to compile the gpu and cuda libraries successfully, and then the main lmp_openmpi executable.
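For anyone else confused by the two notations I ran into: as far as I can tell, the “sm_XY” string is what gets passed to nvcc's -arch option, while the decimal numbers seen in the CUDA headers are values of the __CUDA_ARCH__ macro, which nvcc defines itself during device compilation as major * 100 + minor * 10. Here is a minimal sketch of that relationship in plain C++. The helper name arch_name_to_cuda_arch is my own invention for illustration and is not part of the CUDA toolkit.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not a CUDA API): convert an nvcc architecture name such
// as "sm_20" or "compute_13" into the numeric value that __CUDA_ARCH__ holds
// during device compilation, i.e. major * 100 + minor * 10.
// Returns -1 if the name does not look like an architecture name.
int arch_name_to_cuda_arch(const std::string& name) {
    const std::size_t underscore = name.find('_');
    if (underscore == std::string::npos) return -1;

    const std::string prefix = name.substr(0, underscore);
    if (prefix != "sm" && prefix != "compute") return -1;

    const std::string digits = name.substr(underscore + 1);
    if (digits.size() < 2) return -1;
    for (std::size_t i = 0; i < digits.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(digits[i]))) return -1;
    }

    // All digits except the last form the major version; the last is the minor.
    int major = 0;
    for (std::size_t i = 0; i + 1 < digits.size(); ++i) {
        major = major * 10 + (digits[i] - '0');
    }
    const int minor = digits[digits.size() - 1] - '0';
    return major * 100 + minor * 10;
}
```

So arch_name_to_cuda_arch("sm_13") gives 130 and arch_name_to_cuda_arch("sm_20") gives 200, which would explain the 200 comparisons in the CUDA code.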

The code still cannot seem to find the hardware, but I suspect this is because the drivers are not loaded (lspci does show the hardware).
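One way to test the driver suspicion without involving LAMMPS at all is to look for the NVIDIA kernel module in /proc/modules, which is the same listing lsmod prints. Here is a rough sketch, again in plain C++. The function name module_listed is hypothetical, and the module name to search for is an assumption: it is typically “nvidia”, but a distribution package may install it under a different name.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: return true if a kernel-module listing in the
// /proc/modules format contains a module whose name (the first
// whitespace-separated field on a line) matches `wanted` exactly.
bool module_listed(std::istream& modules, const std::string& wanted) {
    std::string line;
    while (std::getline(modules, line)) {
        std::istringstream fields(line);
        std::string name;
        if (fields >> name && name == wanted) {
            return true;
        }
    }
    return false;
}
```

To use it, open /proc/modules with a std::ifstream and call module_listed(file, "nvidia"). If that returns false while lspci lists the cards, the driver is most likely not loaded.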