“Provided PTX was compiled with an unsupported toolchain” on psgcluster

Hi all,

I’m trying to compile and run some CUDA code on Linux (CUDA 11.1.0, gcc 6.2.0). The same code compiles and runs fine on Windows (CUDA 11.2, Nsight Visual Studio 2020.3.0.20315, driver 461.92, GTX 1050), but on Linux it gives me a “provided PTX was compiled with an unsupported toolchain” error.

I previously ran into this same error message on Windows and got a solution from this forum: updating my driver. This time, however, I am sshing into a cluster, so I can’t simply update the driver, nor have I been able to find out which driver the environment is using in the first place.

Most online hits (e.g. here and here) give solutions that assume a personal machine where sudo can be used, which I can’t do on a cluster. nvidia-smi yields command not found, and modinfo nvidia yields Module nvidia not found.

I am new to Linux, so I’m sorry if I’ve missed something obvious. For reference, the code compiles fine; it’s only running it that gives a problem.
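
The failure shows up at runtime at the first kernel launch, which (as I understand it) is when the driver has to JIT-compile the PTX embedded in the binary. Below is a minimal sketch of how I’m surfacing the error; the CHECK_CUDA macro and the dummy kernel are illustrative only, not my actual code.

// Minimal CUDA runtime error-checking sketch (illustrative, not the real code).
#include <cstdio>
#include <cuda_runtime.h>

#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            return 1;                                                 \
        }                                                             \
    } while (0)

__global__ void dummyKernel() {}

int main() {
    // The "provided PTX was compiled with an unsupported toolchain" error
    // typically appears here, at the first launch that triggers PTX JIT,
    // when the toolkit used to build the PTX is newer than the driver.
    dummyKernel<<<1, 1>>>();
    CHECK_CUDA(cudaGetLastError());
    CHECK_CUDA(cudaDeviceSynchronize());
    printf("kernel launched and completed without error\n");
    return 0;
}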

Much appreciated.

Maybe the driver module has been renamed.

lsmod

should give you all the modules currently loaded.

lsmod | grep nvidia yields nothing.

What about just lsmod on its own?

That prints a bunch of information I have to say I don’t understand, namely:

squashfs               47827  0
can_bcm                21923  0
can_raw                17120  0
can                    36567  2 can_bcm,can_raw
vsock_diag             12610  0
vsock                  36452  1 vsock_diag
sctp_diag              12845  0
sctp                  270556  3 sctp_diag
libcrc32c              12644  1 sctp
udp_diag               12801  0
unix_diag              12601  0
tcp_diag               12591  0
inet_diag              18949  3 tcp_diag,sctp_diag,udp_diag
fuse                   91880  48
iptable_filter         12810  0
loop                   28072  0
overlay                71964  0
sep3_15               529195  0
pax                    13181  0
nfsv3                  43720  2
nfs                   261660  4 nfsv3
fscache                64984  1 nfs
rdma_ucm               26889  0
ib_ucm                 18506  0
rdma_cm                59578  1 rdma_ucm
iw_cm                  39418  1 rdma_cm
ib_ipoib              163919  0
ib_cm                  51641  3 rdma_cm,ib_ucm,ib_ipoib
ib_uverbs             103047  2 ib_ucm,rdma_ucm
ib_umad                22093  6
mlx5_fpga_tools        14392  0
mlx5_ib               262742  0
mlx5_core             820247  2 mlx5_ib,mlx5_fpga_tools
mlxfw                  18227  1 mlx5_core
mlx4_ib               211850  0
ib_core               283851  10 rdma_cm,ib_cm,iw_cm,mlx4_ib,mlx5_ib,ib_ucm,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib
dm_mirror              22289  0
dm_region_hash         20813  1 dm_mirror
dm_log                 18411  2 dm_region_hash,dm_mirror
dm_mod                123941  2 dm_log,dm_mirror
sb_edac                32034  0
intel_powerclamp       14419  0
coretemp               13444  0
intel_rapl             19542  0
iosf_mbi               14990  1 intel_rapl
kvm_intel             174841  0
kvm                   578558  1 kvm_intel
irqbypass              13503  1 kvm
crc32_pclmul           13133  0
ghash_clmulni_intel    13273  0
aesni_intel           189415  1
lrw                    13286  1 aesni_intel
gf128mul               15139  1 lrw
glue_helper            13990  1 aesni_intel
ablk_helper            13597  1 aesni_intel
cryptd                 20511  3 ghash_clmulni_intel,aesni_intel,ablk_helper
zfs                  3559892  4
iTCO_wdt               13480  0
iTCO_vendor_support    13718  1 iTCO_wdt
zunicode              331170  1 zfs
zavl                   15236  1 zfs
icp                   270187  1 zfs
zcommon                73440  1 zfs
znvpair                89131  2 zfs,zcommon
spl                   102412  4 icp,zfs,zcommon,znvpair
pcspkr                 12718  0
joydev                 17389  0
pl2303                 19077  0
mei_me                 32848  0
ioatdma                67809  16
mei                    91099  1 mei_me
sg                     40721  0
lpc_ich                21086  0
i2c_i801               22550  0
ipmi_si                57587  1
shpchp                 37047  0
ipmi_devintf           17603  2
wmi                    19086  0
ipmi_msghandler        46608  2 ipmi_devintf,ipmi_si
sch_fq_codel           17571  98
binfmt_misc            17468  1
knem                   36921  0
nfsd                  347035  261
auth_rpcgss            59415  1 nfsd
nfs_acl                12837  2 nfsd,nfsv3
lockd                  93827  3 nfs,nfsd,nfsv3
grace                  13515  2 nfsd,lockd
sunrpc                353310  37 nfs,nfsd,auth_rpcgss,lockd,nfsv3,nfs_acl
ip_tables              27126  1 iptable_filter
ext4                  571716  4
mbcache                14958  1 ext4
jbd2                  103046  1 ext4
raid1                  39929  6
sd_mod                 46322  30
crc_t10dif             12912  1 sd_mod
crct10dif_generic      12647  0
mlx4_en               142833  0
mgag200                41138  1
drm_kms_helper        176920  1 mgag200
syscopyarea            12529  1 drm_kms_helper
sysfillrect            12701  1 drm_kms_helper
sysimgblt              12640  1 drm_kms_helper
fb_sys_fops            12703  1 drm_kms_helper
ttm                    99555  1 mgag200
isci                  137548  14
drm                   397988  4 ttm,drm_kms_helper,mgag200
ahci                   34056  6
ixgbe                 314916  0
libsas                 79000  1 isci
igb                   210385  0
libahci                31992  1 ahci
scsi_transport_sas     41224  2 isci,libsas
crct10dif_pclmul       14307  1
crct10dif_common       12595  3 crct10dif_pclmul,crct10dif_generic,crc_t10dif
mlx4_core             352500  2 mlx4_en,mlx4_ib
crc32c_intel           22094  1
libata                242992  3 ahci,libahci,libsas
i2c_algo_bit           13413  2 igb,mgag200
mdio                   13807  1 ixgbe
i2c_core               63151  6 drm,igb,i2c_i801,drm_kms_helper,mgag200,i2c_algo_bit
ptp                    19231  4 igb,ixgbe,mlx4_en,mlx5_core
pps_core               19057  1 ptp
mlx_compat             16882  15 rdma_cm,ib_cm,iw_cm,mlx4_en,mlx4_ib,mlx5_ib,mlx5_fpga_tools,ib_ucm,ib_core,ib_umad,ib_uverbs,mlx4_core,mlx5_core,rdma_ucm,ib_ipoib
dca                    15130  3 igb,ixgbe,ioatdma
devlink                42368  4 mlx4_en,mlx4_ib,mlx4_core,mlx5_core

Yes, my point being that perhaps the driver module has been renamed to something else, although I have no idea why that would be done.

For what it’s worth, the (rather old) NVIDIA driver module here is over 1 MB in size, and there are very few other modules over 1 MB.

Sorry, I replied before the table appeared. I can’t see any obvious suspects there and I have no real experience with virtualised setups.

The only display-related modules I can see there are “mgag200” and friends, which are of no interest: they will be for the basic Matrox G200 display chip used by many server-class motherboards.

Maybe the driver isn’t loaded?

The reason you’re having trouble with commands like nvidia-smi is that you are working on the login node, and there are no GPUs, and therefore no GPU driver loaded, on the login node.
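
If it helps, you can also confirm that from a program rather than from nvidia-smi. Here is a minimal sketch (generic, not specific to your code) that reports whether the node it runs on has any usable CUDA device and driver:

// Quick check for a usable GPU/driver on the current node (illustrative sketch).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        // On a login node with no GPUs this typically reports something like
        // "no CUDA-capable device is detected" or an insufficient-driver error.
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("found %d CUDA device(s)\n", count);
    return 0;
}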

If you want to find out what driver is in use on a compute node, spin up an interactive job in Slurm and then run nvidia-smi from there. Here is an example:

$ srun -N 1 -n 4 -p ivb_t4 --pty bash
$ nvidia-smi
Fri Mar 19 07:09:55 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:04:00.0 Off |                    0 |
| N/A   25C    P0    24W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            On   | 00000000:05:00.0 Off |                    0 |
| N/A   26C    P0    24W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
$

And yes, if you don’t have administrator privileges on a cluster (which is not surprising), then you won’t be able to change the driver. The alternative in this case would be to downgrade the version of CUDA you are using. This can often be done on a compute cluster by making the proper choices in the module system. You can find out which CUDA versions are provided using the

module avail

command, and there should be command-line help available so you can learn about the module system:

module --help

In the above example, I would want to make sure that I was using a CUDA version consistent with driver 450.80.02. Those would be CUDA versions up through 11.0, but not 11.1 or 11.2; see Table 2 here for the decoder.
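
If it’s useful, you can also check the mismatch from within a program on a compute node: cudaDriverGetVersion reports the highest CUDA version the installed driver supports, and cudaRuntimeGetVersion reports the toolkit your binary was built against. A minimal sketch (not tied to your code):

// Compare the CUDA version supported by the driver with the runtime the
// binary was built against (versions are encoded as 1000*major + 10*minor).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int driverVersion = 0, runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);    // e.g. 11000 for a CUDA 11.0 driver
    cudaRuntimeGetVersion(&runtimeVersion);  // e.g. 11010 for a CUDA 11.1 runtime

    printf("driver supports up to CUDA %d.%d\n",
           driverVersion / 1000, (driverVersion % 1000) / 10);
    printf("runtime (toolkit) version    %d.%d\n",
           runtimeVersion / 1000, (runtimeVersion % 1000) / 10);

    if (runtimeVersion > driverVersion) {
        printf("runtime is newer than the driver; PTX JIT can fail with the "
               "'unsupported toolchain' error\n");
    }
    return 0;
}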

Thank you, your advice helped me solve this problem easily.

Hi, do you have any advice on how I could debug my code on Linux? Specifically, I’m getting an illegal memory access error on Linux that I do not get on Windows; the outputs on Linux and Windows match exactly up until that error, however. Normally I would resort to something like compute-sanitizer, but that is (expectedly) unavailable on the cluster.

I’m not sure whether debugging on a cluster is a good idea, but I don’t have a Linux machine to hand to test on otherwise. My laptop is an Asus X560, whose built-in WiFi driver is not compatible with e.g. Ubuntu.

EDIT: Sorry, I realised cuda-memcheck is available on the cluster even if compute-sanitizer isn’t. I will give it a shot. Any additional advice would still be appreciated, though.
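
For anyone finding this later, the sort of bug cuda-memcheck flags is illustrated by the sketch below (a deliberate out-of-bounds write; the kernel and sizes are made up, not my actual code).

// Illustrative only: an out-of-bounds global write of the kind cuda-memcheck
// reports as an invalid write.
#include <cuda_runtime.h>

__global__ void outOfBounds(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] = i;  // bug: no bounds check, so threads with i >= n write past the allocation
}

int main() {
    const int n = 100;
    int *d_data = nullptr;
    cudaMalloc((void **)&d_data, n * sizeof(int));
    outOfBounds<<<1, 128>>>(d_data, n);  // 128 threads, only 100 valid elements
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}

Running the binary under cuda-memcheck (e.g. cuda-memcheck ./myapp) prints the offending kernel and thread for each invalid access, and compiling with nvcc -lineinfo lets it point at source lines as well.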