Unsupported GPU Architecture

I have been developing a GPU algorithm for a real-time control application. My code works properly; however, even when no changes have been made, I will spontaneously get the following error while compiling the code:

/usr/local/cuda/bin/nvcc -c -gencode arch=compute_377201920,code=sm_377201920 n1rwmgpu_master.cu
nvcc fatal : Unsupported gpu architecture ‘compute_377201920’
make: *** [n1rwmgpu_master.o] Error 1

where n1rwmgpu_master.cu is the CUDA file I am attempting to compile. As I mentioned above, I have run a lot of code on this GPU, so I do not understand why it is complaining about the architecture. At first I was convinced that it was related to some sort of memory leak, but it now appears regardless of anything I am doing with memory. I then suspected that the GPU was overheating, but the error has also occurred at GPU temperatures as low as 43C.

I have a Tesla P40 GPU running CUDA 10. My driver version is 410.72 and I am boosting my clocks up to 1531 MHz.

Thank you,
Alexander Battey

This isn’t a valid architecture:

compute_377201920

Example valid architectures for your P40 GPU (compute capability 6.1) would be:

arch=compute_61,code=sm_61

If you’re typing that line yourself (e.g. in your Makefile):

/usr/local/cuda/bin/nvcc -c -gencode arch=compute_377201920,code=sm_377201920 n1rwmgpu_master.cu

Then you are mixed up.

If that line is coming from a build system (e.g. a Makefile or CMake), then your build system is mixed up. This doesn’t have anything to do with overheating, boosting, or even whether you have a GPU or not. No compile issue would be dependent on any of those things.

You should track down what in your Makefile is generating those particular switches.

Thank you for the helpful response. My colleague wrote a small program to query the GPU and produce the appropriate architecture flags for the compiler. The program is:

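#include <stdio.h>
#include <cuda_runtime.h>
int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  /* device index 0 assumed; return value not checked */
    int v = prop.major * 10 + prop.minor;
    printf("-gencode arch=compute_%d,code=sm_%d\n", v, v);
    return 0;
}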

The output of this print statement is then written into the Makefile to set the flags. Could you please help me figure out why this spontaneously breaks about one out of every five times I attempt to compile? I would just hardcode the version you mentioned earlier, but I am worried that this is a symptom of a larger underlying issue.

Thank you,
Alexander Battey

I would start by doing proper CUDA error checking (google that) on the call to cudaGetDeviceProperties.

If you don’t get a cudaSuccess return value, you should inspect why. Currently, any value above 75 or below 30 for the v variable is immediately and obviously wrong, so you could also debug that way, by printing some sort of error if that condition is detected.
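As a rough sketch of the range check suggested above, the flag formatting could be moved into a small helper that refuses to emit anything implausible. The helper name format_gencode and its signature are hypothetical (not from the thread), and the 30..75 bounds are simply the ones quoted in the reply for this toolkit. It is written as plain C so it compiles without the CUDA toolkit; the comment at the bottom shows how it might be wired to a checked cudaGetDeviceProperties call.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: format nvcc -gencode flags for a compute capability.
 * Rejects anything outside the 30..75 range mentioned in the reply above,
 * so that garbage in an uninitialized cudaDeviceProp never reaches the Makefile.
 * Returns 0 on success, -1 if the value is implausible or buf is too small. */
int format_gencode(int major, int minor, char *buf, size_t len)
{
    /* Check the parts first so major * 10 cannot overflow on garbage input. */
    if (major < 3 || major > 7 || minor < 0 || minor > 9 ||
        major * 10 + minor > 75) {
        fprintf(stderr, "implausible compute capability: major=%d minor=%d\n",
                major, minor);
        return -1;
    }
    int v = major * 10 + minor;
    int n = snprintf(buf, len, "-gencode arch=compute_%d,code=sm_%d", v, v);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Possible use inside the inspection program (needs cuda_runtime.h and nvcc,
 * so it is shown as a comment only; device index 0 is an assumption):
 *
 *   cudaDeviceProp prop;
 *   cudaError_t err = cudaGetDeviceProperties(&prop, 0);
 *   if (err != cudaSuccess) {
 *       fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
 *       return 1;
 *   }
 *   char flags[64];
 *   if (format_gencode(prop.major, prop.minor, flags, sizeof flags) != 0)
 *       return 1;
 *   puts(flags);
 */
```

Returning a nonzero exit status from the inspection program lets make stop with a readable message instead of passing a nonsense architecture to nvcc.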

I think this method is rather unusual, so I retract what I said about your compile process not depending on the installed GPU.

Thank you for all the help so far. I have a follow-up. I added an error-checking step for the above call to cudaGetDeviceProperties(). I then ran the code 7 times, recompiling after each run (with no changes). It worked the first 6 times before returning an error similar to the one in my first post on the 7th run. The error indicated that cudaGetDeviceProperties() failed due to an “initialization error”. Could you please help me understand how to avoid this issue and why it occurs sporadically and unpredictably?

Thank you,
Alexander Battey

Is anyone or anything else using the GPU when this is happening?

What compute mode is the GPU set to?

If this code is failing randomly, then probably just about any other CUDA application if run repeatedly will fail randomly in a similar fashion.


  1. Nothing is running on the GPU when this happens.

  2. Below is the current output of nvidia-smi. It shows that the GPU is in Default compute mode.

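+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.72       Driver Version: 410.72       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P40           On   | 00000000:84:00.0 Off |                  Off |
| N/A   37C    P8    10W / 250W |      0MiB / 24451MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+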

  3. And yes, I switched to an earlier, much simpler version of my code, and it still gets this error after several runs.

Thank you,
Alexander Battey

So if you simply run the CUDA deviceQuery sample code repeatedly from the command line, it fails with an error about one time in seven?

No, deviceQuery does not result in an error. Launching the kernel appears to be a crucial part of the error. Could you help me determine what about launching the kernel is resulting in this intermittent error?

Thank you,
Alexander Battey

This doesn’t seem to be productive. In this entire thread there’s been no mention of launching any kernels. The inspection code you posted does not launch any kernels. deviceQuery does not launch any kernels. And when I asked you if anything else was going on on the GPU when this happens, you said “Nothing”.

So I don’t know where kernels entered the picture.

If you want to make this method work, you’ll need to very clearly define the exact sequence of steps needed to make your inspection code fail. Without that, or unless I see that, I don’t think I can provide any assistance.

I apologize. I thought I had spelled out the problematic sequence of events in the original post, but I see now that it was not entirely clear. The sequence of events that leads to the error is as follows.

  1. I compile the .cu file where the flags are determined as discussed above.

  2. I launch a kernel, which always completes successfully without any errors and produces the appropriate outputs.

  3. I repeat steps 1 and 2 numerous times without issue until step 1 fails with the error shown in the original post of this thread.

Nothing is running during the compilation step, as I mentioned earlier, because the kernel always terminates successfully. As mentioned above, the compilation failure traces back to an initialization error from cudaGetDeviceProperties(), but I do not understand how this sort of error can occur.

Again, I apologize for the miscommunication. Please let me know if you need any additional details.

Thank you,
Alexander Battey

For me to do anything, a complete test case would be needed.

That means all the code you are using, instructions to compile it, and a script which runs your sequence over and over until the error occurs.
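One possible shape for such a repeat-until-failure driver, kept in C to match the rest of the thread, is sketched below. The function name run_until_failure is hypothetical, and the command string in the usage comment is only a guess at the build and run steps; the actual Makefile targets and binary name are not given in the thread.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical driver: run `cmd` through the shell up to max_iters times
 * and stop at the first failure.
 * Returns the 1-based iteration that failed, or 0 if every run succeeded. */
int run_until_failure(const char *cmd, int max_iters)
{
    for (int i = 1; i <= max_iters; ++i) {
        int status = system(cmd);   /* nonzero status means the command failed */
        if (status != 0) {
            fprintf(stderr, "iteration %d: '%s' failed (status %d)\n",
                    i, cmd, status);
            return i;
        }
    }
    return 0;
}

/* Example call (the command string is an assumption, not from the thread):
 *   run_until_failure("make clean && make && ./n1rwmgpu_master", 100);
 */
```

Logging the iteration number along with the failing status gives a concrete failure rate to report alongside the test case.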