Slow startup using JIT'd PTX with libtorch

  1. You are forcing JIT.
  2. There are many libraries, each with many kernels, linked into PyTorch. All of that gets JIT'd.
  3. The env var you set causes the JIT cache to be ignored; see here.

So this all looks like expected behavior to me.
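If you want to see where the time goes, here is a minimal libtorch timing sketch (my own illustration, not taken from your setup) that separates the first CUDA operation, which pays for context creation plus whatever PTX JIT happens as modules load, from a repeat of the same work. It assumes the forcing variable is something like CUDA_FORCE_PTX_JIT=1, possibly combined with CUDA_CACHE_DISABLE=1, exported before the program starts; substitute whatever you actually set.

```cpp
// Minimal timing sketch (assumes a working libtorch + CUDA install, and that
// the JIT-forcing env var, e.g. CUDA_FORCE_PTX_JIT=1, is exported beforehand).
#include <torch/torch.h>
#include <chrono>
#include <iostream>

int main() {
  using clock = std::chrono::steady_clock;
  auto ms = [](auto d) {
    return std::chrono::duration_cast<std::chrono::milliseconds>(d).count();
  };

  auto t0 = clock::now();
  // First CUDA work in the process: context creation plus JIT of whatever PTX
  // gets loaded from libtorch and the CUDA libraries linked into it.
  auto a = torch::randn({1024, 1024}, torch::kCUDA);
  auto b = torch::mm(a, a);
  torch::cuda::synchronize();
  auto t1 = clock::now();

  // Same work again: the modules are already loaded, so no JIT this time.
  b = torch::mm(a, a);
  torch::cuda::synchronize();
  auto t2 = clock::now();

  std::cout << "first CUDA op (startup + JIT): " << ms(t1 - t0) << " ms\n"
            << "second op (steady state):      " << ms(t2 - t1) << " ms\n";
  return 0;
}
```

With matching SASS in the binaries (or with the JIT cache allowed to work), I would expect the first number to drop back toward ordinary context-creation time.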

You might very well observe different behavior if you actually ran on Ampere and did not set that env var. That env var exists for test purposes, and the intended test is somewhat different from the one you are running here. The intended test is: “Is this application JIT-able?” The test you are trying to use it for is: “How will a future GPU behave?” Those two tests are slightly different.
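To make that distinction concrete, here is a sketch of the two kinds of builds using standard nvcc -gencode options on a trivial kernel (the architecture sm_80/compute_80 and file names are just examples, not taken from your build):

```cpp
// toy.cu -- trivial kernel; the nvcc lines in the comments are standard
// -gencode usage, with sm_80/compute_80 picked purely as an example arch.
//
// "Is this application JIT-able?" build: embed PTX only, so every GPU must
// JIT it (this is roughly what the forcing env var simulates):
//   nvcc -gencode arch=compute_80,code=compute_80 toy.cu -o toy_ptx_only
//
// "How will a future GPU behave?" build: embed SASS for the GPUs you know
// about plus PTX as a forward-compatibility fallback; only an arch newer than
// anything in the fatbin has to JIT, and normally only once thanks to the
// JIT cache:
//   nvcc -gencode arch=compute_80,code=sm_80 \
//        -gencode arch=compute_80,code=compute_80 toy.cu -o toy_fatbin

__global__ void axpy(float a, const float* x, float* y, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) y[i] = a * x[i] + y[i];
}
```

The PTX-only build answers the first question on any GPU you have today; the fatbin build is the one whose behavior on a future GPU you actually care about, and there the JIT cost is normally paid once and then cached.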