Crashes on First Inference After Loading in Latest vLLM

Hello,
I’m using vLLM on a dual-node DGX Spark setup. vLLM worked very well up to version 0.14.0, but starting with version 0.15.0, the first inference after loading the model either crashes or hangs without producing any output.

When building the Docker image for vLLM 0.15.0, I’m using PyTorch 2.10.0. Is it possible that vLLM does not support PyTorch 2.10.0 yet?
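For anyone comparing setups, a minimal sketch for confirming which vLLM and PyTorch versions actually ended up inside the container might look like the following. It uses only the standard-library `importlib.metadata`; the helper names `installed_version` and `report_versions` are made up for illustration, and it assumes the packages are installed under the distribution names `vllm` and `torch`.

```python
from importlib import metadata


def installed_version(package: str) -> str:
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


def report_versions(packages=("vllm", "torch")) -> dict:
    """Map each distribution name to its installed version string."""
    # Assumption: the distributions are named "vllm" and "torch" on this system.
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version}")
```

Running this inside the image built for 0.14.0 and the one built for 0.15.0 should show whether the PyTorch version is the only thing that changed between them.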

I made a bug report here:

It looks like an issue in either vLLM or PyTorch.

I haven’t had a chance to fully debug it, but I have provided a temporary workaround.

Will get to it soon.

I replied in the ticket, but did you use a wheels build by any chance?
I’ve noticed all kinds of weird issues since the latest PyTorch 2.10 migration. It should work fine if compiled from source (without the --use-wheels flag).

I’m going to disable wheels builds for now.

Yep, works!

Thanks again!