Different LLM decoding results with CUDA 11.7 and CUDA 11.8

This is about the LLM decoding kernel in FasterTransformer/src/fastertransformer/kernels/decoder_masked_multihead_attention/decoder_masked_multihead_attention_template.hpp (main branch of NVIDIA/FasterTransformer on GitHub).

With CUDA 11.8 and nvcc 11.8 I get correct results. However, with CUDA 11.7 and nvcc 11.7, the results are weird.

Does anyone have an idea what could cause this? I am really confused.

I think I found the cause. In the CUDA 11.7 Docker environment, when I ran pip install transformers, pip printed a warning message.

My guess is that the environment used by the make command got corrupted after I pip installed some Python packages, because I occasionally ran into this error message: “make: /usr/local/lib/python3.8/dist-packages/cmake/data/bin/cmake: Command not found”. After I set up a clean Python virtual environment, the results with CUDA 11.7 are as expected.

However, it is strange that the make command in the CUDA 11.7 Docker environment worked for several days and then suddenly stopped working.
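
One possible explanation, which I have not confirmed for this case: a build directory configured by CMake records the absolute path of the cmake binary that configured it (the CMAKE_COMMAND entry in CMakeCache.txt, which the generated Makefiles also use). If that binary came from the pip cmake package and a later pip install removed or replaced it, make would fail with exactly this kind of "Command not found" error even though it worked before. Below is a minimal Python sketch to check for that. The build directory name "build" and the helper names are my own assumptions, not anything from FasterTransformer.

```python
import os
import re
from typing import Optional


def recorded_cmake_command(cache_path: str) -> Optional[str]:
    """Return the cmake path recorded in a CMakeCache.txt, or None if absent."""
    pattern = re.compile(r"^CMAKE_COMMAND:INTERNAL=(.+)$")
    with open(cache_path, encoding="utf-8", errors="replace") as cache:
        for line in cache:
            match = pattern.match(line.strip())
            if match:
                return match.group(1)
    return None


def cmake_is_stale(cache_path: str) -> bool:
    """True if the cache names a cmake binary that is missing or not executable."""
    command = recorded_cmake_command(cache_path)
    if command is None:
        # No CMAKE_COMMAND entry found, so nothing can be concluded.
        return False
    return not (os.path.isfile(command) and os.access(command, os.X_OK))


if __name__ == "__main__":
    # Hypothetical build directory; adjust to wherever cmake was run.
    cache = os.path.join("build", "CMakeCache.txt")
    if os.path.exists(cache):
        print("recorded cmake:", recorded_cmake_command(cache))
        print("stale:", cmake_is_stale(cache))
    else:
        print("no CMakeCache.txt found at", cache)
```

If this reports a stale path, deleting the build directory and running cmake again from a clean environment should regenerate the Makefiles with a valid path. That would be consistent with the clean virtual environment fixing things for me, but I cannot say for sure that it also explains the wrong decoding results.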