I have four RTX 4060 Ti video cards, all connected to a single PCI Express bridge. These cards are known not to support NVIDIA direct P2P. I need to run a TensorRT-LLM engine on them. I built the engine with the command:
trtllm-build --checkpoint_dir /workspace/TensorRT-LLM/quantized-llama-3-70b-pp1-tp4-awq-w4a16-kvint8-gs64 --output_dir ./quantized-llama-3-70b --gemm_plugin auto
and I am trying to run it with the command:
mpirun -n 4 --allow-run-as-root python3 ../run.py --max_output_len=40 --tokenizer_dir ./llama70b_hf/models--meta-llama--Meta-Llama-3-70B-Instruct/snapshots/7129260dd854a80eb10ace5f61c20324b472b31c/ --engine_dir quantized-llama-3-70b --input_text "In Bash, how do I list all text files?"
I use a ready-made checkpoint.
When I run this engine, I get the following error:
[TensorRT-LLM][WARNING] Device 0 peer access Device 1 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 2 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 3 is not available.
Traceback (most recent call last):
  File "/workspace/TensorRT-LLM/TensorRT-LLM/examples/llama/../run.py", line 632, in <module>
    main(args)
  File "/workspace/TensorRT-LLM/TensorRT-LLM/examples/llama/../run.py", line 478, in main
    runner = runner_cls.from_dir(**runner_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 222, in from_dir
    executor = trtllm.Executor(engine_dir, trtllm.ModelType.DECODER_ONLY,
RuntimeError: [TensorRT-LLM][ERROR] CUDA runtime error in error: peer access is not supported between these two devices
Evidently the cards cannot communicate directly over the PCI Express bus.
How can I change the engine build settings, or the launch settings, so that the cards communicate through system RAM instead?
Or do I need to rework the language model itself?
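For reference, peer-to-peer support can be checked directly from Python; below is a minimal sketch using PyTorch (assuming torch is installed in the container):

import torch

# Query CUDA peer-access capability for every ordered pair of GPUs.
n = torch.cuda.device_count()
for i in range(n):
    for j in range(n):
        if i != j:
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"Device {i} -> Device {j}: peer access {'available' if ok else 'not available'}")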
Hi @antonthai2022,
Can you try setting --use_custom_all_reduce=disable when executing trtllm-build and let us know if the error still persists?
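For context: as far as I know, the custom all-reduce plugin performs the tensor-parallel all-reduce with direct GPU-to-GPU (P2P) copies, while disabling it falls back to NCCL, which can route the transfers through host memory when peer access is unavailable.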
Hello! I rebuilt with --use_custom_all_reduce=disable:
trtllm-build --checkpoint_dir /workspace/TensorRT-LLM/quantized-llama-3-70b-pp1-tp4-awq-w4a16-kvint8-gs64 --output_dir ./quantized-llama-3-70b --gemm_plugin auto --use_custom_all_reduce=disable
I got an error again:
root@pekarnya:/workspace/TensorRT-LLM/examples/llama# mpirun -n 4 --allow-run-as-root python3 ../run.py --max_output_len=40 --tokenizer_dir ./llama70b_hf/models--meta-llama--Meta-Llama-3-70B-Instruct/snapshots/7129260dd854a80eb10ace5f61c20324b472b31c/ --engine_dir quantized-llama-3-70b-all-reduce --input_text "In Bash, how do I list all text files?"
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024052800
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024052800
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024052800
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024052800
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/workspace/TensorRT-LLM/examples/llama/../run.py", line 632, in <module>
Traceback (most recent call last):
  File "/workspace/TensorRT-LLM/examples/llama/../run.py", line 632, in <module>
    main(args)
  File "/workspace/TensorRT-LLM/examples/llama/../run.py", line 478, in main
    main(args)
  File "/workspace/TensorRT-LLM/examples/llama/../run.py", line 478, in main
    runner = runner_cls.from_dir(**runner_kwargs)
TypeError: ModelRunnerCpp.from_dir() got an unexpected keyword argument 'is_enc_dec'
    runner = runner_cls.from_dir(**runner_kwargs)
TypeError: ModelRunnerCpp.from_dir() got an unexpected keyword argument 'is_enc_dec'
Traceback (most recent call last):
  File "/workspace/TensorRT-LLM/examples/llama/../run.py", line 632, in <module>
Traceback (most recent call last):
  File "/workspace/TensorRT-LLM/examples/llama/../run.py", line 632, in <module>
    main(args)
  File "/workspace/TensorRT-LLM/examples/llama/../run.py", line 478, in main
    main(args)
  File "/workspace/TensorRT-LLM/examples/llama/../run.py", line 478, in main
    runner = runner_cls.from_dir(**runner_kwargs)
TypeError: ModelRunnerCpp.from_dir() got an unexpected keyword argument 'is_enc_dec'
    runner = runner_cls.from_dir(**runner_kwargs)
TypeError: ModelRunnerCpp.from_dir() got an unexpected keyword argument 'is_enc_dec'
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[5914,1],2]
Exit code: 1
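The is_enc_dec TypeError looks like a mismatch between examples/run.py and the installed tensorrt_llm wheel (the script passes an argument that this version of ModelRunnerCpp does not accept yet). One way to compare is to print the installed package version, assuming a standard pip install:

python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"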
And can you please explain the meaning of the --use_custom_all_reduce=disable option?
I got it running now, but it takes a very long time!
root@pekarnya:/workspace/TensorRT-LLM/examples/llama# mpirun -n 4 --allow-run-as-root python3 ../run.py --max_output_len=40 --tokenizer_dir ./llama70b_hf/models--meta-llama--Meta-Llama-3-70B-Instruct/snapshots/7129260dd854a80eb10ace5f61c20324b472b31c/ --engine_dir quantized-llama-3-70b-all-reduce --input_text "In Bash, how do I list all text files?"
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024060400
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024060400
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024060400
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024060400
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[TensorRT-LLM][WARNING] Device 3 peer access Device 0 is not available.
[TensorRT-LLM][WARNING] Device 3 peer access Device 1 is not available.
[TensorRT-LLM][WARNING] Device 3 peer access Device 2 is not available.
[TensorRT-LLM][WARNING] Device 3 peer access Device 4 is not available.
[TensorRT-LLM][WARNING] Device 3 peer access Device 5 is not available.
[TensorRT-LLM][WARNING] Device 3 peer access Device 6 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 0 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 1 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 3 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 4 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 5 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 6 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 0 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 2 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 3 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 4 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 5 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 6 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 1 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 2 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 3 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 4 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 5 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 6 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 0 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 1 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 3 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 4 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 5 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 6 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 0 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 2 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 3 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 4 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 5 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 6 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 1 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 2 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 3 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 4 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 5 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 6 is not available.
[TensorRT-LLM][WARNING] Device 3 peer access Device 0 is not available.
[TensorRT-LLM][WARNING] Device 3 peer access Device 1 is not available.
[TensorRT-LLM][WARNING] Device 3 peer access Device 2 is not available.
[TensorRT-LLM][WARNING] Device 3 peer access Device 4 is not available.
[TensorRT-LLM][WARNING] Device 3 peer access Device 5 is not available.
[TensorRT-LLM][WARNING] Device 3 peer access Device 6 is not available.
Input [Text 0]: "<|begin_of_text|>In Bash, how do I list all text files?"
Output [Text 0 Beam 0]: " I want to list all text files in a directory and its subdirectories. I tried the following command, but it didn't work:
find . -type f -name "*.txt" |"
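For what it's worth, the slowness is probably expected on this topology: with P2P unavailable, every tensor-parallel all-reduce has to cross the single PCI Express bridge through system RAM instead of moving data directly between the GPUs.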