I have four RTX 4060 Ti video cards, all connected to a single PCI Express bridge. These cards are known not to support NVIDIA direct P2P. I need to run a TensorRT-LLM engine on them. I built the engine with the command:
trtllm-build --checkpoint_dir /workspace/TensorRT-LLM/quantized-llama-3-70b-pp1-tp4-awq-w4a16-kvint8-gs64 --output_dir ./quantized-llama-3-70b --gemm_plugin auto
and I am trying to run it with the command:
mpirun -n 4 --allow-run-as-root python3 ../run.py --max_output_len=40 --tokenizer_dir ./llama70b_hf/models--meta-llama--Meta-Llama-3-70B-Instruct/snapshots/7129260dd854a80eb10ace5f61c20324b472b31c/ --engine_dir quantized-llama-3-70b --input_text "In Bash, how do I list all text files?"
I use a ready-made checkpoint.
When I run this engine, I get the following error:
[TensorRT-LLM][WARNING] Device 0 peer access Device 1 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 2 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 3 is not available.
Traceback (most recent call last):
  File "/workspace/TensorRT-LLM/TensorRT-LLM/examples/llama/../run.py", line 632, in <module>
    main(args)
  File "/workspace/TensorRT-LLM/TensorRT-LLM/examples/llama/../run.py", line 478, in main
    runner = runner_cls.from_dir(**runner_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 222, in from_dir
    executor = trtllm.Executor(engine_dir, trtllm.ModelType.DECODER_ONLY,
RuntimeError: [TensorRT-LLM][ERROR] CUDA runtime error in error: peer access is not supported between these two devices
Evidently the cards cannot communicate directly over the PCI Express bus.
How can I change the engine build settings, or the launch settings, so that the cards communicate through system RAM instead?
Or do I need to rework the language model itself?
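For reference, peer-to-peer support can be checked directly from Python; below is a minimal sketch using PyTorch (assuming torch is installed in the container):

import torch

# Query CUDA peer-access capability for every ordered pair of GPUs.
n = torch.cuda.device_count()
for i in range(n):
    for j in range(n):
        if i != j:
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"Device {i} -> Device {j}: peer access {'available' if ok else 'not available'}")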
Hi @antonthai2022,
Can you try setting --use_custom_all_reduce=disable when executing trtllm-build and let us know if the error still persists?
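For context: as far as I know, the custom all-reduce plugin performs the tensor-parallel all-reduce with direct GPU-to-GPU (P2P) copies, while disabling it falls back to NCCL, which can route the transfers through host memory when peer access is unavailable.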
Hello! I rebuilt with --use_custom_all_reduce=disable:
trtllm-build --checkpoint_dir /workspace/TensorRT-LLM/quantized-llama-3-70b-pp1-tp4-awq-w4a16-kvint8-gs64 --output_dir ./quantized-llama-3-70b --gemm_plugin auto --use_custom_all_reduce=disable
I got an error again:
root@pekarnya:/workspace/TensorRT-LLM/examples/llama# mpirun -n 4 --allow-run-as-root python3 ../run.py --max_output_len=40 --tokenizer_dir ./llama70b_hf/models--meta-llama--Meta-Llama-3-70B-Instruct/snapshots/7129260dd854a80eb10ace5f61c20324b472b31c/ --engine_dir quantized-llama-3-70b-all-reduce --input_text "In Bash, how do I list all text files?"
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024052800
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024052800
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024052800
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024052800
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/workspace/TensorRT-LLM/examples/llama/../run.py", line 632, in <module>
Traceback (most recent call last):
  File "/workspace/TensorRT-LLM/examples/llama/../run.py", line 632, in <module>
    main(args)
  File "/workspace/TensorRT-LLM/examples/llama/../run.py", line 478, in main
    main(args)
  File "/workspace/TensorRT-LLM/examples/llama/../run.py", line 478, in main
    runner = runner_cls.from_dir(**runner_kwargs)
TypeError: ModelRunnerCpp.from_dir() got an unexpected keyword argument 'is_enc_dec'
    runner = runner_cls.from_dir(**runner_kwargs)
TypeError: ModelRunnerCpp.from_dir() got an unexpected keyword argument 'is_enc_dec'
Traceback (most recent call last):
  File "/workspace/TensorRT-LLM/examples/llama/../run.py", line 632, in <module>
Traceback (most recent call last):
  File "/workspace/TensorRT-LLM/examples/llama/../run.py", line 632, in <module>
    main(args)
  File "/workspace/TensorRT-LLM/examples/llama/../run.py", line 478, in main
    main(args)
  File "/workspace/TensorRT-LLM/examples/llama/../run.py", line 478, in main
    runner = runner_cls.from_dir(**runner_kwargs)
TypeError: ModelRunnerCpp.from_dir() got an unexpected keyword argument 'is_enc_dec'
    runner = runner_cls.from_dir(**runner_kwargs)
TypeError: ModelRunnerCpp.from_dir() got an unexpected keyword argument 'is_enc_dec'
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[5914,1],2]
Exit code: 1
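The is_enc_dec TypeError looks like a mismatch between examples/run.py and the installed tensorrt_llm wheel (the script passes an argument that this version of ModelRunnerCpp does not accept yet). One way to compare is to print the installed package version, assuming a standard pip install:

python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"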
And can you please explain the meaning of the --use_custom_all_reduce=disable option?
I got it running now, but it takes a very long time!
root@pekarnya:/workspace/TensorRT-LLM/examples/llama# mpirun -n 4 --allow-run-as-root python3 ../run.py --max_output_len=40 --tokenizer_dir ./llama70b_hf/models--meta-llama--Meta-Llama-3-70B-Instruct/snapshots/7129260dd854a80eb10ace5f61c20324b472b31c/ --engine_dir quantized-llama-3-70b-all-reduce --input_text "In Bash, how do I list all text files?"
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024060400
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024060400
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024060400
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024060400
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[TensorRT-LLM][WARNING] Device 3 peer access Device 0 is not available.
[TensorRT-LLM][WARNING] Device 3 peer access Device 1 is not available.
[TensorRT-LLM][WARNING] Device 3 peer access Device 2 is not available.
[TensorRT-LLM][WARNING] Device 3 peer access Device 4 is not available.
[TensorRT-LLM][WARNING] Device 3 peer access Device 5 is not available.
[TensorRT-LLM][WARNING] Device 3 peer access Device 6 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 0 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 1 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 3 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 4 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 5 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 6 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 0 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 2 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 3 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 4 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 5 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 6 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 1 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 2 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 3 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 4 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 5 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 6 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 0 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 1 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 3 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 4 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 5 is not available.
[TensorRT-LLM][WARNING] Device 2 peer access Device 6 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 0 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 2 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 3 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 4 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 5 is not available.
[TensorRT-LLM][WARNING] Device 1 peer access Device 6 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 1 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 2 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 3 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 4 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 5 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 6 is not available.
[TensorRT-LLM][WARNING] Device 3 peer access Device 0 is not available.
[TensorRT-LLM][WARNING] Device 3 peer access Device 1 is not available.
[TensorRT-LLM][WARNING] Device 3 peer access Device 2 is not available.
[TensorRT-LLM][WARNING] Device 3 peer access Device 4 is not available.
[TensorRT-LLM][WARNING] Device 3 peer access Device 5 is not available.
[TensorRT-LLM][WARNING] Device 3 peer access Device 6 is not available.
Input [Text 0]: "<|begin_of_text|>In Bash, how do I list all text files?"
Output [Text 0 Beam 0]: " I want to list all text files in a directory and its subdirectories. I tried the following command, but it didn't work:
find . -type f -name "*.txt" |"
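For what it's worth, the slowness is probably expected on this topology: with P2P unavailable, every tensor-parallel all-reduce has to cross the single PCI Express bridge through system RAM instead of moving data directly between the GPUs.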