Hardware - GPU:
T4
Hardware - CPU:
x86_64
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
4 CPU
Operating System:
Ubuntu
Linux 5.10.213-201.855.amzn2.x86_64
Riva Version:
$ ./bin/riva_server --version
riva_server version 2.15.0
Triton Version:
server_version | 2.40.0
We have Riva deployed in EKS, and we observe high memory usage during and after our load test.
the count of threads before the test run:
root@riva-api-en-primary-5d94766b8f-tczqs:/opt/riva# ps auxwwH | grep riva_server | wc -l
25
root@riva-api-en-primary-5d94766b8f-tczqs:/opt/riva# ps auxwwH | grep tritonserver | wc -l
70
mem:
PID TID MINFLT MAJFLT VSTEXT VSLIBS VDATA VSTACK LOCKSZ VSIZE RSIZE PSIZE VGROW RGROW SWAPSZ RUID EUID MEM CMD 1/1
21 - 334320 180 8.8M 1.8G 1.9G 364.0K 0.0K 41.5G 1.4G 0B 41.5G 1.4G 0B root root 9% tritonserver
95 - 4036 0 11.6M 12.7M 349.5M 172.0K 0.0K 388.2M 28.0M 0B 388.2M 28.0M 0B root root 0% riva_server
the count of threads after the test run:
root@riva-api-en-primary-6749f8678f-ng6zs:/opt/riva/bin# ps auxwwH | grep riva_server | wc -l
224
root@riva-api-en-primary-6749f8678f-ng6zs:/opt/riva/bin# ps auxwwH | grep tritonserver | wc -l
79
mem:
PID TID MINFLT MAJFLT VSTEXT VSLIBS VDATA VSTACK LOCKSZ VSIZE RSIZE PSIZE VGROW RGROW SWAPSZ RUID EUID MEM CMD 1/1
95 - 0 0 11.6M 12.7M 5.5G 172.0K 0.0K 5.5G 3.3G 0B 0B 0B 0B root root 21% riva_server
21 - 0 0 8.8M 1.8G 2.3G 364.0K 0.0K 41.9G 2.0G 0B 0B 0B 0B root root 13% tritonserver
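A note on measuring: `ps auxwwH | grep riva_server | wc -l` can also count the `grep` process itself, because its own command line contains the search string. Below is a minimal sketch of reading the thread count and resident memory directly from `/proc/<pid>/status` instead. It assumes a Linux `/proc` filesystem and a Python interpreter in the container. The helper names `parse_status` and `sample` are hypothetical, not part of Riva or Triton.

```python
def parse_status(text):
    """Extract (thread count, resident set size in KiB) from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    threads = int(fields["Threads"])
    # VmRSS is reported as "<number> kB"; it is absent for kernel threads.
    rss_kib = int(fields["VmRSS"].split()[0]) if "VmRSS" in fields else 0
    return threads, rss_kib


def sample(pid):
    """Read the status file of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status(f.read())
```

With the PIDs shown in the listings above, `sample(95)` would report riva_server and `sample(21)` tritonserver. Calling it periodically during the load test would show whether the thread count and RSS grow together.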
the load test is:
# riva_streaming_asr_client --chunk_duration_ms=20 --simulate_realtime=true --automatic_punctuation=true --num_parallel_requests=160 --word_time_offsets=true --print_transcripts=false --interim_results=false --simulate_realtime=true --num_iterations=340 --audio_file=/sample.wav --output_filename=/tmp/output.json --riva_uri=10.18.12.209:50051
the test log:
I0424 16:06:28.589784 5120 riva_streaming_asr_client.cc:150] Using Insecure Server Credentials
Loading eval dataset...
filename: /sample.wav
Done loading 1 files
Not printing latency statistics because the client is run without the --simulate_realtime option and/or the number of requests sent is not equal to number of requests received. To get latency statistics, run with --simulate_realtime and set the --chunk_duration_ms to be the same as the server chunk duration
Run time: 516.921 sec.
Total audio processed: 58565.3 sec.
Throughput: 113.297 RTFX
the second test run, with the same command:
# riva_streaming_asr_client --chunk_duration_ms=20 --simulate_realtime=true --automatic_punctuation=true --num_parallel_requests=160 --word_time_offsets=true --print_transcripts=false --interim_results=false --simulate_realtime=true --num_iterations=340 --audio_file=/sample.wav --output_filename=/tmp/output.json --riva_uri=10.18.12.209:50051
I0424 16:20:04.118997 5767 riva_streaming_asr_client.cc:150] Using Insecure Server Credentials
Loading eval dataset...
filename: /sample.wav
Done loading 1 files
Not printing latency statistics because the client is run without the --simulate_realtime option and/or the number of requests sent is not equal to number of requests received. To get latency statistics, run with --simulate_realtime and set the --chunk_duration_ms to be the same as the server chunk duration
Run time: 516.895 sec.
Total audio processed: 58565.3 sec.
Throughput: 113.302 RTFX
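As a sanity check on the two logs, the reported throughput matches total audio divided by run time. This is a small sketch using only the numbers printed above; the function name `rtfx` is just an illustrative helper.

```python
def rtfx(audio_seconds, run_seconds):
    """Real-time factor: seconds of audio processed per second of wall-clock time."""
    return audio_seconds / run_seconds


# Values copied from the two client logs above.
runs = [("run 1", 58565.3, 516.921), ("run 2", 58565.3, 516.895)]

for label, audio_s, run_s in runs:
    print(f"{label}: {rtfx(audio_s, run_s):.2f} RTFX")
```

Both runs come out at about 113.3 RTFX, so throughput is unchanged between runs even though memory usage is not.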
the count of threads after the second test run:
root@riva-api-en-primary-6749f8678f-ng6zs:/opt/riva/bin# ps auxwwH | grep riva_server | wc -l
286
root@riva-api-en-primary-6749f8678f-ng6zs:/opt/riva/bin# ps auxwwH | grep tritonserver | wc -l
79
mem:
PID TID MINFLT MAJFLT VSTEXT VSLIBS VDATA VSTACK LOCKSZ VSIZE RSIZE PSIZE VGROW RGROW SWAPSZ RUID EUID MEM CMD 1/1
95 - 1703e3 0 11.6M 12.7M 9.3G 172.0K 0.0K 9.3G 6.4G 0B 9.3G 6.4G 0B root root 42% riva_server
21 - 617533 188 8.8M 1.8G 2.8G 364.0K 0.0K 42.4G 2.4G 0B 42.4G 2.4G 0B root root 16% tritonserver
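To make the growth explicit, here is a rough calculation over the RSIZE values and thread counts from the three listings above. It assumes the K/M/G suffixes in the memory listing are binary multiples. Note that the "before" sample was taken on a different pod (`...-tczqs`) than the later two (`...-ng6zs`), so the first delta is only indicative.

```python
# Size suffixes converted to MiB (assumption: binary multiples).
UNITS = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}


def to_mib(size):
    """Convert a size string such as '3.3G' or '28.0M' to MiB."""
    return float(size[:-1]) * UNITS[size[-1]]


# (label, RSIZE, thread count) copied from the listings above.
samples = {
    "riva_server": [("before", "28.0M", 25), ("after run 1", "3.3G", 224), ("after run 2", "6.4G", 286)],
    "tritonserver": [("before", "1.4G", 70), ("after run 1", "2.0G", 79), ("after run 2", "2.4G", 79)],
}


def deltas(rows):
    """Return (label, RSS growth in MiB, thread growth) between consecutive samples."""
    out = []
    for (_, prev_rss, prev_thr), (label, rss, thr) in zip(rows, rows[1:]):
        out.append((label, to_mib(rss) - to_mib(prev_rss), thr - prev_thr))
    return out


for name, rows in samples.items():
    for label, rss_growth, thread_growth in deltas(rows):
        print(f"{name:13s} {label:12s} +{rss_growth:7.1f} MiB  +{thread_growth} threads")
```

By this calculation riva_server grows by a little over 3 GiB of resident memory per run and does not drop back between runs, while tritonserver grows by roughly 0.4 to 0.6 GiB per run.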