Riva 1.8.0b0 riva-build speech_recognition --nn.trt_max_workspace_size does not actually set workspace size

Hardware - GPU (T4)
Operating System: Amazon Linux 2
Riva Version: 1.8.0b0

When building a speech_recognition model, e.g.

riva-build speech_recognition ...

I get the following error (duplicate log lines removed for brevity):

2021-12-20 01:51:29,391 [INFO] Building TRT engine from ONNX file
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 568860638
[TensorRT] WARNING: onnx2trt_utils.cpp:362: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[TensorRT] WARNING: Detected invalid timing cache, setup a local cache instead
[TensorRT] WARNING: Internal error: cannot reformat, disabling format. Try decreasing the workspace size with IBuilderConfig::setMaxWorkspaceSize().
[TensorRT] WARNING: Memory requirements of format conversion cannot be satisfied during timing, format rejected.
...
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] WARNING: -------------- The current system memory allocations dump as below --------------
[0x558c491f11c0]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 6190 time: 8.41e-07
[0x558c491f0330]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 6174 time: 8.18e-07
[0x558c491f0190]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 6171 time: 8.42e-07
[0x558c491efff0]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 6168 time: 8.61e-07
[0x558c491efc10]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 6156 time: 1.013e-06
...
[0x558c307cda20]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 264 time: 7.88e-07
[0x558bfeff7920]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 219 time: 1.377e-06
[0x558c307cdbc0]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 267 time: 7.88e-07
-------------- The current device memory allocations dump as below --------------
[0]:34359738368 :HybridGlobWriter in reserveRegion: at optimizer/common/globWriter.cpp: 245 idx: 1 time: 0.000138994
[0x302000000]:67108864 :HybridGlobWriter in reserveRegion: at optimizer/common/globWriter.cpp: 245 idx: 0 time: 0.000211905
[TensorRT] ERROR: Requested amount of GPU memory (34359738368 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT] WARNING: Skipping tactic 0 due to oom error on requested size of 34359738368 detected for tactic 24.
Try decreasing the workspace size with IBuilderConfig::setMaxWorkspaceSize().
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] WARNING: -------------- The current system memory allocations dump as below --------------
[0x558c491f11c0]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 6190 time: 8.41e-07
[0x558c491f0330]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 6174 time: 8.18e-07
[0x558c491f0190]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 6171 time: 8.42e-07
[0x558c491efff0]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 166 idx: 6168 time: 8.61e-07
...

I tried setting --nn.trt_max_workspace_size=15000000000 (~14 GiB), but I still get the exact same error showing "requested size of 34359738368".
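For scale, the numbers in the log work out as follows (assuming 1 GiB = 2^30 bytes). The failing tactic requests exactly 32 GiB, which is double a T4's 16 GB of VRAM, so no value of the workspace flag on this card could satisfy that request:

```shell
# Sizes from the log, in GiB (1 GiB = 2**30 = 1073741824 bytes).
# The failing tactic requests 34359738368 bytes = exactly 32 GiB --
# double a T4's 16 GB of VRAM.
echo "requested:  $((34359738368 / 1073741824)) GiB"
# The flag value 15000000000 bytes is ~13.97 GiB (integer division prints 13).
echo "workspace:  $((15000000000 / 1073741824)) GiB"
```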


My current workaround is to set chunk_size to <= 90, rather than the 900 recommended by the official documentation.
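For reference, a sketch of the workaround invocation. Only --chunk_size and --nn.trt_max_workspace_size come from this thread; the input/output paths and the --offline flag are placeholders, not my exact command:

```shell
# Sketch only -- the .rmir/.riva paths and --offline are assumptions;
# --chunk_size and --nn.trt_max_workspace_size are the flags discussed here.
riva-build speech_recognition \
    /servicemaker-dev/asr.rmir \
    /servicemaker-dev/asr.riva \
    --offline \
    --nn.trt_max_workspace_size=15000000000 \
    --chunk_size=90
# Workaround: chunk_size <= 90 instead of the documented 900.
```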

I’m running into the same issue. The quickstart scripts used to work and run very well on an AWS g4dn.xlarge (16 GiB). Now I get this during riva_init.sh; it did not happen with versions < 1.8beta.

I also have all TTS models commented out in config.sh

Can you please confirm whether anything else was running on the GPU when you ran this command? Is this a g4dn.xlarge node, @pineapple9011? We’d like to try to reproduce this.

The workspace size is applied at the per-tactic/per-layer level, so you would actually want to try reducing it.
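One way to sanity-check whether a workspace cap takes effect at all, independent of riva-build, is to rebuild the intermediate ONNX with TensorRT's trtexec (ships with TensorRT; on TRT 7/8 its --workspace flag is given in MiB). The model path below is a placeholder:

```shell
# Rebuild the ONNX directly with an explicit workspace cap.
# Path is a placeholder; --workspace is in MiB (8192 MiB = 8 GiB) on TRT 7/8.
trtexec --onnx=/path/to/riva_trt_model.onnx --workspace=8192 --fp16
```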

I can confirm that we run a g4dn.xlarge and nothing else is running on the CPU/GPU (except the AWS ECS agent, which shouldn’t be an issue).
nvidia-smi shows only the riva-build/deploy processes using the GPU.

I am currently using the following AMI: amzn2-ami-ecs-gpu-hvm-2.0.20211020-x86_64-ebs (ID: ami-09fadf5d41025c619), which I believe is the ECS GPU AMI usually suggested by AWS.

Reducing trt_max_workspace_size didn’t help; only reducing chunk_size did.