The NVIDIA forum thinks there are links in my post (because the problem description includes IP addresses, or some other parsing error!), so I converted the post to a shared Google text file.
Hi, do I need something similar in riva_deploy, or only in riva_build?
I’m getting lots of warnings in riva_deploy
[10/05/2022-19:00:45] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[10/05/2022-19:00:45] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[10/05/2022-19:00:45] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[10/05/2022-19:00:45] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
I was able to run riva_build.sh and riva_deploy.sh with the new --nn.use_trt_fp32 flag in riva_build.
(See previous post for full script).
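For context, a sketch of what such a riva-build invocation with the FP32 flag could look like. The input/output paths and model names below are placeholders, not the poster's actual script, and it is assumed to run inside the servicemaker container:

```shell
#!/usr/bin/env bash
# Hedged sketch: a riva-build call with the --nn.use_trt_fp32 flag.
# Paths and filenames are placeholders (assumptions, not the shared script).
BUILD_CMD="riva-build speech_recognition \
  /servicemaker-dev/my_en_gb.rmir \
  /servicemaker-dev/model.riva \
  --nn.use_trt_fp32"

if command -v riva-build >/dev/null 2>&1; then
  # Inside the servicemaker container: run the build for real.
  eval "$BUILD_CMD"
else
  # Outside the container: just show what would run.
  echo "would run: $BUILD_CMD"
fi
```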
However, I’m now having a problem with riva_start.sh, and I’m not sure why.
(Script included here and has not changed from the previous post.)
Below is a link with the following:
riva_start.sh
directory listing of /data
docker logs riva-speech
One question is where do I need to keep the .rmir file generated by riva_build?
To be safe, I made a copy in both /data and /data/models.
@petra1 I saw that also - I don’t know where to get that from or how to make it. I did notice that I do have that file for the en-us conformer model (i.e. the “oob” conformer model).
Any idea if I can download something from the NGC catalog and pass an option in riva-build to make the file or if I need to find that file from somewhere in the NGC catalog? Any leads would be appreciated!
So that file (model.plan) should be created from the rmir file during the deploy. Could you clean up your models directory, make sure that your rmir is inside the rmir directory, and rerun riva_init.sh? The logs from the riva_init command could give some clues as to why this model.plan file is missing.
Also if you have logs from running your riva_build command, those would be handy too - to see if the rmir build worked as expected.
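The cleanup steps above could look roughly like this. The data location variable and the rmir filename are assumptions based on the quickstart layout (in config.sh it is riva_model_loc); adjust to your setup:

```shell
#!/usr/bin/env bash
# Hedged sketch of the cleanup, assuming the quickstart data layout.
RIVA_MODEL_LOC="${RIVA_MODEL_LOC:-$PWD/riva-data}"  # assumed location

# Start from a clean models directory so riva_init.sh redeploys from scratch.
rm -rf "$RIVA_MODEL_LOC/models"
mkdir -p "$RIVA_MODEL_LOC/models" "$RIVA_MODEL_LOC/rmir"

# riva_init.sh looks for .rmir files in the rmir/ subdirectory, so the file
# produced by riva-build belongs there:
# cp my_en_gb.rmir "$RIVA_MODEL_LOC/rmir/"   # placeholder filename

echo "rmir directory ready: $RIVA_MODEL_LOC/rmir"
```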
In my previous experiments, I did succeed in creating a model with just a custom language model on top of the base acoustic model, similar to what you’re trying to do. One difference, though, is that I didn’t change --language_code to en-GB.
I run init only for the out-of-the-box model (riva_quickstart), and that works fine for en-US.
Here I’m trying to change the language to be en-GB, so I first run riva-build to make the rmir file from some existing lm files and .far files.
After riva-build I run riva-deploy which makes the models directory. My rmir file (created by riva-build) is not in any subdirectory.
Does the rmir file need to be in its own directory?
Also I don’t run riva_init.sh or riva_start.sh from riva_quickstart, I run a riva_start.sh (included earlier in the post)
If there’s a way I can show you what I’m doing it might make more sense.
Does the rmir file need to be in its own directory?
When using riva_init.sh, the rmir dir is where the script looks for the rmir files. But you’re using riva-deploy directly (which I haven’t tried), and my guess is that a valid path to the rmir file should be enough.
Also I don’t run riva_init.sh or riva_start.sh from riva_quickstart, I run a riva_start.sh (included earlier in the post)
I see, is there any reason in particular for not using the quickstart scripts?
I’m flexible to try anything that works.
But the output of riva-build (which I need in order to use the en-GB language model) is only a .rmir file.
At that point I run a one-liner riva-deploy (which is run inside servicemaker container) + a simplified riva-start (run on the host, as I have it).
I’m not sure if/how the quickstart riva_start.sh script should be modified (meaning, do I run it instead of riva-deploy or after riva-deploy?).
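The one-liner deploy described above might look like the sketch below. It assumes the servicemaker container is running, and the rmir path and /data/models target are placeholders, not the poster's actual paths:

```shell
#!/usr/bin/env bash
# Hedged sketch of the one-liner riva-deploy, run inside the servicemaker
# container. Input path and target directory are assumptions.
DEPLOY_CMD="riva-deploy -f /servicemaker-dev/my_en_gb.rmir /data/models"

if command -v riva-deploy >/dev/null 2>&1; then
  # -f overwrites any previous deployment in the target models directory.
  eval "$DEPLOY_CMD"
else
  # Outside the container: just show what would run.
  echo "would run: $DEPLOY_CMD"
fi
```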
But the output of riva-build (which I need in order to use the en-GB language model) is only a .rmir file.
Do you mean that you didn’t see any logs? When I run the command, I can see the logs.
Also, what logs do you see when running riva-deploy (after erasing the existing models dir to restart the deploy fully)?
Hi,
Below is a link with the log of riva_build.sh (sh script shared previously)
and the log of riva_deploy.sh (sh script shared previously)
The same error occurs when I run riva_start.sh (script shared previously).
output of docker logs riva-speech also shared previously.
Let me know what you think. I’m happy to run this interactively with you…
There’s still a bug somewhere. I noticed the following:
Riva deploy shows the following errors (I don’t know what I need to rebuild for the Triton server to work properly, or how to check it):
[10/18/2022-07:23:31] [TRT] [E] 4: [network.cpp::validate::2787] Error Code 4: Internal Error (fp16 precision has been set for a layer or layer output, but fp16 is not configured in the builder)
[10/18/2022-07:23:31] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
Riva deploy output also has some warnings of “One or more weights outside the range of INT32 was clamped”
docker logs riva-speech output says:
“Triton server died before reaching ready state. Terminating Riva startup.
Check Triton logs with: docker logs” - This is consistent with the riva-deploy error.
My question is how do I rebuild/restart/check the triton server is running correctly for riva-build/deploy to work properly?
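One way to check whether Triton came up is to inspect the container logs. A hedged sketch, assuming the container is named riva-speech as in the quickstart scripts:

```shell
#!/usr/bin/env bash
# Hedged sketch: look for Triton startup errors in the riva-speech
# container logs (container name assumed from the quickstart scripts).
check_triton_logs() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not available; run this on the Riva host"
    return 0
  fi
  if ! docker ps -a --format '{{.Names}}' | grep -qx riva-speech; then
    echo "no riva-speech container found"
    return 0
  fi
  # Surface Triton-related lines and errors from the container logs.
  docker logs riva-speech 2>&1 | grep -iE 'error|triton' | tail -n 20
}

check_triton_logs
```

If the model.plan build failed during deploy, these logs typically show Triton failing to load the corresponding model before it dies.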
FYI:
Below is a link to a shared document with the following:
There’s still a bug. Using the flag didn’t change anything in riva-build.
I’m thinking the problem is with the Triton server (either not running or not built properly), based on the error message from riva-deploy.
I had a similar issue to the one described in this thread: I fine-tuned a custom model that returned empty transcripts when deployed, and --nn.use_trt_fp32 fixed the issue. However, the fine-tuned model with FP32 transcribes 50% slower than the base model (which I assume was using FP16). Is this expected behavior? If so, do you have a timeline for when this issue will be fixed? Inference speed is critical for my application.